Category Archives: workshops

Plant Science takes a focus on ISA

Back in April this year, Dr David Johnson from the ISA team gave a presentation on “Data Infrastructures to Foster Data Reuse” at a workshop on “Integrating Large Data into Plant Science: From Big Data to Discovery”, hosted by GARnet (the UK network for Arabidopsis researchers) and Egenis (the Exeter Centre for the Study of the Life Sciences). The workshop was held at Dartington Hall in Devon, South West England, and was well attended by researchers from the plant and biological science community worldwide, as well as by representatives from industry organisations such as Syngenta.

David presented ISA, as well as biosharing.org, as candidate data infrastructure resources for enabling data reuse in the plant sciences, and gave an example of how one might encode a high-throughput plant phenotyping experiment in ISA-Tab.

We have observed the uptake of the ISA-Tab format across a broad range of the life sciences, and we view its adoption in the plant sciences, with a view to making data FAIR (Findable, Accessible, Interoperable and Reusable), as essential for the field. Centres such as the UK’s National Plant Phenomics Centre in Aberystwyth, Wales, could benefit hugely from adopting ISA as they face emerging challenges in data management, especially as automated data collection becomes a significant driver in modern plant-based research and agritech.

There are also existing data analysis platforms, such as Araport (the Arabidopsis Information Portal), TAIR (The Arabidopsis Information Resource) and BioDare (Biological Data Repository), that could benefit from standardising their experimental data, as well as ongoing efforts to create open data resources in the plant sciences, such as the Collaborative Open Plant Omics (COPO) project, which will use the new ISA JSON format for its native data objects.

You can check out David’s presentation on SlideShare.

Cloudy with a chance of MTBLS

Last week, the Horizon 2020 PhenoMeNal project held a training workshop on e-infrastructures for metabolomics. It was a hands-on developers’ workshop looking at different cloud technologies and how they might be deployed and used for running computational workflows in metabolomics, as well as for data management. The ISA team at Oxford is a partner on the project.

The focus of the workshop was on kick-starting the development of the analytics infrastructure for PhenoMeNal. It is envisaged that a Europe-wide cloud infrastructure will be deployed, which might be a mix of public and private clouds (private, secure clouds for dealing with patient-identifiable data), with the compute elements taking the form of microservice containers. In this workshop, we learned how to create microservices containerised with Docker and run them on Google Cloud infrastructure, orchestrated by the MANTL framework.

Hacking at the PhenoMeNal e-infrastructures workshop at SciLifeLab Uppsala, Sweden.

Now, you are probably wondering, “What does this have to do with ISA?”

During one of the hacking sessions in the workshop, I worked with Ken Haug from the MetaboLights team at EMBL-EBI (MetaboLights is a popular metabolomics database that stores ISA-Tab files natively) on a simple use case for our recent early-bird release of the Python ISA API. Here’s what we came up with.

First, we wrap two file converters from the ISA API as Docker images (you can find these on GitHub; a sketch of how one of them might be invoked locally follows the list):

  1. isatab2json – to convert ISA-Tab files to our ISA JSON format
  2. json2isatab – to convert ISA JSON back to ISA-Tab.
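
For local testing, here is a minimal sketch of how one of these images might be invoked from Python using the Docker command-line client. The image name, mount paths and converter arguments are hypothetical stand-ins; the actual details are in the GitHub repositories linked above.

    # Illustrative only: the image name, mount paths and converter arguments are
    # hypothetical; see the GitHub repositories for the actual details.
    import subprocess
    from pathlib import Path

    ISATAB_DIR = Path("./BII-I-1").resolve()       # directory containing an ISA-Tab study
    OUTPUT_DIR = Path("./BII-I-1-json").resolve()  # where the ISA JSON should be written
    OUTPUT_DIR.mkdir(exist_ok=True)

    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{ISATAB_DIR}:/input",   # mount the ISA-Tab files into the container
            "-v", f"{OUTPUT_DIR}:/output",  # mount a directory to collect the ISA JSON
            "isa-tools/isatab2json",        # hypothetical image name
            "/input", "/output",            # hypothetical converter arguments
        ],
        check=True,
    )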

Next, we create an IPython notebook using Jupyter that makes REST calls to our MANTL cluster running in Google Cloud. These REST calls simply ask MANTL to run each of the aforementioned Docker images as microservices. From our notebook, we can now:

  1. Convert an ISA-Tab to ISA JSON (which happens in a short-lived microservice in the cloud)
  2. Modify the ISA JSON (in a Jupyter notebook running in the cloud)
  3. Convert the modified ISA JSON back to ISA-Tab (again, in a microservice).

Within Google Cloud, a Jupyter notebook is deployed in its own microservice, where we can then call ISA API microservices that are created in their own containers on demand.
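
To make the round trip concrete, here is a minimal sketch of what the notebook’s calls might look like, assuming each converter microservice is reachable over a simple HTTP endpoint. The host name, endpoint paths and payload shapes are illustrative assumptions, not the actual MANTL API used in our notebook.

    # Illustrative only: the host, endpoint paths and payload shapes are hypothetical.
    # The real notebook asks MANTL to schedule the converter containers; here we
    # assume each converter is already reachable over plain HTTP.
    import requests

    CLUSTER = "https://example-cluster.invalid"  # placeholder for the cloud endpoint

    # 1. Convert an ISA-Tab archive to ISA JSON in a short-lived microservice
    with open("BII-I-1.zip", "rb") as isatab_archive:
        response = requests.post(f"{CLUSTER}/isatab2json", files={"archive": isatab_archive})
    response.raise_for_status()
    isa_json = response.json()

    # 2. Modify the ISA JSON in the notebook
    isa_json["title"] = "Edited in a Jupyter notebook"

    # 3. Convert the modified ISA JSON back to ISA-Tab in another microservice
    response = requests.post(f"{CLUSTER}/json2isatab", json=isa_json)
    response.raise_for_status()
    with open("BII-I-1-roundtrip.zip", "wb") as isatab_out:
        isatab_out.write(response.content)  # the converter returns an ISA-Tab archive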

This effectively gives us the framework for a web-based ISA-Tab editor in which ISA content can be modified by editing the JSON representation, something that the MetaboLights team could use in the near future. Eventually, this may even lead to a web-based ISA creator.

You can check out the ISA microservices IPython notebook we created, but there is quite a lot of overhead in setting up the dependencies for the cloud infrastructure first. The intention here was to demonstrate how we can deploy ISA API services in a cloud, which is something we plan to do in the PhenoMeNal project. However, you don’t need to run the converters in the cloud: there is also a standalone ISA API IPython notebook that runs the same use case in a local Jupyter instance.
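
For the local route, a rough sketch of the same round trip is shown below. The module paths and function signatures are assumptions based on the early-bird ISA API release; the standalone notebook has the exact calls.

    # A rough local sketch of the same round trip, with no cloud involved.
    # NOTE: the module paths and function signatures below are assumptions based on
    # the early-bird ISA API release; the standalone notebook has the exact calls.
    import json
    import os

    from isatools.convert import isatab2json, json2isatab  # pip install isatools

    # ISA-Tab directory -> ISA JSON (assumed here to be returned as a Python dict)
    isa_json = isatab2json.convert("./BII-I-1")

    # Tweak the metadata in place
    isa_json["title"] = "Edited locally in Jupyter"

    # ISA JSON -> ISA-Tab, written out to a target directory
    os.makedirs("./BII-I-1-roundtrip", exist_ok=True)
    with open("BII-I-1.json", "w") as out:
        json.dump(isa_json, out)
    with open("BII-I-1.json") as src:
        json2isatab.convert(src, "./BII-I-1-roundtrip")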

Please have a go yourself, and do give us any feedback via our ISA IPython notebooks project issue tracker. We hope to create more notebooks as demonstrators of how to use the new ISA API, so we would love for you to contribute ideas and use cases.

Investigation/Study/Assay, Hacks/Coffee/Cakes

In this post, Dr David Johnson gives his reflections on an ISA specification hackathon held in July 2015, attended in advance of joining the ISA team at Oxford as a research software engineer.

Last week I joined my prospective colleagues at the Oxford University e-Research Centre (OeRC), along with some of their collaborators, to thrash out an evolution of the ISA (Investigation/Study/Assay) metadata tracking framework. I will be joining the ISA development team at Oxford from September this year, a new phase in my career that I am very much looking forward to.

ISA consists of a model specification that describes its key conceptual elements and structure, along with implementations of the specification developed by the ISA team. The framework aims to facilitate standards-compliant collection, curation, management and reuse of datasets in the life sciences. The first version of the specification, a Release Candidate from 2008, is implemented as the ISA-Tab (tabular) format – a table-based representation familiar to many working in the life sciences, where data is commonly stored and manipulated in spreadsheets. More recently, ISA content can also be converted to RDF via linkedISA.
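
To give a flavour of the hierarchy the model describes, here is a toy sketch using hypothetical Python classes (these are not part of any ISA implementation): an Investigation groups one or more Studies, and each Study carries one or more Assays describing the measurements performed.

    # Toy illustration of the Investigation/Study/Assay hierarchy only; these
    # classes are hypothetical and not part of any ISA implementation.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Assay:
        measurement_type: str   # e.g. "metabolite profiling"
        technology_type: str    # e.g. "mass spectrometry"

    @dataclass
    class Study:
        identifier: str
        assays: List[Assay] = field(default_factory=list)

    @dataclass
    class Investigation:
        identifier: str
        studies: List[Study] = field(default_factory=list)

    investigation = Investigation("I-1")
    study = Study("S-1", assays=[Assay("metabolite profiling", "mass spectrometry")])
    investigation.studies.append(study)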

While I have yet to officially join the ISA team (I am currently on a short sabbatical after leaving a research post at Imperial College London), I was invited to attend a three-day workshop in Oxford to review and propose amendments to the ISA specification towards a version 2.0 release. The workshop, an ELIXIR UK event, was billed as the “ISA as a FAIR research object” Hack-the-Spec event. We were joined by representatives from The Genome Analysis Centre, the European Bioinformatics Institute, and the universities of Leiden, Manchester and Birmingham, and even by a group visiting from my home town of Hong Kong, from the GigaScience journal launched by the Beijing Genomics Institute in 2012. We were also joined online by a number of researchers dialling in via Google Hangouts from various sites across Europe.

As a workshop report will come out in due course I won’t go into the detail of the outcomes, but broadly the discussions focused on:

  • Evolving ISA to enable FAIR (Findable, Accessible, Interoperable, Reusable) research objects
  • Fixing ambiguities and addressing missing structures and elements in ISA 1.0
  • Enabling integration of standard identification schemes such as ORCID
  • Restructuring the spec to define the ‘core’ ISA elements and separate out domain-specific ‘extensions’
  • Specifying conventions, mechanisms, and best practices for developing extensions to this new ‘ISA core’.

What was clear is that there is plenty of scope for evolving ISA, driven by various parts of the user community. Having abstracted out the core ISA specification, what we need now are contributions from a diverse range of exemplar projects to ensure that the core is truly interoperable. To this end, we are now encouraging communities to share their ISA templates along with exemplar experiments, and to start building a repository of extensions on the ISA commons website. In the meantime the ISA team will be formalising the ISA core and developing new reference implementations in tabular and JSON formats, along with supporting tools. We hope to present a draft specification to the community in the fall of 2015.

Apart from the three days of discussions fuelled by much coffee and cake, we also found some time in the evenings to get out into the sunshine and enjoy a couple of Oxford’s wonderful restaurants…

One of my key takeaways from the workshop, apart from getting a crash course in the ISA spec that I will be working on in the coming months, is the importance of going through a community engagement process when developing a data specification. As with engineering software, we need to make sure we are building the right thing. Soliciting feedback is not a vanity exercise, or even a political one, but an essential part of a carefully managed process to ensure we evolve the specification to fulfil the changing needs of the people that matter – the user community.

Find out more about ISA at www.isa-tools.org, and about the wider community’s efforts at isacommons.org.

Catch up on tweets about this ISA 2.0 specification workshop by searching the hashtags #isa_hack and #ElixirNodeUK.