ISA tooling developed for the metabolomics community

A new set of ISA software tools has been developed out of the EU H2020 PhenoMeNal: Large-Scale Computing for Medical Metabolomics project, which we introduced in this earlier blog post.

The 2018-02 release of PhenoMeNal, also known as “Cerebellin”, came out at the end of February 2018. It represents a major upgrade to the 2017-08 production release. It has a richer set of tools, depends on improved deployment software, includes improved workflows for Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) data, and greatly strengthens the resilience of infrastructure deployments under high load. PhenoMeNal comprises a cloud-based portal infrastructure that includes the Galaxy workflow system customised to run on the Kubernetes container orchestrator, with Galaxy tools running their processing in Docker containers in the cloud. PhenoMeNal, and thus our new ISA Galaxy tools, work in Galaxy running on various cloud-computing infrastructures including Amazon Web Services, Google Cloud Platform, Microsoft Azure, OpenStack and KVM.

The ISA team has been contributing to the project since 2015, collaborating on the development of its user-facing, cloud-based data management and processing infrastructure. The Cerebellin release of the PhenoMeNal software includes a new set of ISA-related Galaxy workflow tools, as well as native support for the ISA-Tab format in Galaxy. The tools work with the MetaboLights database as well as with ISA-Tab studies uploaded directly into the Galaxy platform, and build on the Python ISA-API.

The MetaboLights/ISA-Tab Factors Visualization tool in Galaxy.

The new ISA Galaxy tools include:

  • MetaboLights downloader (W4M), developed with our colleagues at CEA, downloads MetaboLights studies in the new Galaxy “isa-tab” data type.
  • Study Metadata Exploration tools (5 different tools) that allow querying over ISA-Tab data based on study factor slicing.
  • MetaboLights Factors Viz – a tool developed with our colleagues at EMBL-EBI for visualizing a summary of study factors as a parallel sets plot.
  • Format conversions from ISA-Tab (using the “isa-tab” Galaxy data type) to ISA-JSON and to W4M (developed by CEA).
  • ISA-Tab validation, again using the “isa-tab” Galaxy data type.
  • mzml2isa and nmrml2isa – Automated study metadata creation in ISA-Tab using the “isa-tab” Galaxy data type, from mzML and nmrML data, developed with our colleagues at the University of Birmingham.
  • And finally, an interactive tool to create prospective ISA-Tab study templates as “isa-tab” Galaxy data types, based on study design information. This tool supports generating assays for both MS and NMR, using standardised file naming templates compatible with Phenome Centre Birmingham and the MRC-NIHR National Phenome Centre at Imperial College London. The tool shares curation practices with those used by the MetaboLights database and implements the Metabolomics Standards Initiative (MSI) reporting guidelines that go towards making metadata and data FAIR.
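
To give a feel for what "study factor slicing" means, here is a minimal sketch that uses only the Python standard library. It is not the code behind the Galaxy tools. It assumes a toy study sample table in the ISA-Tab style, where factors appear as `Factor Value[...]` columns next to a `Sample Name` column; the table contents and the `slice_by_factors` helper are made up for illustration.

```python
import csv
import io

# Toy study sample table in the ISA-Tab style (tab-separated, made-up values).
STUDY_TABLE = (
    "Source Name\tSample Name\tFactor Value[gender]\tFactor Value[treatment]\n"
    "subject1\tsample1\tmale\tcontrol\n"
    "subject2\tsample2\tfemale\tcontrol\n"
    "subject3\tsample3\tfemale\tdrug\n"
    "subject4\tsample4\tmale\tdrug\n"
)


def slice_by_factors(table_text, **factors):
    """Return the sample names whose factor values match every given factor.

    Hypothetical helper for illustration; not part of any ISA tool.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    # Map e.g. gender="female" to the column header "Factor Value[gender]".
    wanted = {f"Factor Value[{name}]": value for name, value in factors.items()}
    return [
        row["Sample Name"]
        for row in reader
        if all(row.get(column) == value for column, value in wanted.items())
    ]


female_samples = slice_by_factors(STUDY_TABLE, gender="female")
female_drug_samples = slice_by_factors(STUDY_TABLE, gender="female", treatment="drug")
print(female_samples)
print(female_drug_samples)
```

Each keyword argument narrows the slice further, so combining factors selects the samples at the intersection of those factor levels.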

We are also developing extensions to our Galaxy tools to support NGS and DNA microarray data, and to enable direct deposition to public repositories, such as those hosted by EMBL-EBI, via Galaxy workflows.

You can try out our ISA Galaxy tools in the Cerebellin release of PhenoMeNal on the public PhenoMeNal Galaxy server. The next release of PhenoMeNal will be the Dalcotidine release, scheduled for August 2018.

ISA Python API version 0.5 milestone

We’re very pleased to announce today the version 0.5 release of the Python ISA API, on which work started almost 2 years ago.

The ISA API aims to provide software developers with a set of tools to easily and quickly build their own ISA objects, validate them, and convert between serializations of ISA-formatted datasets and other formats/schemas (e.g. SRA schemas). The ISA API is published on PyPI as the isatools package. The vision for the ISA API is to provide a programming library that will become the core for all software tooling that supports the ISA framework. It enables the import of various data formats into an implementation of the ISA Abstract Model as Python objects, and export of ISA content from Python objects back to different serialization formats.

ISA API diagram

Currently we support import of ISA-Tab, ISA JSON, SRA XML (European Nucleotide Archive), Metabolomics Workbench, Biocrates XML and mzML formats, and export to ISA-Tab, ISA JSON and SRA XML. Beyond enabling I/O of data, the ISA API also supports programmatic creation of ISA content directly through the Python ISA model objects, which can then be exported in the aforementioned serialization formats. This means that you can use the ISA API in your own software tools to create ISA-Tab and ISA JSON. You can see the ISA API in action in this example creating a simple ISA-Tab.
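
As a rough illustration of the "build objects, then serialize" idea, here is a self-contained sketch using only the Python standard library. It deliberately does not use the isatools package: the `Investigation`, `Study` and `Assay` dataclasses below are simplified stand-ins that only mirror the Investigation/Study/Assay hierarchy, and the JSON they produce is a toy layout, not the real ISA JSON schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


# Simplified stand-ins for the Investigation > Study > Assay hierarchy.
# These are NOT the isatools model classes.
@dataclass
class Assay:
    filename: str
    measurement_type: str
    technology_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation):
    """Export the object tree as JSON text (toy layout, not ISA JSON)."""
    return json.dumps(asdict(investigation), indent=2)


def from_json(text):
    """Import the toy JSON layout back into objects."""
    data = json.loads(text)
    studies = [
        Study(
            identifier=s["identifier"],
            title=s["title"],
            assays=[Assay(**a) for a in s["assays"]],
        )
        for s in data["studies"]
    ]
    return Investigation(data["identifier"], data["title"], studies)


# Build content programmatically, then export and re-import it.
investigation = Investigation(identifier="i1", title="Toy investigation")
study = Study(identifier="s1", title="Toy metabolomics study")
study.assays.append(Assay("a_assay.txt", "metabolite profiling", "mass spectrometry"))
investigation.studies.append(study)

serialized = to_json(investigation)
restored = from_json(serialized)
print(serialized)
```

The point of the sketch is the shape of the workflow: content lives as Python objects, and import and export are just conversions between those objects and a serialization format.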

Since the ISA API is available as a Python library in the isatools PyPI package (just install with pip install isatools), it can easily be integrated with Python ecosystem infrastructure such as IPython’s interactive computing environment and Jupyter, a web application that allows you to create and share documents that contain live Python code and more. We are also developing ISA API containers using Docker, via the Horizon 2020 PhenoMeNal project, to run various functions from the isatools package in the cloud.

This version 0.5 release marks a significant milestone, as the ISA Team has put a lot of effort into developing various I/O and ISA content creation features. Now we are looking to scale up the ISA API and make it more robust, with thorough performance and user testing, as we work towards a version 1.0 release.

The ISA API is still in development and as an open-source project we would be very happy to receive any help and code contributions (testing, feature requests, pull requests). Please feel free to contact our development team at isatools@googlegroups.com or on the ISA Community Forum Google Group, or ask a question, report a bug or request a new feature in the GitHub issue tracker.

Cloudy with a chance of MTBLS

Last week, the Horizon 2020 PhenoMeNal project held a training workshop on e-infrastructures for metabolomics. It was a hands-on developers’ workshop looking at different cloud technologies and how they might be deployed and utilised for dealing with computational workflows in metabolomics, as well as for data management. The ISA team at Oxford is a partner on the project.

The focus of the workshop was on kick-starting the development of the analytics infrastructure for PhenoMeNal. It is envisaged that a Europe-wide cloud infrastructure will be deployed, which might be a mix of public and private clouds (private secure clouds for dealing with patient-identifiable data), with the compute elements taking the form of microservice containers. In this workshop, we learned about how we can create microservices containerised with Docker, and use them on the Google Cloud infrastructure orchestrated by the MANTL framework.

PhenoMeNal e-infrastructures workshop

Hacking at the PhenoMeNal e-infrastructures workshop at SciLifeLab Uppsala, Sweden.

Now, you are probably wondering, “What does this have to do with ISA?”

During one of the hacking sessions in the workshop I worked with Ken Haug from the MetaboLights team at EMBL-EBI (MetaboLights is a popular metabolomics database that stores ISA-Tab files natively) on working out a simple use case for our recent early-bird release of the Python ISA API. Here’s what we came up with.

First, we wrap up two file converters from the ISA API as Docker images (you can find these on GitHub):

  1. isatab2json – to convert ISA-Tab files to our ISA JSON format
  2. json2isatab – to convert ISA JSON back to ISA-Tab.

Next, we create an IPython notebook using Jupyter that makes REST calls to our MANTL cluster running in Google Cloud. These REST calls simply ask MANTL to run each of the aforementioned Docker images as microservices. From our notebook, we can now:

  1. Convert an ISA-Tab to ISA JSON (which happens in a short-lived microservice in the cloud)
  2. Modify the ISA JSON (in a Jupyter notebook running in the cloud)
  3. Convert the modified ISA JSON back to ISA-Tab (again, in a microservice).

Diagram showing how an IPython notebook and ISA API microservices can be deployed

Within Google Cloud, a Jupyter notebook is deployed in its own microservice, where we can then call ISA API microservices that are created in their own containers on demand.
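
The three notebook steps above can be sketched locally, without any cloud infrastructure. The snippet below is only an illustration of the convert, edit, convert-back flow: `tab_to_json` and `json_to_tab` are hypothetical toy stand-ins written with the standard library, not the real isatab2json and json2isatab converters, and the table is made-up data in the ISA-Tab style.

```python
import csv
import io
import json


def tab_to_json(tab_text):
    """Toy stand-in for isatab2json: tab-separated rows -> JSON list of records."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return json.dumps(list(reader))


def json_to_tab(json_text):
    """Toy stand-in for json2isatab: JSON list of records -> tab-separated rows."""
    records = json.loads(json_text)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


original_tab = (
    "Sample Name\tFactor Value[treatment]\n"
    "sample1\tcontrol\n"
    "sample2\tdrug\n"
)

# Step 1: convert the tabular form to JSON.
as_json = tab_to_json(original_tab)

# Step 2: modify the JSON representation.
records = json.loads(as_json)
records[1]["Factor Value[treatment]"] = "placebo"
edited_json = json.dumps(records)

# Step 3: convert the modified JSON back to the tabular form.
edited_tab = json_to_tab(edited_json)
print(edited_tab)
```

In the workshop setup, steps 1 and 3 would each be a call to a containerised converter rather than a local function, while step 2 stays in the notebook.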

This effectively gives us the framework for a web-based ISA-Tab editor where ISA content can be modified by editing the JSON representation, something that the MetaboLights team could use in the near future. Eventually, this may even lead to a web-based ISA creator.

You can check out the ISA microservices IPython notebook we created, but there’s quite a lot of overhead to set up the dependencies for the cloud infrastructure first. The intention here was to demonstrate how we can deploy ISA API services in a cloud, which is something that we plan to do with the PhenoMeNal project. However, you don’t need to run the converters in the cloud: you can check out this standalone ISA API IPython notebook, which lets you run the same use case in a local Jupyter instance.

Please have a go yourself, and do give us any feedback via our ISA iPython notebooks project issue tracker. We hope to create more notebooks as demonstrators for how to use the new ISA API, so we would love for you to contribute any ideas and use cases.