Category Archives: ISA

ISA tooling developed for the metabolomics community

A new set of ISA software tools has been developed out of the EU H2020 PhenoMeNal: Large-Scale Computing for Medical Metabolomics project, which we introduced in this earlier blog post.

The 2018-02 release of PhenoMeNal, also known as “Cerebellin”, came out at the end of February 2018. It represents a major upgrade to the 2017-08 production release. It has a richer set of tools, depends on improved deployment software, includes improved workflows for Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) data, and greatly strengthens the resilience of infrastructure deployments under high load. PhenoMeNal comprises a cloud-based portal infrastructure that includes the Galaxy workflow system customised to run on the Kubernetes container orchestrator, with Galaxy tools running their processing in Docker containers in the cloud. PhenoMeNal, and thus our new ISA Galaxy tools, work in Galaxy running on various cloud-computing infrastructures including Amazon Web Services, Google Cloud Platform, Microsoft Azure, OpenStack and KVM.

The ISA team has been contributing to the project since 2015, collaborating on the development of its user-facing, cloud-based data management and processing infrastructure. The Cerebellin release of the PhenoMeNal software includes a new set of ISA-related Galaxy workflow tools, as well as native support for the ISA-Tab format in Galaxy. The tools work with the MetaboLights database as well as with ISA-Tab studies uploaded directly into the Galaxy platform, and build on the Python ISA-API.

The MetaboLights/ISA-Tab Factors Visualization tool in Galaxy.

The new ISA Galaxy tools include:

MetaboLights downloader (W4M), developed with our colleagues at CEA, downloads MetaboLights studies in the new Galaxy “isa-tab” data type.
Study Metadata Exploration tools (five different tools) that allow querying over ISA-Tab data based on study factor slicing.
MetaboLights Factors Viz – a tool developed with our colleagues at EMBL-EBI for visualizing a summary of study factors as a parallel sets plot.
Format conversions from ISA-Tab (using the “isa-tab” Galaxy data type) to ISA-JSON and to W4M (developed by CEA).
ISA-Tab validation, again using the “isa-tab” Galaxy data type.
mzml2isa and nmrml2isa – Automated study metadata creation in ISA-Tab using the “isa-tab” Galaxy data type, from mzML and nmrML data, developed with our colleagues at the University of Birmingham.
And finally, an interactive tool to create prospective ISA-Tab study templates as “isa-tab” Galaxy data types, based on study design information. This tool supports generating assays for both MS and NMR, using standardised file naming templates compatible with Phenome Centre Birmingham and the MRC-NIHR National Phenome Centre at Imperial College London. The tool shares curation practices with those used by the MetaboLights database and implements the Metabolomics Standards Initiative (MSI) reporting guidelines that go towards making metadata and data FAIR.
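
To illustrate what study factor slicing means in practice, here is a minimal sketch that filters the rows of an ISA-Tab style study table by factor value, using only the Python standard library. It is not the code behind the Galaxy tools: the table contents, the factor names and the slice_by_factors helper are all hypothetical, and only the “Sample Name” and “Factor Value[...]” column headers follow ISA-Tab conventions.

```python
import csv
import io

# Hypothetical excerpt of an ISA-Tab study table (tab-delimited).
STUDY_TABLE = (
    "Sample Name\tFactor Value[genotype]\tFactor Value[treatment]\n"
    "sample_1\twild type\tcontrol\n"
    "sample_2\twild type\tdrought\n"
    "sample_3\tmutant\tcontrol\n"
    "sample_4\tmutant\tdrought\n"
)


def slice_by_factors(table_text, **factors):
    """Return the names of samples whose factor values match every given factor."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    matches = []
    for row in reader:
        # A row is kept only if all requested factor columns hold the requested value.
        if all(row.get(f"Factor Value[{name}]") == value
               for name, value in factors.items()):
            matches.append(row["Sample Name"])
    return matches


drought_samples = slice_by_factors(STUDY_TABLE, treatment="drought")
mutant_controls = slice_by_factors(STUDY_TABLE, genotype="mutant", treatment="control")
print(drought_samples)   # ['sample_2', 'sample_4']
print(mutant_controls)   # ['sample_3']
```

Passing more than one factor narrows the slice further, which is roughly the idea behind combining factors when exploring a study.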

We are also developing extensions to our Galaxy tools to support NGS and DNA microarray data, and to enable direct deposition to public repositories, such as those hosted by EMBL-EBI, via Galaxy workflows.

You can try out our ISA Galaxy tools in the Cerebellin release of PhenoMeNal on the public PhenoMeNal Galaxy server. The next release of PhenoMeNal, Dalcotidine, is scheduled for August 2018.

ISAcreator 1.7.11 now available

Today we announce the release of ISAcreator 1.7.11.

This release updates ISAcreator to work with Java 9.

You can download ISAcreator 1.7.11 from GitHub here: https://github.com/ISA-tools/ISAcreator/releases/tag/v1.7.11

If you’re an ISAcreator user, or use any other tool in the ISA-tools suite, please let us know and we can list you as being part of the ISAcommons community.

If you have any questions about or problems with ISAcreator, please email the ISA Team at isatools@googlegroups.com or post to the ISA community forum.

ISAcreator 1.7.10 available now

Today we announce the release of ISAcreator 1.7.10.

Included in this release are several bug fixes primarily relating to Windows 10 file system issues that have been reported by the user community.

Above adapted from original photo by Robert Lee under Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0).

You can download ISAcreator 1.7.10 from GitHub here: https://github.com/ISA-tools/ISAcreator/releases/tag/1.7.10

If you’re an ISAcreator user, or use any other tool in the ISA-tools suite, please let us know and we can list you as being part of the ISAcommons community.

If you have any questions about or problems with ISAcreator, please email the ISA Team at isatools@googlegroups.com or post to the ISA community forum.

ISAcreator 1.7.9 available now

Today marks the release of ISAcreator 1.7.9. It has been a long time coming: more than two years have passed since the last update to the ISA framework’s flagship ISA-Tab creator and editor software.

In this release, we have addressed a range of bugs, with fixes covering UI issues, file path handling errors, and cross-platform issues reported by our Windows users. With this release, we want to demonstrate to our valued user community that ISAcreator is still actively developed and maintained, and that it remains the de facto editor for the ISA-Tab format.

You can download ISAcreator 1.7.9 from GitHub here:
https://github.com/ISA-tools/ISAcreator/releases/tag/1.7.9

ISA Python API version 0.5 milestone

We’re very pleased to announce today the version 0.5 release of the Python ISA API, on which work started almost two years ago.

The ISA API aims to provide software developers with a set of tools to easily and quickly build their own ISA objects, and to validate and convert between serializations of ISA-formatted datasets and other formats/schemas (e.g. SRA schemas). The ISA API is published on PyPI as the isatools package. The vision for the ISA API is to provide a programming library that will become the core for all software tooling that supports the ISA framework. It enables the import of various data formats into an implementation of the ISA Abstract Model as Python objects, and export of ISA content from Python objects back to different serialization formats.

Currently we support import of ISA-Tab, ISA JSON, SRA XML (European Nucleotide Archive), Metabolomics Workbench, Biocrates XML and mzML formats, and export to ISA-Tab, ISA JSON and SRA XML. Beyond data I/O, the ISA API also supports programmatic creation of ISA content directly through the Python ISA model objects, which can then be exported in the aforementioned serialization formats. This means that you can use the ISA API in your own software tools to create ISA-Tab and ISA JSON. You can see the ISA API in action in this example creating a simple ISA-Tab.

Since the ISA API is available as a Python library in the isatools PyPI package (just install with pip install isatools), it can easily be integrated with Python ecosystem infrastructure such as IPython’s interactive computing environment and Jupyter, a web application that allows you to create and share documents that contain live Python code and more. We are also developing ISA API containers using Docker, via the Horizon 2020 PhenoMeNal project, to run various functions from the isatools package in the cloud.

This version 0.5 release marks a significant milestone, as the ISA Team has put a lot of effort into developing various I/O and ISA content creation features. We are now looking to scale up the ISA API and make it more robust, with thorough performance and user testing, as we work towards a version 1.0 release.

The ISA API is still in development and as an open-source project we would be very happy to receive any help and code contributions (testing, feature requests, pull requests). Please feel free to contact our development team at isatools@googlegroups.com or on the ISA Community Forum Google Group, or ask a question, report a bug or request a new feature in the GitHub issue tracker.

Documentation: http://isatools.readthedocs.io
GitHub project: https://github.com/ISA-tools/isa-api
GitHub issues: https://github.com/ISA-tools/isa-api/issues
Python Package Index: https://pypi.python.org/pypi/isatools
All about ISA: http://isa-tools.org

New ISA Model and Serialization Specifications released!

Over the last few months, the ISA Team has been working hard on editing a long-awaited release of the ISA Model and Serialization Specifications 1.0.

The original ISA-Tab specification was published as a Release Candidate document in 2008, documenting the initial work that forms the ISA framework, with a further update in 2009. Since then, we have done work on a new serialization in JSON, ISA-JSON, and abstracted out the data model from both the tabular and JSON formats.

The ISA Model and Serialization Specifications consist of three specification documents:

ISA Abstract Model – a data model of ISA objects/entities and their relation to one another
ISA-Tab format – the tabular serialization format of the Abstract Model
ISA-JSON format – the JSON serialization format of the Abstract Model.
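
As a rough illustration of how one abstract model can back two serializations, the sketch below nests Investigation, Study and Assay objects and writes them out both as JSON and as tab-delimited rows. It is a deliberately simplified toy: the class fields, the JSON keys and the column names are assumptions made for this example, and they do not reproduce the actual ISA-Tab or ISA-JSON formats defined in the specifications.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    filename: str
    measurement_type: str  # hypothetical field chosen for this sketch


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation):
    """Serialize the nested objects as a JSON document."""
    return json.dumps(asdict(investigation), indent=2)


def to_tabular(investigation):
    """Flatten the same objects into tab-delimited rows, one row per assay."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Investigation", "Study", "Assay File", "Measurement Type"])
    for study in investigation.studies:
        for assay in study.assays:
            writer.writerow([investigation.identifier, study.identifier,
                             assay.filename, assay.measurement_type])
    return buffer.getvalue()


investigation = Investigation(
    identifier="i1",
    studies=[Study(
        identifier="s1",
        title="Example study",
        assays=[Assay("a_ms.txt", "metabolite profiling"),
                Assay("a_nmr.txt", "metabolite profiling")],
    )],
)

print(to_json(investigation))
print(to_tabular(investigation))
```

The point of the design is that both writers read from the same objects, so neither output format dictates the shape of the model.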

The specifications are licensed under CC BY-SA 4.0, and you can cite the specifications with:

Sansone, Susanna-Assunta, Rocca-Serra, Philippe, Gonzalez-Beltran, Alejandra, Johnson, David & the ISA Community. (2016, October 28). ISA Model and Serialization Specifications 1.0. Zenodo. http://doi.org/10.5281/zenodo.163640.

To view the latest version online, please visit http://isa-specs.readthedocs.io/en/latest/

Plant Science takes a focus on ISA

Back in April this year, Dr David Johnson from the ISA team gave a presentation on “Data Infrastructures to Foster Data Reuse” at a workshop on Integrating Large Data into Plant Science: From Big Data to Discovery, hosted by GARNet (the UK network for Arabidopsis researchers) and Egenis (the Exeter Centre for the Study of the Life Sciences). The workshop was held at Dartington Hall in Devon, South West England, and was well attended by researchers from the plant and biological science community worldwide, as well as by industry representatives from organisations such as Syngenta.

David presented ISA and biosharing.org as candidate data infrastructure resources for enabling data reuse in the plant sciences, and showed an example of how one might encode high-throughput plant phenotyping in ISA-Tab.

We have observed the uptake of the ISA-Tab format across a broad range of the life sciences, and we see its adoption in the plant sciences as essential for making data in the field FAIR (Findable, Accessible, Interoperable and Reusable). Centres such as the UK’s National Plant Phenomics Centre in Aberystwyth, Wales, could benefit hugely from adopting ISA to meet emerging challenges in data management, especially as automation of data collection is a significant driver in modern plant-based research and agritech.

There are also existing data analysis platforms such as Araport (the Arabidopsis Information Portal), TAIR (The Arabidopsis Information Resource) and BioDare (Biological Data Repository) that could benefit from standardizing their experimental data, as well as ongoing efforts to create open data resources in the plant sciences, such as the Collaborative Open Plant Omics (COPO) project, which will be using the new ISA-JSON format as native data objects.

You can check out David’s presentation on SlideShare.


Join the funFAIR!

Today, March 15 2016, the FAIR Guiding Principles for scientific data management and stewardship were formally published in the Nature Publishing Group journal Scientific Data. The problem the FAIR Principles address is the lack of widely shared, clearly articulated, and broadly applicable best practices around the publication of scientific data. While the history of scholarly publication in journals is long and well established, the same cannot be said of formal data publication. Yet data could be considered the primary output of scientific research, and its publication and reuse are necessary to ensure validity and reproducibility, and to drive further discoveries. The FAIR Principles address these needs by providing a precise and measurable set of qualities a good data publication should exhibit – qualities that ensure that the data is Findable, Accessible, Interoperable, and Reusable (FAIR).

The ISA infrastructure project and BioSharing registry of standards, databases and policies are both part of this community in which we strive to make data FAIR. Do join us in these efforts!

For more information, read the paper and see the press release at the Dutch Techcentre for Life Sciences.

The ISA team is growing!

We are very happy to announce that Dr David Johnson and Dr Massimiliano Izzo have joined the ISA team as Research Software Engineers, last year and this year, respectively.

David and Massi are both great additions to the team. A few words about their past experience…

David

David completed his PhD at the University of Reading (UK) and before joining us at the University of Oxford e-Research Centre (OeRC), he worked at Imperial College London where he was a founding member of the Data Science Institute. Prior to that he worked in the Department of Computer Science at Oxford University, where he was part of an FP7 project developing interoperable cancer model databases, and also in the Evolutionary Biology Group at the University of Reading where he developed high-performance computing software for phylogenetics. He serves on the technical programme committees of a number of international conferences including the International Conference on Computational Science series and on the editorial board of the journal Cancer Informatics.

Massi

Massi completed his PhD studies in Biomedical Engineering at the University of Genoa (Italy). His main interests are in the design and development of innovative data models for Life Sciences, structured/unstructured data management and full-stack software development (JavaScript all the way!). Before joining the OeRC, he was a Research Collaborator at the Giannina Gaslini Institute in Genoa (Italy), where he developed distributed data management systems for Integrated Biobanking Management, mostly targeted to Paediatric Tumours. In his free time, Massi enjoys reading (mostly speculative fiction novels), gazing at the ceiling while lying on the sofa, and wandering aimlessly in bookshops and cafes.

You can follow all their code contributions to ISA-tools through their GitHub profiles: djcomlab and zigur.

Investigation/Study/Assay, Hacks/Coffee/Cakes

In this post, Dr David Johnson gives his reflection on an ISA specification hackathon held in July 2015, in advance of joining the ISA team at Oxford as a research software engineer.

Last week I joined my prospective colleagues at the Oxford University e-Research Centre (OeRC), along with some of their collaborators, to thrash out an evolution of the ISA (Investigation/Study/Assay) metadata tracking framework. I will be joining the ISA development team at Oxford from September this year, which is a new phase in my career that I am very much looking forward to.

ISA consists of a model specification that describes its key concept elements and structure, while implementations of the specification are also developed by the ISA team. The framework aims to facilitate standards-compliant collection, curation, management and reuse of datasets in the life sciences. The first version of the specification, a Release Candidate from 2008, is implemented as the ISA-Tab (tabular) format – a table-based format that many working in the life sciences are used to, where data is abundantly stored and manipulated in spreadsheets. More recently ISA can also be converted to RDF via linkedISA.

Serious ISA 2.0 core vs. extension spec hacking. Powered by cake. #isamodel #isa_hack @isatools pic.twitter.com/W9c6ASilOA

— Robert Davey (@froggleston) July 21, 2015

While I have yet to officially join the ISA team (I am currently on a short sabbatical since leaving a research post at Imperial College London), I was invited to attend a 3-day workshop in Oxford to review and make new amendments to the ISA specification towards a version 2.0 release. The workshop, an ELIXIR UK event, was billed as the “ISA as a FAIR research object” Hack-the-Spec event. We were joined by representatives from The Genome Analysis Centre, the European Bioinformatics Institute, Leiden, Manchester and Birmingham universities and even a group visiting from my home-town of Hong Kong, from the GigaScience journal that was launched by Beijing Genomics Institute in 2012. We were also joined online by a number of researchers dialling in via Google Hangouts from various sites in Europe.

As a workshop report will come out in due course I won’t get into the detail of the outcomes, but broadly the discussions focused around:

Evolving ISA to enable FAIR (Findable, Accessible, Interoperable, Reusable) research objects
Fixing ambiguities, missing structures and elements in ISA 1.0
Enabling integration of standard identification schemes such as ORCID
Redefining the spec to define the ‘core’ ISA elements and separating out domain specific ‘extensions’
Specifying conventions, mechanisms, and best practices for developing extensions to this new ‘ISA core’.

What was clear was that there was plenty of scope for evolving ISA from various parts of the user community. Having abstracted out the core ISA specification, what we need now is contributions from a diverse range of exemplar projects to ensure that the core is truly interoperable. To this end, we are now encouraging communities to share their ISA templates along with exemplar experiments and start building a repository of extensions on the ISAcommons website. In the meantime the ISA team will be formalising the ISA core and developing new reference implementations in tabular and JSON formats and supporting tools. We hope to have a draft specification presented to the community in the fall of 2015.

Apart from the 3 days of discussions fuelled by much coffee and cake, we did also find some time in the evenings to get out, enjoy the sunshine and visit a couple of Oxford’s wonderful restaurants…

A well deserved hackathon dinner at The Folly #Oxford #ElixirNodeUK #isa_hack pic.twitter.com/9zUcoqWwE2

— Dr David Johnson (@NuDataScientist) July 21, 2015

One of my key takeaways from the workshop, apart from having a crash course in the ISA spec that I will be working on in the coming months, is the importance of going through the community engagement process when developing a data specification. As with engineering software, we need to make sure we are building the right thing. Soliciting feedback is not a vanity exercise or even a political exercise, but an essential part of a carefully-managed process to ensure we evolve the specification to fulfil the changing needs of the people that matter – the user community.

Find out more about ISA here at www.isa-tools.org and about the wider community’s efforts at isacommons.org

Catch up on tweets about this ISA 2.0 specification workshop by searching the hashtags #isa_hack and #ElixirNodeUK
