Category Archives: Standards

New ISA Model and Serialization Specifications released!

Over the last few months, the ISA Team has been working hard on editing a long-awaited release of the ISA Model and Serialization Specifications 1.0.

The original ISA-Tab specification was published as a Release Candidate document in 2008, documenting the initial work that forms the ISA framework, with a further update in 2009. Since then, we have developed a new JSON serialization, ISA-JSON, and abstracted the data model out from both the tabular and JSON formats.

ISA Model and Specs diagram

The ISA Model and Serialization Specifications consist of three specification documents:

  1. ISA Abstract Model – a data model of ISA objects/entities and their relation to one another
  2. ISA-Tab format – the tabular serialization format of the Abstract Model
  3. ISA-JSON format – the JSON serialization format of the Abstract Model.
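To make the relationship between the three documents more concrete, here is a minimal Python sketch of a nested Investigation/Study/Assay structure serialized to JSON. The class and field names (`identifier`, `title`, `studies`, `assays`, `measurement_type`) are illustrative assumptions and not the normative ISA-JSON keys; please consult the specification itself for the real schema.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Assay:
    # Hypothetical field: what the assay measures.
    measurement_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialize the nested objects to a JSON string."""
    return json.dumps(asdict(investigation), indent=2)


investigation = Investigation(
    identifier="i1",
    title="Example investigation",
    studies=[Study("s1", "Example study", [Assay("metabolite profiling")])],
)
document = to_json(investigation)
print(document)
```

The same in-memory objects could equally be written out as tables, which is the idea behind keeping the abstract model separate from its serializations.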

The specifications are licensed under CC BY-SA 4.0, and you can cite the specifications with:

Sansone, Susanna-Assunta, Rocca-Serra, Philippe, Gonzalez-Beltran, Alejandra, Johnson, David & the ISA Community. (2016, October 28). ISA Model and Serialization Specifications 1.0. Zenodo. http://doi.org/10.5281/zenodo.163640.

To view the latest version online, please visit http://isa-specs.readthedocs.io/en/latest/

Plant Science takes a focus on ISA

Back in April this year, Dr David Johnson from the ISA team gave a presentation on “Data Infrastructures to Foster Data Reuse” at a workshop on Integrating Large Data into Plant Science: From Big Data to Discovery, hosted by GARnet (the UK network for Arabidopsis researchers) and Egenis (the Exeter Centre for the Study of the Life Sciences). The workshop was held at Dartington Hall in Devon, South West England, and was well attended by researchers from the plant and biological science community worldwide, as well as by industry representatives from organisations such as Syngenta.

David presented ISA and biosharing.org as candidate data infrastructure resources for enabling data reuse in the plant sciences, and showed an example of how one might encode high-throughput plant phenotyping in ISA-Tab.

We have observed the uptake of the ISA-Tab format across a broad range of the life sciences, and we view its adoption in the plant sciences, with a view to making data FAIR (Findable, Accessible, Interoperable and Reusable), as essential for the field. In particular, centres such as the UK’s National Plant Phenomics Centre in Aberystwyth, Wales, could benefit hugely from adopting ISA to meet emerging challenges in data management, especially as automation of data collection is a significant driver in modern plant-based research and agritech.

There are also existing data analysis platforms such as Araport (the Arabidopsis Information Portal), TAIR (The Arabidopsis Information Resource) and BioDare (Biological Data Repository) that could benefit from standardizing their experimental data. In addition, there are ongoing efforts to create open data resources in the plant sciences, such as the Collaborative Open Plant Omics (COPO) project, which will use the new ISA-JSON format as native data objects.

You can check out David’s presentation on SlideShare.

Investigation/Study/Assay, Hacks/Coffee/Cakes

In this post, Dr David Johnson gives his reflection on an ISA specification hackathon held in July 2015, in advance of joining the ISA team at Oxford as a research software engineer.

Last week I joined my prospective colleagues at the Oxford University e-Research Centre (OeRC), along with some of their collaborators, to thrash out an evolution of the ISA (Investigation/Study/Assay) metadata tracking framework. I will be joining the ISA development team at Oxford from September this year, which is a new phase in my career that I am very much looking forward to.

ISA consists of a model specification that describes its key concept elements and structure, while implementations of the specification are also developed by the ISA team. The framework aims to facilitate standards-compliant collection, curation, management and reuse of datasets in the life sciences. The first version of the specification, a Release Candidate from 2008, is implemented as the ISA-Tab (tabular) format – a table-based format that many working in the life sciences are used to, where data is abundantly stored and manipulated in spreadsheets. More recently ISA can also be converted to RDF via linkedISA.

While I have yet to officially join the ISA team (I am currently on a short sabbatical since leaving a research post at Imperial College London), I was invited to attend a 3-day workshop in Oxford to review and amend the ISA specification towards a version 2.0 release. The workshop, an ELIXIR UK event, was billed as the “ISA as a FAIR research object” Hack-the-Spec event. We were joined by representatives from The Genome Analysis Centre, the European Bioinformatics Institute, Leiden, Manchester and Birmingham universities, and even a group visiting from my hometown of Hong Kong, from the GigaScience journal that was launched by Beijing Genomics Institute in 2012. We were also joined online by a number of researchers dialling in via Google Hangouts from various sites in Europe.

As a workshop report will come out in due course I won’t get into the detail of the outcomes, but broadly the discussions focused around:

  • Evolving ISA to enable FAIR (Findable, Accessible, Interoperable, Reusable) research objects
  • Fixing ambiguities, missing structures and elements in ISA 1.0
  • Enabling integration of standard identification schemes such as ORCID
  • Redefining the spec to define the ‘core’ ISA elements and separating out domain specific ‘extensions’
  • Specifying conventions, mechanisms, and best practices for developing extensions to this new ‘ISA core’.

What was clear was that there was plenty of scope for evolving ISA from various parts of the user community. By abstracting out the core ISA specification, what we need now is contributions from a diverse range of exemplar projects to ensure that the core is truly interoperable. To this end, we are now encouraging communities to share their ISA templates along with exemplar experiments and start building a repository of extensions in the ISA commons website. In the meantime the ISA team will be formalising the ISA core and developing new reference implementations in tabular and JSON formats and supporting tools. We hope to have a draft specification presented to the community in the fall of 2015.

Apart from the 3 days of discussions fuelled by much coffee and cake, we did also find some time in the evenings to get out to enjoy the sunshine and enjoy a couple of Oxford’s wonderful restaurants…

One of my key takeaways from the workshop, apart from getting a crash course in the ISA spec that I will be working on in the coming months, is the importance of going through the community engagement process when developing a data specification. As with engineering software, we need to make sure we are building the right thing. Soliciting feedback is not a vanity exercise or even a political exercise, but an essential part of a carefully managed process to ensure we evolve the specification to fulfil the changing needs of the people that matter – the user community.

Find out more about ISA here at www.isa-tools.org and about the wider community’s efforts at isacommons.org

Catch up on tweets about this ISA 2.0 specification workshop by searching the hashtags #isa_hack and #ElixirNodeUK

Introducing OntoMaton – Ontology Search & Tagging for Google Spreadsheets

We are happy to announce the release of OntoMaton, a tool which allows users to search for ontology terms and tag free text right in Google Spreadsheets. This post will serve to introduce you to the tool, how it works and how it can make it easier for users to use ontologies in a pervasive, powerful and collaborative environment, complementing existing work from our team in the creation of ISAcreator.

How it looks

OntoMaton is available from the Google Script Gallery and when installed provides a menu as shown below.

From the menu you may access the two resources that are part of OntoMaton: ontology search and ontology tagging. There is also an ‘about’ option.

Ontology Search

Ontology Tagging

Behind the scenes: restricting the ontology search space

If a sheet named “restrictions” is in your spreadsheet, OntoMaton will consult it to determine whether the currently selected column/row name has a narrowed ontology search space. This makes searching BioPortal quicker and restricts the user’s result space, which makes selecting a term easier.
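The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OntoMaton’s actual code: the two-column layout of the “restrictions” sheet, the example rows and the function names `load_restrictions` and `ontology_for` are all assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical layout of the "restrictions" sheet: a header row, then one row
# per column name with the ontology that searches should be limited to.
RESTRICTIONS_ROWS = [
    ["Column Name", "Ontology"],
    ["Characteristics[organism]", "NCBITAXON"],
    ["Factor Value[compound]", "CHEBI"],
]


def load_restrictions(rows: List[List[str]]) -> Dict[str, str]:
    """Map each column name to its restricted ontology, skipping the header."""
    return {name.strip(): ontology.strip() for name, ontology in rows[1:]}


def ontology_for(column_name: str, restrictions: Dict[str, str]) -> Optional[str]:
    """Return the ontology to search, or None when the search is unrestricted."""
    return restrictions.get(column_name.strip())


restrictions = load_restrictions(RESTRICTIONS_ROWS)
print(ontology_for("Characteristics[organism]", restrictions))
print(ontology_for("Sample Name", restrictions))
```

A column without an entry simply falls back to an unrestricted search, so the sheet only needs rows for the columns you want to narrow.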

Behind the scenes: extra information about the terms you select

For every term you select, its full details are recorded in a “terms” sheet. This makes it possible to use OntoMaton in any spreadsheet, and all provenance information (including URIs, ontology source and version) for selected ontology terms will be immediately available for use when exposing your records to the linked data world!
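As a rough sketch of this bookkeeping, the Python below appends a selected term’s details to an in-memory “terms” sheet. The column order, the `record_term` helper and the choice to skip a term whose URI is already recorded are assumptions made for illustration, and the version string is a placeholder.

```python
from typing import Dict, List

# Hypothetical column order for the "terms" sheet.
TERM_COLUMNS = ["label", "uri", "ontology_source", "ontology_version"]


def record_term(terms_sheet: List[List[str]], term: Dict[str, str]) -> None:
    """Append the term's details unless its URI is already recorded."""
    uri_index = TERM_COLUMNS.index("uri")
    if any(row[uri_index] == term["uri"] for row in terms_sheet):
        return
    terms_sheet.append([term[column] for column in TERM_COLUMNS])


terms_sheet: List[List[str]] = []
selected = {
    "label": "Arabidopsis thaliana",
    "uri": "http://purl.obolibrary.org/obo/NCBITaxon_3702",
    "ontology_source": "NCBITAXON",
    "ontology_version": "example-version",  # placeholder value
}
record_term(terms_sheet, selected)
record_term(terms_sheet, selected)  # second call leaves the sheet unchanged
print(terms_sheet)
```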

Installing

To install, create a new Google spreadsheet, then go to the menu tools > script gallery. In the script gallery, search for ontology or ontomaton and you’ll get the following result pane.

Click on ‘install’ and this will install the scripts inside your spreadsheet. There is then one final installation step: click on tools > script manager and you’ll be presented with something like the image below.

OntoMaton contains lots of functions, but the only one you need to worry about in order to run the program is the onOpen function. Click this then click on run and the OntoMaton menu will be installed in your menu bar. From here you’ll be able to access the ontology search and ontology tagging functions.

Let us know what you think! New releases will come soon to fix any problems you may identify. Please submit all ‘bugs’ and feature requests through https://github.com/ISA-tools/OntoMaton/issues

OntoMaton inherently supports ISA-Tab files too. So if you have an investigation file it will automatically add ontology sources to the ONTOLOGY SOURCE REFERENCE block. Also, if you have Term Source Ref and Term Source Accession after a column, OntoMaton will automatically populate these columns for you.
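The column-filling behaviour can be pictured with the following Python sketch. It is a simplified stand-in rather than OntoMaton’s implementation: the helper `populate_term_columns` is hypothetical, and the column names are taken from the paragraph above, so their exact spelling in a real ISA-Tab file should be checked against the specification.

```python
from typing import List

# Column names as written in the text above; the exact spelling is an assumption.
TERM_COLUMNS = ["Term Source Ref", "Term Source Accession"]


def populate_term_columns(header: List[str], row: List[str], column: str,
                          label: str, source: str, accession: str) -> List[str]:
    """Return a copy of `row` with the term written into `column` and, when the
    two term columns follow it directly, the source and accession filled in."""
    index = header.index(column)
    updated = list(row)
    updated[index] = label
    if header[index + 1:index + 3] == TERM_COLUMNS:
        updated[index + 1] = source
        updated[index + 2] = accession
    return updated


header = ["Sample Name", "Characteristics[organism]",
          "Term Source Ref", "Term Source Accession"]
row = ["sample-1", "", "", ""]
filled = populate_term_columns(
    header, row, "Characteristics[organism]",
    "Arabidopsis thaliana", "NCBITAXON", "3702",
)
print(filled)
```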

The following table gives a quick overview of available tools that combine spreadsheets with access to vocabulary servers.

Features compared: automated annotation, ontology search/lookup, versioning*, collaboration

Tool (domain):

  • RightField (general)
  • ISAcreator (multiomics)
  • Proteome Harvest PRIDE (proteomics)
  • Annotare (transcriptomics)
  • OntoMaton (general)

* By versioning we refer to the management of user edits throughout the annotation process.

We hope you enjoy this new feature!

The ISA team

Addendum:

Safari 6 users, be aware you will have to activate the ‘developer menu’ from the Advanced Item in the Safari ‘Preferences’ menu item. Once activated, go to menu ‘Develop’ and navigate to ‘User Agent’ item and select ‘Safari 5.1.7’ for enabling the browser to work with Google Spreadsheet. (Thanks to rpyzh for reporting the issue, see here)

Toward interoperable bioscience data published in Nature Genetics

Our commentary, named “Toward interoperable bioscience data” has just been released in Nature Genetics. In it, we focus on working towards an ISA commons, where one day we’ll all be able to share our experimental metadata in one common, easy to use format, supporting minimal information checklists and ontologies (via the ISA-Tab format and respective tools).

The editor speaks about the paper, paraphrased here:

“Reformatting data is a full-time job for many researchers, even before the minimum reporting guidelines, terminologies and formats of each field are taken into consideration. In this issue, we present a Commentary and a Perspective suggesting solutions to these problems that have been developed by a process of community consultation and open review to which the journal was a party. In the Commentary, Susanna-Assunta Sansone and colleagues identify one central problem, namely that “most repositories are designed for specific assay types, necessitating the fragmentation of complex datasets,” and they offer a unified view of the metadata formatting that will be needed to ensure that biomedical research datasets become interoperable. This solution is the overarching ISA framework, where the acronym stands for ‘Investigation’ (the project context), ‘Study’ (a unit of research) and ‘Assay’ (analytical measurement) (p 121). This proposal shifts the sets of reporting standards agreed upon by each community into the infrastructure and formatting of the data files themselves. Sansone and colleagues also list a set of participant communities that can pioneer the approach and teach by example.”

Many thanks to all who contributed to the paper and to the growing success of ISA (commons and tools).

Browse & search OLS and BioPortal Seamlessly with ISAcreator and ISAcreator configurator

For browsing and querying OLS and BioPortal seamlessly, ISAcreator and its associated configurator have done much to make life easier for their users.

ISAcreator configurator and its use for ontologies

ISAcreator Configurator is a tool used by community experts or curators to detail the Minimum Information (via MIBBI for instance) that users should enter to describe their experiment. As part of this, curators can define which fields (e.g. Sample Name) are required, whether or not they require ontology terms and if so, which ontologies (and parts of the ontologies) users should be pointed to.

ISAcreator Configurator – allows curators to define ‘checklists’: these are the fields required to describe an experiment from start to finish.

The ISAcreator configurator provides both the ability to browse ontologies and to search them, so as to allow curators to select which ontologies and which parts of these ontologies should be suggested to users whenever they annotate a field in the ISAcreator.

Browsing ontologies

Users can browse ontologies in OLS and BioPortal via an easy-to-use GUI. The browsing is done on the fly by accessing hierarchy web services from both OLS and BioPortal. An extensible code base means that other ontology resources (should they become available) can be added at any point.

Browse ChEBI from OLS

Browsing OBI (Ontology for Biomedical Investigations) from BioPortal

Search within ontologies

Users can search within ontologies and then locate the matching term in the full ontology hierarchy. This functionality is provided to help curators find the appropriate branch of an ontology to restrict users to.

ISAcreator and how it makes ontology selection easier for users

ISAcreator is a tool used by the community who often don’t know much about ontologies or the minimum information required to describe an experiment. The ISAcreator configurator produces XML describing the minimum information and the ontologies to use so that users need not worry about what to enter.
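To give a feel for what such a configuration might contain, here is a small Python sketch that writes field requirements to XML. The element and attribute names (`configuration`, `field`, `required`, `ontology`, `abbreviation`) are invented for illustration and do not reproduce the configurator’s real XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_config(fields: List[Dict]) -> str:
    """Build an XML string listing each field, whether it is required,
    and which ontologies (if any) its values should come from."""
    root = ET.Element("configuration")  # hypothetical root element
    for spec in fields:
        element = ET.SubElement(
            root,
            "field",
            {"name": spec["name"], "required": str(spec["required"]).lower()},
        )
        for ontology in spec.get("ontologies", []):
            ET.SubElement(element, "ontology", {"abbreviation": ontology})
    return ET.tostring(root, encoding="unicode")


config_xml = build_config([
    {"name": "Sample Name", "required": True},
    {"name": "Characteristics[organism]", "required": True,
     "ontologies": ["NCBITAXON"]},
])
print(config_xml)
```

A data-entry tool reading a file like this would know which fields to insist on and which ontologies to offer, without the user having to decide.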

ISAcreator’s main menu

The user interface needs to be easy to use, and ontology selection should be seamless. Users need not know which ontology resource they are browsing, but ontologies should be presented to them in a way that makes selection straightforward. ISAcreator achieves this by automatically prompting users for ontology terms whenever they are set to be required by the ISAcreator configurator.

Users are automatically prompted for Ontology terms

Users are presented with the full search result from both OLS and BioPortal in a standard form.
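One way to picture a “standard form” is a small adapter per source that maps each hit to a common record, as in the Python sketch below. The input dictionaries are made-up sample shapes, not the real OLS or BioPortal response formats, and the function names are hypothetical.

```python
from typing import Dict, List

# Made-up sample hits; real service responses will look different.
OLS_HITS = [{"name": "ethanol", "id": "CHEBI:16236"}]
BIOPORTAL_HITS = [{"prefLabel": "assay", "conceptId": "OBI:0000070"}]


def from_ols(hit: Dict[str, str]) -> Dict[str, str]:
    """Convert a (hypothetical) OLS hit into the common record shape."""
    return {"label": hit["name"], "accession": hit["id"], "source": "OLS"}


def from_bioportal(hit: Dict[str, str]) -> Dict[str, str]:
    """Convert a (hypothetical) BioPortal hit into the common record shape."""
    return {"label": hit["prefLabel"], "accession": hit["conceptId"],
            "source": "BioPortal"}


def merge_results(ols_hits: List[Dict[str, str]],
                  bioportal_hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Combine hits from both sources and sort them by label."""
    records = [from_ols(h) for h in ols_hits]
    records += [from_bioportal(h) for h in bioportal_hits]
    return sorted(records, key=lambda record: record["label"].lower())


results = merge_results(OLS_HITS, BIOPORTAL_HITS)
for record in results:
    print(record["label"], record["accession"], record["source"])
```

Because each source has its own adapter, adding another ontology resource would only require one more conversion function.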

In summary, the functionality inside both ISAcreator and its associated configurator makes browsing and searching both OLS and BioPortal much simpler for users! An API with many advanced features, building upon BioPortal and OLS services, will be released soon, as well as a standalone ontology browser and searcher!

Download ISAcreator and the ISAcreator configurator now from the ISAtab sourceforge site!