Author Archives: agbeltran

Join the funFAIR!

Today, March 15 2016, the FAIR Guiding Principles for scientific data management and stewardship were formally published in the Nature Publishing Group journal Scientific Data. The problem the FAIR Principles address is the lack of widely shared, clearly articulated, and broadly applicable best practices around the publication of scientific data. While the history of scholarly publication in journals is long and well established, the same cannot be said of formal data publication. Yet data could be considered the primary output of scientific research, and its publication and reuse are necessary to ensure validity and reproducibility, and to drive further discoveries. The FAIR Principles address these needs by providing a precise and measurable set of qualities a good data publication should exhibit – qualities that ensure that the data is Findable, Accessible, Interoperable, and Reusable (FAIR).

The ISA infrastructure project and BioSharing registry of standards, databases and policies are both part of this community in which we strive to make data FAIR. Do join us in these efforts!

For more information, read the paper and see the press release at the Dutch Tech Centre for Life Sciences.

The ISA team is growing!

We are very happy to announce that Dr David Johnson and Dr Massimiliano Izzo have joined the ISA team as Research Software Engineers, last year and this year, respectively.

David and Massi are both great additions to the team. A few words about their past experience…

David

David completed his PhD at the University of Reading (UK) and before joining us at the University of Oxford e-Research Centre (OeRC), he worked at Imperial College London where he was a founding member of the Data Science Institute. Prior to that he worked in the Department of Computer Science at Oxford University, where he was part of an FP7 project developing interoperable cancer model databases, and also in the Evolutionary Biology Group at the University of Reading where he developed high-performance computing software for phylogenetics. He serves on the technical programme committees of a number of international conferences including the International Conference on Computational Science series and on the editorial board of the journal Cancer Informatics.

Massi

Massi completed his PhD studies in Biomedical Engineering at the University of Genoa (Italy). His main interests are in the design and development of innovative data models for Life Sciences, structured/unstructured data management and full-stack software development (JavaScript all the way!). Before joining the OeRC, he was a Research Collaborator at the Giannini Gaslini Institute in Genoa (Italy), where he developed distributed data management systems for Integrated Biobanking Management, mostly targeted at Paediatric Tumours. In his free time, Massi enjoys reading (mostly speculative fiction novels), gazing at the ceiling while lying on the sofa, and wandering aimlessly in bookshops and cafes.

You can follow all their code contributions to ISA-tools through their GitHub profiles: djcomlab and zigur.

OntoMaton Add-on for Google Sheets

ontomaton-fig5

This is a very delayed blog post about the OntoMaton Add-on version we released earlier in the year. But better late than never: here we describe the new features we incorporated in the latest OntoMaton version.

OntoMaton is a widget bringing together ontology lookup and tagging within the collaborative environment provided by Google Spreadsheets. The original motivation for creating OntoMaton was to help users create well-annotated experimental metadata in the biosciences collaboratively, while keeping track of different versions. Google Spreadsheets provide such facilities for collaboration and versioning, so we combined them with the ontology search and tagging functionality offered by the NCBO BioPortal web services. BioPortal is a web-based repository for biomedical ontologies/terminologies with functionality for searching and visualizing the ontologies and for supporting ontology-based annotations.

For more information about OntoMaton and its motivation, see our publication in Bioinformatics and our previous blog posts.

After our initial version of OntoMaton,

Consequently, we upgraded OntoMaton to the latest versions of these services.

We also took the opportunity to incorporate searches across the Linked Open Vocabularies repository. Linked Open Vocabularies (LOV) is a repository of (RDFS or OWL) vocabularies used in the Linked Data Cloud, and thus, not restricted to bio-ontologies. This addition allows OntoMaton to be used for other use cases, relying on vocabularies outside the bio-domain.
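To make the "two search back-ends" idea concrete, here is a minimal Python sketch of how a client might build a term-search request for either service and read the hits back. It is not OntoMaton's own code (OntoMaton is a Google Sheets add-on), and it rests on assumptions: the endpoint URLs, the `q` and `apikey` parameters, and the `collection` / `prefLabel` / `@id` fields reflect our reading of the public BioPortal and LOV REST documentation and may differ in your deployment. The function names and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints; check the services' current documentation.
BIOPORTAL_SEARCH = "https://data.bioontology.org/search"
LOV_TERM_SEARCH = "https://lov.linkeddata.es/dataset/lov/api/v2/term/search"


def build_search_url(service, term, api_key=""):
    """Return the GET URL for a free-text term search on the chosen service."""
    if service == "bioportal":
        # BioPortal expects an API key; here it is sent as a query parameter.
        return BIOPORTAL_SEARCH + "?" + urlencode({"q": term, "apikey": api_key})
    if service == "lov":
        return LOV_TERM_SEARCH + "?" + urlencode({"q": term})
    raise ValueError("unknown service: " + service)


def extract_bioportal_terms(response_text):
    """Pull (label, IRI) pairs out of a BioPortal-style JSON search response.

    Assumes hits are listed under "collection", each with "prefLabel" and "@id".
    """
    payload = json.loads(response_text)
    return [
        (hit.get("prefLabel", ""), hit.get("@id", ""))
        for hit in payload.get("collection", [])
    ]


# Hypothetical response body, used only to exercise the parser offline.
SAMPLE_RESPONSE = json.dumps(
    {"collection": [{"prefLabel": "heart", "@id": "http://example.org/onto#heart"}]}
)

if __name__ == "__main__":
    print(build_search_url("lov", "person"))
    print(extract_bioportal_terms(SAMPLE_RESPONSE))
```

The (label, IRI) pair is the piece that matters for tagging: the label is what a user sees in the cell, and the IRI is what makes the annotation unambiguous.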

For this new version, the installation procedure with Google Add-ons is as follows:

  • Open a Google Spreadsheet and select the Add-ons menu

ontomaton-fig1

  • Select Get add-ons, search for OntoMaton

ontomaton-fig2

  • By clicking on OntoMaton, you can find more information about it, including some screenshots.

ontomaton-fig8

  • Then you can install OntoMaton (by clicking on the Free button) or, if it is already installed, you can manage the installation by clicking on the Manage button.
  • You will need to authorise OntoMaton to view and manage your spreadsheets (as the Add-on will search over terms from your spreadsheets, incorporate links, etc.) and to connect to an external service (the REST services that OntoMaton relies upon).

ontomaton-fig3

  • After that, you will be able to use OntoMaton functionality, accessible from the Add-ons menu

ontomaton-fig4

  • And that’s it! You can start using OntoMaton for searching and tagging… The functionality is as before, except that when searching you need to select whether you want to search BioPortal or LOV.

ontomaton-fig6

ontomaton-fig7

In the Bioinformatics publication, we showed some of the use cases for OntoMaton. More recently, OntoMaton has been:

If you are interested in the OntoMaton source code, you can find it in its GitHub repository.

Finally, if you have questions or comments about OntoMaton, contact us (the ISA team) at isatools <AT> googlegroups.com (replacing <AT> with @!). We would love to hear about how you are using OntoMaton!

OntoMaton at the International Conference on Biomedical Ontology (ICBO)

The 4th International Conference on Biomedical Ontology (ICBO 2013) took place in Montreal, Canada, on 8th and 9th July 2013. It was held jointly with the Canadian Semantic Web Symposium (CSWS 2013) and Data Integration in the Life Sciences (DILS 2013) in what was called the Semantic Trilogy 2013.

The members of the core ISA team could not be present this time, but our collaborator and co-author Trish Whetzel was there to present OntoMaton in the Highlight Track entitled OntoMaton: Google spreadsheets meet NCBO BioPortal services. We take this opportunity to thank Trish for her presentation, which was very well received. We have had many new users from the Biomedical Ontology community since ICBO.

The presentation was about our publication “OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets” available (open access) in Bioinformatics. 2013 February 15; 29(4): 525–527.

OntoMaton is a script for Google Spreadsheets that relies on NCBO BioPortal Web Services to provide searching and tagging with ontology terms, using those ontologies registered in BioPortal. This functionality for ontology-based annotation of spreadsheets can be used collaboratively among distributed parties. Google Spreadsheets also provide version control.

This is the link to the publication in Bioinformatics: http://bioinformatics.oxfordjournals.org/content/29/4/525 and this is the link to it in PubMed Central: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3570217/.

And the slides that Trish presented are here:

If you are interested in OntoMaton, apart from the publication and slides, you can also see our previous blog post, our YouTube video, some templates and the source code on GitHub.
Thanks again, Trish, for presenting OntoMaton at ICBO!

ISAcreator available in GenomeSpace

ISAcreator has supported access to GenomeSpace since release 1.7.0, and it is now also available through the GenomeSpace online interface.

GenomeSpace is a framework supporting cloud-based interoperability of genomics analysis tools. By providing access to multiple tools through its interface, and supporting file transfers in the cloud, GenomeSpace provides a bridge among the tools. Some of the tools available through GenomeSpace are: Cytoscape, Galaxy, GenePattern, Genomica, Integrative Genomics Viewer (IGV), and the UCSC Genome Table Browser. Find out more about what GenomeSpace is and what GenomeSpace can do for you.

I will now describe the functionality ISAcreator supports for GenomeSpace.

You can launch ISAcreator from your desktop or you can launch it once you are logged in to GenomeSpace (after registering for their service). To launch ISAcreator from within GenomeSpace, just hover over the ISAcreator icon and select ‘Launch’:

GS-8

If you launch ISAcreator from GenomeSpace, you will be prompted to download ISAcreator and, once you agree to download the file, you will see the following pop-up window:

GS-10

When running ISAcreator (either from your desktop or following the GenomeSpace route), you will notice that it now has a third mode of operation (apart from the previously available light and normal modes) that corresponds to GenomeSpace. In this third mode, ISAcreator supports opening ISA-TAB files stored in the cloud environment provided by GenomeSpace and also saving files to GenomeSpace storage.

GS-7

If you choose the GenomeSpace mode, you will have to enter your GenomeSpace user credentials in the ISAcreator login page:

GS-3

Then you will load the configuration files, as usual, and get to the main menu where you can choose to load an existing ISA-TAB file. If it is not the first time you are loading files, you will see the previously loaded files and also have the option to search GenomeSpace for more files:

GS-11

As an example, you can find the publicly available BII-I-1 ISA-TAB dataset in GenomeSpace under Public/agbeltran/ISA-TAB-datasets, and select it to load:

GS-12

After loading an ISA-TAB dataset, you can save it to GenomeSpace (even if it is a local dataset that you want to store in GenomeSpace):

GS-13

GenomeSpace also provides documentation about ISAcreator on this page and a guide to using ISAcreator on this other page.

As always, send us comments or questions by contacting:

  • the ISA team at isatools [at] googlegroups [dot] com, 
  • the ISA user forum at isaforum [at] googlegroups [dot] com

or send us feature requests or bug reports through the issue tracker on GitHub:

  • https://github.com/ISA-tools/ISAcreator/issues

New paper on the MetaboLights database!

A recent Nucleic Acids Research (NAR) paper features MetaboLights: the first general-purpose, open access repository for metabolomics studies, storing the studies' raw experimental data and associated metadata. You can find the paper here. MetaboLights is powered by the open source ISA metadata framework, and thus is part of the ISA commons, and it is deployed at the European Bioinformatics Institute (EBI).

Some of the characteristics of the database are: it is cross-species and cross-technique, and it covers metabolite structures and their reference spectra, their biological roles, locations and concentrations, and raw data from metabolic experiments.

The paper describes the database content and technical architecture, the functionalities for searching and browsing the data, how to download data and get programmatic access to the database, and the data submission process, among other things.

Risa is out! And available in the Bioconductor 2.11 release…

The ISA infrastructure now includes an R package, called Risa, to parse ISA-Tab datasets into R objects. The Risa package is included in Bioconductor release 2.11.

Risa provides functionality to read ISA-Tab datasets into R objects. These objects can then be used by downstream Bioconductor packages for data analysis and visualization (e.g., xcms). Currently, metadata associated with proteomics and metabolomics-based assays (i.e. mass spectrometry) can be processed into xcmsSet objects (from the xcms Bioconductor package). Risa also provides functionality to save the ISA-Tab dataset, or each of its individual files, and to update assay files after analysis. For an example of using Risa for processing and analysis of metabolomics data, see https://github.com/sneumann/mtbls2.
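For readers who have not looked inside an ISA-Tab dataset, the sketch below illustrates what "reading ISA-Tab into objects" amounts to for a single study or assay table: the files are tab-delimited text with a header row. This is a Python illustration only, not Risa's R API; the function names and the sample rows are hypothetical. Column names are kept in a list rather than used as dictionary keys because ISA-Tab tables can repeat a header (for example, several `Term Source REF` columns).

```python
import csv
import io


def read_isatab_table(text):
    """Split a tab-delimited ISA-Tab table into (header, records)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def column(header, records, name):
    """Return the values of the first column whose header equals `name`."""
    idx = header.index(name)  # raises ValueError if the column is absent
    return [record[idx] for record in records]


# Hypothetical study table with two samples, for illustration only.
SAMPLE_STUDY = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "culture1\tSaccharomyces cerevisiae\tsample1\n"
    "culture2\tSaccharomyces cerevisiae\tsample2\n"
)

if __name__ == "__main__":
    header, records = read_isatab_table(SAMPLE_STUDY)
    print(column(header, records, "Sample Name"))
```

In a real dataset, the investigation file lists which study and assay files belong together, so a full parser would read that file first and then load each referenced table in this way.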

The source code and latest version can be found in the GitHub repository https://github.com/ISA-tools/Risa. Please submit all bug reports and feature requests through https://github.com/ISA-tools/Risa/issues.