Create a simple ISA descriptor

This example creates minimal metadata for a single study ISA descriptor with no assay declared.

It shows how to serialize (write) the ISA Model content to ISA-Tab and ISA-JSON formats.

# If executing the notebooks on `Google Colab`,uncomment the following command 
# and run it to install the required python libraries. Also, make the test datasets available.

# !pip install -r requirements.txt

Let’s get ready and import all the necessary components

from isatools.model import *
import datetime

1. Declaring key ISA objects: Investigation, Study, Protocols, Ontologies, Contacts metadata

# Creating the Investigation object 

investigation = Investigation()
investigation.identifier = "i1"
investigation.title = "My Simple ISA Investigation"
investigation.description = "We could alternatively use the class constructor's parameters to set some default " \
                                "values at the time of creation, however we want to demonstrate how to use the " \
                                "object's instance variables to set values."
investigation.submission_date = "2016-11-03"
investigation.public_release_date = "2016-11-03"

# Create an Study object and set some values. The Study must have a filename, otherwise when we serialize it
# to ISA-Tab we would not know where to write it. We must also attach the study to the investigation by adding it
# to the 'investigation' object's list of studies.

study = Study(filename="s_study.txt")
study.identifier = "s1"
study.title = "My ISA Study"
study.description = "Like with the Investigation, we could use the class constructor to set some default values, " \
                    "but have chosen to demonstrate in this example the use of instance variables to set initial " \
study.submission_date = "2016-11-03"
study.public_release_date = "2016-11-03"


1.1 Declaring and using Ontology Resources:

# Declaration of Ontology Resources used to annotation the ISA objects

obi = OntologySource(name='OBI', description="Ontology for Biomedical Investigations")
# Some instance variables are typed with different objects and lists of objects. For example, a Study can have a
# list of design descriptors. A design descriptor is an Ontology Annotation describing the kind of study at hand.
# Ontology Annotations should typically reference an Ontology Source. We demonstrate a mix of using the class
# constructors and setting values with instance variables. Note that the OntologyAnnotation object
# 'intervention_design' links its 'term_source' directly to the 'obi' object instance. To ensure the OntologySource
# is encapsulated in the descriptor, it is added to a list of 'ontology_source_references' in the Investigation
# object. The 'intervention_design' object is then added to the list of 'design_descriptors' held by the Study
# object.

intervention_design = OntologyAnnotation(term_source=obi)
intervention_design.term = "intervention design"
intervention_design.term_accession = ""

1.2. Declaring Contacts and Publications

# Other instance variables common to both Investigation and Study objects include 'contacts' and 'publications',
# each with lists of corresponding Person and Publication objects.

contact = Person(first_name="Alice", last_name="Robertson", affiliation="University of Life", roles=[OntologyAnnotation(term='submitter')])
publication = Publication(title="Experiments with Elephants", author_list="A. Robertson, B. Robertson")
publication.pubmed_id = "12345678"
publication.status = OntologyAnnotation(term="published")

2. Building the ISA biomaterial creation graph

# To create the study graph that corresponds to the contents of the study table file (the s_*.txt file), we need
# to create a process sequence. To do this we use the Process class and attach it to the Study object's
# 'process_sequence' list instance variable. Each process must be linked with a Protocol object that is attached to
# a Study object's 'protocols' list instance variable. The sample collection Process object usually has as input
# a Source material and as output a Sample material.

# Here we create one Source material object and attach it to our study.

source = Source(name='source_material')

# Then we create three Sample objects, with organism as Homo Sapiens, and attach them to the study. We use the utility function
# batch_create_material() to clone a prototype material object. The function automatiaclly appends
# an index to the material name. In this case, three samples will be created, with the names
# 'sample_material-0', 'sample_material-1' and 'sample_material-2'.

prototype_sample = Sample(name='sample_material', derives_from=[source])

ncbitaxon = OntologySource(name='NCBITaxon', description="NCBI Taxonomy")
investigation.ontology_source_references.append(ncbitaxon) # remember to add the newly declared ontology source to the parent investigation

characteristic_organism = Characteristic(category=OntologyAnnotation(term="Organism"),
                                 value=OntologyAnnotation(term="Homo Sapiens", term_source=ncbitaxon,

study.samples = batch_create_materials(prototype_sample, n=3)  # creates a batch of 3 samples

# Now we create a single Protocol object that represents our sample collection protocol, and attach it to the
# study object. Protocols must be declared before we describe Processes, as a processing event of some sort
# must execute some defined protocol. In the case of the class model, Protocols should therefore be declared
# before Processes in order for the Process to be linked to one.

sample_collection_protocol = Protocol(name="sample collection",
                                      protocol_type=OntologyAnnotation(term="sample collection"))
sample_collection_process = Process(executes_protocol=sample_collection_protocol)

# Next, we link our materials to the Process. In this particular case, we are describing a sample collection
# process that takes one source material, and produces three different samples.
# (source_material)->(sample collection)->[(sample_material-0), (sample_material-1), (sample_material-2)]

for src in study.sources:
for sam in study.samples:

# Finally, attach the finished Process object to the study process_sequence. This can be done many times to
# describe multiple sample collection events.


#IMPORTANT: remember to list all Characteristics used in the study object: do as follows:

3. Creating an ISA Assay with all associated objects

# Next, we build n Assay object and attach two protocols, extraction and sequencing.

assay = Assay(filename="a_assay.txt")

extraction_protocol = Protocol(name='extraction', protocol_type=OntologyAnnotation(term="material extraction"))
sequencing_protocol = Protocol(name='sequencing', protocol_type=OntologyAnnotation(term="nucleic acid sequencing"))

# To build out assay graphs, we enumereate the samples from the study-level, and for each sample we create an
# extraction process and a sequencing process. The extraction process takes as input a sample material, and produces
# an extract material. The sequencing process takes the extract material and produces a data file. This will
# produce three graphs, from sample material through to data, as follows:
# (sample_material-0)->(extraction)->(extract-0)->(sequencing)->(sequenced-data-0)
# (sample_material-1)->(extraction)->(extract-1)->(sequencing)->(sequenced-data-1)
# (sample_material-2)->(extraction)->(extract-2)->(sequencing)->(sequenced-data-2)
# Note that the extraction processes and sequencing processes are distinctly separate instances, where the three
# graphs are NOT interconnected.

for i, sample in enumerate(study.samples):

    # create an extraction process that executes the extraction protocol

    extraction_process = Process(executes_protocol=extraction_protocol)

    # extraction process takes as input a sample, and produces an extract material as output

    material = Material(name="extract-{}".format(i))
    material.type = "Extract Name"

    # create a sequencing process that executes the sequencing protocol

    sequencing_process = Process(executes_protocol=sequencing_protocol) = "assay-name-{}".format(i)

    # Sequencing process usually has an output data file

    datafile = DataFile(filename="sequenced-data-{}".format(i), label="Raw Data File")

    # Ensure Processes are linked forward and backward. plink(from_process, to_process) is a function to set
    # these links for you. It is found in the isatools.model package

    plink(extraction_process, sequencing_process)

    # make sure the extract, data file, and the processes are attached to the assay

    assay.measurement_type = OntologyAnnotation(term="gene sequencing")
    assay.technology_type = OntologyAnnotation(term="nucleotide sequencing")

Finally, we attach the assay to the study.


4. Writing to ISA-Tab using isatab.dumps() function:

from isatools.isatab import dumps

5. Writing to ISA-JSON using isajson component

import json
from isatools.isajson import ISAJSONEncoder
print(json.dumps(investigation, cls=ISAJSONEncoder, sort_keys=True, indent=4, separators=(',', ': ')))