SBML annotation

Annotation is the process of adding metadata to a model and its components, and it is an important step during model building. These metadata mostly come from biological ontologies or biological databases.

sbmlutils provides functionality for annotating SBML models, either during model creation or afterwards on existing SBML models. Annotations have the form of RDF triples consisting of the model component to annotate (subject), the relationship between the model component and the annotation term (predicate), and a term which describes the meaning of the component (object). The object often comes from an ontology of defined terms.

The predicates come from a clearly defined set of predicates, the MIRIAM qualifiers (https://co.mbine.org/standards/qualifiers). Ideally the objects, i.e. annotations, are defined in an ontology which is registered at https://identifiers.org (see https://registry.identifiers.org/registry for available resources).

For more information on the importance of model annotations and best practices, we refer to

Neal, M.L., König, M., Nickerson, D., Mısırlı, G., Kalbasi, R., Dräger, A., Atalag, K., Chelliah, V., Cooling, M.T., Cook, D.L. and Crook, S., 2019. Harmonizing semantic annotations for computational models in biology. Briefings in bioinformatics, 20(2), pp.540-550. https://doi.org/10.1093/bib/bby087

Le Novère, N., Finney, A., Hucka, M., Bhalla, U.S., Campagne, F., Collado-Vides, J., Crampin, E.J., Halstead, M., Klipp, E., Mendes, P. and Nielsen, P., 2005. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature biotechnology, 23(12), pp.1509-1515. https://www.nature.com/articles/nbt1156

Annotations in sbmlutils consist of (predicate, object) tuples which are associated with model components. For instance, to describe that a species in the model is a certain entry from ChEBI, we associate (BQB.IS, "chebi/CHEBI:28061") with the species. In addition, annotations to the Systems Biology Ontology (SBO) form a special subset which can be set directly on all model components via the sboTerm attribute.
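To make the relation between such a tuple and the resulting RDF triple concrete, the following is a minimal standard-library sketch, not the sbmlutils implementation. The names Triple, resource_uri and to_triple are hypothetical. The rule for compact identifiers is an assumption inferred from the SBML output shown in this document, where "chebi/CHEBI:28061" is written as http://identifiers.org/CHEBI:28061 while "bigg.metabolite/gal" keeps its collection prefix.

```python
from typing import NamedTuple

IDENTIFIERS_ORG = "http://identifiers.org"


class Triple(NamedTuple):
    """Hypothetical container for one annotation triple."""

    subject: str    # reference to the metaid of the annotated component
    predicate: str  # MIRIAM qualifier, e.g. "bqbiol:is"
    object: str     # URI of the annotation term


def resource_uri(resource: str) -> str:
    """Build an identifiers.org URI from a 'collection/term' string.

    Assumption: terms which already carry their namespace as a prefix
    (e.g. 'CHEBI:28061') are written in compact form without the collection.
    """
    collection, term = resource.split("/", 1)
    if term.upper().startswith(collection.upper() + ":"):
        return f"{IDENTIFIERS_ORG}/{term}"
    return f"{IDENTIFIERS_ORG}/{resource}"


def to_triple(metaid: str, qualifier: str, resource: str) -> Triple:
    """Combine component, qualifier and resource into a triple."""
    return Triple(f"#{metaid}", qualifier, resource_uri(resource))


triple = to_triple("meta_gal", "bqbiol:is", "chebi/CHEBI:28061")
print(triple)
```

The subject uses the "#metaid" form because the rdf:Description element in the generated SBML refers to the component via its metaid.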

[1]:
%load_ext autoreload
%autoreload 2

Annotate during model creation

In this example the model is annotated during model creation. Annotations are encoded as simple tuples consisting of a MIRIAM qualifier and an identifiers.org resource of the form collection/term. The list of tuples is provided on object creation. Here we annotate a species.

[2]:
from sbmlutils.factory import *
from sbmlutils.metadata import *
from sbmlutils.validation import ValidationOptions, validate_doc

model = Model(
    'example_annotation',
    compartments=[
        Compartment(sid="C", value=1.0, sboTerm=SBO.PHYSICAL_COMPARTMENT)
    ],
    species = [
        Species(sid='gal', compartment='C', initialConcentration=3.0,
                name='D-galactose', sboTerm=SBO.SIMPLE_CHEMICAL,
                annotations=[
                    (BQB.IS, "bigg.metabolite/gal"),  # galactose
                    (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
                    (BQB.IS, "vmhmetabolite/gal"),
                ]
        )
    ]
)

# create model and print SBML
doc = Document(model)
print(doc.get_sbml())

# validate model
validate_doc(doc.create_sbml(), options=ValidationOptions(units_consistency=False));
INFO     Create SBML for model 'example_annotation'                                                 factory.py:3526
WARNING  'name' should be set on 'Model(sid='example_annotation', packages=[<Package.COMP_V1:        factory.py:441
         'comp-v1'>], units=<class 'sbmlutils.factory.Units'>, compartments=[C = 1.0 [None]],                      
         species=[<sbmlutils.factory.Species object at 0x7f7eb84a8580>],                                           
         _FrozenClass__isfrozen=True)'                                                                             
WARNING  'name' should be set on 'Compartment(C, SBO.SBO_0000290)'                                   factory.py:441
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1" level="3" version="1" comp:required="true">
  <notes>
    <body xmlns="http://www.w3.org/1999/xhtml">
      <p>Created with <a href="https://github.com/matthiaskoenig/sbmlutils">https://github.com/matthiaskoenig/sbmlutils</a>.
<a href="https://doi.org/10.5281/zenodo.5525390">
        <img src="https://zenodo.org/badge/DOI/10.5281/zenodo.5525390.svg" alt="DOI"/></a></p>
      </body>
    </notes>
  <model id="example_annotation">
    <listOfCompartments>
      <compartment metaid="meta_C" sboTerm="SBO:0000290" id="C" spatialDimensions="3" size="1" constant="true">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#meta_C">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/SBO:0000290"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </compartment>
    </listOfCompartments>
    <listOfSpecies>
      <species metaid="meta_gal" sboTerm="SBO:0000247" id="gal" name="D-galactose" compartment="C" initialConcentration="3" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#meta_gal">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
                  <rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
                  <rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
                  <rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
      </species>
    </listOfSpecies>
  </model>
</sbml>

INFO     Create SBML for model 'example_annotation'                                                 factory.py:3526
WARNING  'name' should be set on 'Model(sid='example_annotation', packages=[<Package.COMP_V1:        factory.py:441
         'comp-v1'>], units=<class 'sbmlutils.factory.Units'>, compartments=[C = 1.0 [None]],                      
         species=[<sbmlutils.factory.Species object at 0x7f7eb84a8580>],                                           
         _FrozenClass__isfrozen=True)'                                                                             
WARNING  'name' should be set on 'Compartment(C, SBO.SBO_0000290)'                                   factory.py:441
────────────────────────────────────────────────── Validate SBML ──────────────────────────────────────────────────
<SBMLDocument>
valid                    : TRUE
check time (s)           : 0.002
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────

For a more complete example see model_with_annotations.py, which creates annotations of the form:

Species(sid='e__gal', compartment='ext', initialConcentration=3.0,
        substanceUnit=UNIT_KIND_MOLE, boundaryCondition=True,
        name='D-galactose', sboTerm=SBO_SIMPLE_CHEMICAL,
        annotations=[
            (BQB.IS, "bigg.metabolite/gal"),  # galactose
            (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
            (BQB.IS, "vmhmetabolite/gal"),
        ]
),
[3]:
from notebook import BASE_DIR
from sbmlutils.factory import *
from sbmlutils.io import read_sbml
from model_with_annotations import model

results = create_model(
    model,
    filepath=BASE_DIR / 'models' / f"{model.sid}.xml"
)

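# read the generated SBML back and check the annotations on the species
# (the libsbml model gets its own name to avoid shadowing the imported `model`)
import libsbml
doc: libsbml.SBMLDocument = read_sbml(results.sbml_path)
sbml_model: libsbml.Model = doc.getModel()
s1: libsbml.Species = sbml_model.getSpecies('e__gal')
print(s1.toSBML())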
─────────────────────────────────────────────────── Create SBML ───────────────────────────────────────────────────
INFO     Create SBML for model 'annotation_example'                                                 factory.py:3526
WARNING  'sboTerm' should be set on 'Parameter(x_cell, cell diameter)'                               factory.py:466
WARNING  'sboTerm' should be set on 'Parameter(Vol_e, external volume)'                              factory.py:466
WARNING  'sboTerm' should be set on 'Parameter(A_m, membrane area)'                                  factory.py:466
WARNING  'name' should be set on 'InitialAssignment()'                                               factory.py:441
WARNING  'sboTerm' should be set on 'InitialAssignment()'                                            factory.py:466
WARNING  'name' should be set on 'InitialAssignment()'                                               factory.py:441
WARNING  'sboTerm' should be set on 'InitialAssignment()'                                            factory.py:466
WARNING  'name' should be set on 'Parameter(GLUT2_Vmax)'                                             factory.py:441
WARNING  'sboTerm' should be set on 'Parameter(GLUT2_Vmax)'                                          factory.py:466
WARNING  'name' should be set on 'Parameter(GLUT2_k_gal)'                                            factory.py:441
WARNING  'sboTerm' should be set on 'Parameter(GLUT2_k_gal)'                                         factory.py:466
WARNING  'name' should be set on 'Parameter(GLUT2_keq)'                                              factory.py:441
WARNING  'sboTerm' should be set on 'Parameter(GLUT2_keq)'                                           factory.py:466
WARNING  'name' should be set on 'Parameter(Vol_c)'                                                  factory.py:441
WARNING  'sboTerm' should be set on 'Parameter(Vol_c)'                                               factory.py:466
WARNING  'name' should be set on 'InitialAssignment()'                                               factory.py:441
WARNING  'sboTerm' should be set on 'InitialAssignment()'                                            factory.py:466
────────────────────────────────────────────────── Validate SBML ──────────────────────────────────────────────────
file:///home/mkoenig/git/sbmlutils/docs_builder/notebooks/models/annotation_example.xml
valid                    : TRUE
check time (s)           : 0.013
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
<species metaid="meta_e__gal" sboTerm="SBO:0000247" id="e__gal" name="D-galactose" compartment="ext" initialConcentration="3" substanceUnits="mole" hasOnlySubstanceUnits="false" boundaryCondition="true" constant="false">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
      <rdf:Description rdf:about="#meta_e__gal">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
            <rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
            <rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
            <rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
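
The annotations can also be read back from the serialized XML without libsbml. The following is a minimal sketch using only the Python standard library. The function name list_annotations is hypothetical, and the XML string is a shortened copy of the species output above (unused namespace declarations and attributes removed).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the species annotation printed above
species_xml = """
<species metaid="meta_e__gal" id="e__gal">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#meta_e__gal">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
            <rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
            <rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
            <rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def list_annotations(xml_string: str) -> list:
    """Return (qualifier name, resource URI) pairs found in the RDF annotation."""
    root = ET.fromstring(xml_string)
    results = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # each child of rdf:Description is a qualifier element such as bqbiol:is
        for qualifier in description:
            # ElementTree tags have the form '{namespace}localname'
            _, _, name = qualifier.tag[1:].partition("}")
            for li in qualifier.iter(f"{{{RDF_NS}}}li"):
                results.append((name, li.get(f"{{{RDF_NS}}}resource")))
    return results


for qualifier_name, uri in list_annotations(species_xml):
    print(qualifier_name, uri)
```

Note that the SBO term set via sboTerm appears as the first resource in the bqbiol:is bag, next to the three resources given in the annotations list.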

Annotate existing model

An alternative approach is to annotate existing models using external annotation files. The annotations are defined in an external file and added to the model by matching the SBML identifiers against patterns. In the following, the annotations from ./annotations/demo_annotations.xlsx are applied to the model in ./annotations/demo.xml.

Annotations are written for the given sbml_type for all SBML identifiers which match the given pattern.
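
The matching step can be illustrated with a small sketch based on the standard re module. This is not the sbmlutils implementation: the function match_annotations and the component identifiers are hypothetical, the patterns are of the kind used in demo_annotations.xlsx, and the use of re.match is an assumption about how patterns are applied.

```python
import re

# (pattern, sbml_type, resource) rules of the kind used in the annotation file
rules = [
    (r"^Km_\w+$", "parameter", "sbo/SBO:0000027"),
    (r"^Keq_\w+$", "parameter", "sbo/SBO:0000281"),
    (r"^Vmax_\w+$", "parameter", "sbo/SBO:0000186"),
    (r"^\w{1}__A$", "species", "sbo/SBO:0000247"),
]

# hypothetical SBML identifiers grouped by SBML type
components = {
    "parameter": ["Km_A", "Vmax_bA", "scale_f"],
    "species": ["e__A", "c__B"],
}


def match_annotations(rules, components):
    """Collect the resources of all rules whose pattern matches an identifier."""
    matches = {}
    for pattern, sbml_type, resource in rules:
        regex = re.compile(pattern)
        # a rule is only tested against identifiers of its own SBML type
        for sid in components.get(sbml_type, []):
            if regex.match(sid):
                matches.setdefault(sid, []).append(resource)
    return matches


print(match_annotations(rules, components))
```

Identifiers which match no pattern, such as scale_f and c__B here, simply receive no annotation.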

[4]:
from sbmlutils.metadata.annotator import ModelAnnotator
df = ModelAnnotator.read_annotations_df(BASE_DIR / 'annotations' / 'demo_annotations.xlsx', file_format="xlsx")
df
[4]:
    pattern       sbml_type    annotation_type  qualifier  resource         name
0   NaN           document     rdf              BQM_IS     sbo/SBO:0000293  non-spatial continuous framework
1   ^demo_\d+$    model        rdf              BQM_IS     go/GO:0008152    metabolic process
4   e             compartment  rdf              BQB_IS     sbo/SBO:0000290  physical compartment
5   e             compartment  rdf              BQB_IS     go/GO:0005615    extracellular space
6   e             compartment  rdf              BQB_IS     fma/FMA:70022    extracellular space
8   m             compartment  rdf              BQB_IS     sbo/SBO:0000290  physical compartment
9   m             compartment  rdf              BQB_IS     go/GO:0005886    plasma membrane
10  m             compartment  rdf              BQB_IS     fma/FMA:63841    plasma membrane
12  c             compartment  rdf              BQB_IS     sbo/SBO:0000290  physical compartment
13  c             compartment  rdf              BQB_IS     go/GO:0005623    cell
14  c             compartment  rdf              BQB_IS     fma/FMA:68646    cell
17  ^Km_\w+$      parameter    rdf              BQB_IS     sbo/SBO:0000027  Michaelis constant
18  ^Keq_\w+$     parameter    rdf              BQB_IS     sbo/SBO:0000281  equilibrium constant
19  ^Vmax_\w+$    parameter    rdf              BQB_IS     sbo/SBO:0000186  maximal velocity
22  ^\w{1}__A$    species      rdf              BQB_IS     sbo/SBO:0000247  simple chemical
23  ^\w{1}__B$    species      rdf              BQB_IS     sbo/SBO:0000247  simple chemical
24  ^\w{1}__C$    species      rdf              BQB_IS     sbo/SBO:0000247  simple chemical
25  ^\w{1}__\w+$  species      formula          NaN        C6H12O6          NaN
26  ^\w{1}__\w+$  species      charge           NaN        0                NaN
28  ^b\w{1}$      reaction     rdf              BQB_IS     sbo/SBO:0000185  transport reaction
29  ^v\w{1}$      reaction     rdf              BQB_IS     sbo/SBO:0000176  biochemical reaction
[5]:
from sbmlutils.metadata.annotator import annotate_sbml

# annotate the SBML file with the annotations from the spreadsheet
# and write the annotated model to a new file
doc = annotate_sbml(
    source=BASE_DIR / 'annotations' / 'demo.xml',
    annotations_path=BASE_DIR / 'annotations' / 'demo_annotations.xlsx',
    filepath=BASE_DIR / 'annotations' / 'demo_annotated.xml'
)
print(doc.getModel())
WARNING  No SBML objects found matching SId annotation pattern: '^demo_\d+$'                       annotator.py:214
Model annotated: file:///home/mkoenig/git/sbmlutils/docs_builder/notebooks/annotations/demo_annotated.xml
<Model Koenig_demo_14 "Koenig_demo_14">