SBML annotation
Annotation of model components with meta-information is an important step during model building. Annotation is the process of adding metadata to the model and model components. These metadata are mostly from biological ontologies or biological databases.
sbmlutils provides functionality for annotating SBML models, either during model creation or afterwards on existing models. Annotations have the form of RDF triples consisting of the model component to annotate (subject), the relationship between the model component and the annotation term (predicate), and a term which describes the meaning of the component (object), which often comes from an ontology of defined terms.
The predicates come from a clearly defined set of predicates, the MIRIAM qualifiers (https://co.mbine.org/standards/qualifiers). Ideally the objects, i.e. annotations, are defined in an ontology which is registered at https://identifiers.org (see https://registry.identifiers.org/registry for available resources).
For more information on the importance of model annotations and best practices we refer to:
Neal, M.L., König, M., Nickerson, D., Mısırlı, G., Kalbasi, R., Dräger, A., Atalag, K., Chelliah, V., Cooling, M.T., Cook, D.L. and Crook, S., 2019. Harmonizing semantic annotations for computational models in biology. Briefings in bioinformatics, 20(2), pp.540-550. https://doi.org/10.1093/bib/bby087
Le Novère, N., Finney, A., Hucka, M., Bhalla, U.S., Campagne, F., Collado-Vides, J., Crampin, E.J., Halstead, M., Klipp, E., Mendes, P. and Nielsen, P., 2005. Minimum information requested in the annotation of biochemical models (MIRIAM). Nature biotechnology, 23(12), pp.1509-1515. https://www.nature.com/articles/nbt1156
Annotations in sbmlutils consist of associating (predicate, object) tuples with model components. For instance, to describe that a species in the model is a certain entry from ChEBI, we associate (BQB.IS, "chebi/CHEBI:28061") with the species. In addition, annotations with terms from the Systems Biology Ontology (SBO), a special subset of annotations, can be set directly on all model components via the sboTerm attribute.
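To make the triple structure concrete, the following minimal sketch turns a (predicate, object) tuple into a (subject, predicate, object) triple with a resolvable URI. It is not the sbmlutils implementation: the helper `to_triple`, the `Triple` type, and the plain string `"bqbiol:is"` standing in for `BQB.IS` are hypothetical, and the URI rule (dropping the collection when the term already carries it as a prefix, as in `CHEBI:28061`) is only an assumption inferred from the URIs that appear in the generated SBML in this notebook.

```python
from typing import NamedTuple, Tuple

IDENTIFIERS_ORG = "http://identifiers.org"


class Triple(NamedTuple):
    """RDF-style triple: what is annotated, how, and with which term."""

    subject: str    # reference to the metaid of the model component
    predicate: str  # MIRIAM qualifier
    obj: str        # URI of the annotation term


def to_triple(metaid: str, annotation: Tuple[str, str]) -> Triple:
    """Hypothetical helper: expand a (qualifier, 'collection/term') tuple."""
    qualifier, resource = annotation
    collection, term = resource.split("/", 1)
    # Assumption: terms such as 'CHEBI:28061' already embed the collection
    # prefix, so the collection is not repeated in the URI.
    if term.lower().startswith(f"{collection.lower()}:"):
        uri = f"{IDENTIFIERS_ORG}/{term}"
    else:
        uri = f"{IDENTIFIERS_ORG}/{collection}/{term}"
    return Triple(subject=f"#{metaid}", predicate=qualifier, obj=uri)


for annotation in [
    ("bqbiol:is", "chebi/CHEBI:28061"),
    ("bqbiol:is", "bigg.metabolite/gal"),
]:
    print(to_triple("meta_gal", annotation))
```

The subject is written as `#meta_gal` because the RDF block of a component refers to it via its metaid, not via its SBML identifier.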
[1]:
%load_ext autoreload
%autoreload 2
Annotate during model creation
In this example the model is annotated during model creation. Annotations are encoded as simple tuples consisting of a MIRIAM qualifier and the identifiers.org part of the resource (e.g. "chebi/CHEBI:28061"). The list of tuples is provided on object creation. Here we annotate a species.
[2]:
from sbmlutils.factory import *
from sbmlutils.metadata import *
from sbmlutils.validation import validate_doc

model = Model(
    'example_annotation',
    compartments=[
        Compartment(sid="C", value=1.0, sboTerm=SBO.PHYSICAL_COMPARTMENT)
    ],
    species=[
        Species(
            sid='gal', compartment='C', initialConcentration=3.0,
            name='D-galactose', sboTerm=SBO.SIMPLE_CHEMICAL,
            annotations=[
                (BQB.IS, "bigg.metabolite/gal"),  # galactose
                (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
                (BQB.IS, "vmhmetabolite/gal"),
            ]
        )
    ]
)

# create model and print SBML
doc = Document(model)
print(doc.get_sbml())

# validate model
validate_doc(doc.create_sbml(), options=ValidationOptions(units_consistency=False));
INFO Create SBML for model 'example_annotation' factory.py:3526
WARNING 'name' should be set on 'Model(sid='example_annotation', packages=[<Package.COMP_V1: 'comp-v1'>], units=<class 'sbmlutils.factory.Units'>, compartments=[C = 1.0 [None]], species=[<sbmlutils.factory.Species object at 0x7f7eb84a8580>], _FrozenClass__isfrozen=True)' factory.py:441
WARNING 'name' should be set on 'Compartment(C, SBO.SBO_0000290)' factory.py:441
<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" xmlns:comp="http://www.sbml.org/sbml/level3/version1/comp/version1" level="3" version="1" comp:required="true">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>Created with <a href="https://github.com/matthiaskoenig/sbmlutils">https://github.com/matthiaskoenig/sbmlutils</a>.
<a href="https://doi.org/10.5281/zenodo.5525390">
<img src="https://zenodo.org/badge/DOI/10.5281/zenodo.5525390.svg" alt="DOI"/></a></p>
</body>
</notes>
<model id="example_annotation">
<listOfCompartments>
<compartment metaid="meta_C" sboTerm="SBO:0000290" id="C" spatialDimensions="3" size="1" constant="true">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#meta_C">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/SBO:0000290"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
</compartment>
</listOfCompartments>
<listOfSpecies>
<species metaid="meta_gal" sboTerm="SBO:0000247" id="gal" name="D-galactose" compartment="C" initialConcentration="3" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#meta_gal">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
<rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
<rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
<rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
</species>
</listOfSpecies>
</model>
</sbml>
INFO Create SBML for model 'example_annotation' factory.py:3526
WARNING 'name' should be set on 'Model(sid='example_annotation', packages=[<Package.COMP_V1: 'comp-v1'>], units=<class 'sbmlutils.factory.Units'>, compartments=[C = 1.0 [None]], species=[<sbmlutils.factory.Species object at 0x7f7eb84a8580>], _FrozenClass__isfrozen=True)' factory.py:441
WARNING 'name' should be set on 'Compartment(C, SBO.SBO_0000290)' factory.py:441
────────────────────────────────────────────────── Validate SBML ──────────────────────────────────────────────────
<SBMLDocument> valid : TRUE check time (s) : 0.002
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
For a more complete example see model_with_annotations.py, which creates annotations of the form:
Species(
    sid='e__gal', compartment='ext', initialConcentration=3.0,
    substanceUnit=UNIT_KIND_MOLE, boundaryCondition=True,
    name='D-galactose', sboTerm=SBO_SIMPLE_CHEMICAL,
    annotations=[
        (BQB.IS, "bigg.metabolite/gal"),  # galactose
        (BQB.IS, "chebi/CHEBI:28061"),  # alpha-D-galactose
        (BQB.IS, "vmhmetabolite/gal"),
    ]
),
[3]:
from notebook import BASE_DIR
from sbmlutils.factory import *
from sbmlutils.io import read_sbml
from model_with_annotations import model

results = create_model(
    model,
    filepath=BASE_DIR / 'models' / f"{model.sid}.xml"
)

# check the annotations on the species
import libsbml
doc: libsbml.SBMLDocument = read_sbml(results.sbml_path)
model: libsbml.Model = doc.getModel()
s1: libsbml.Species = model.getSpecies('e__gal')
print(s1.toSBML())
─────────────────────────────────────────────────── Create SBML ───────────────────────────────────────────────────
INFO Create SBML for model 'annotation_example' factory.py:3526
WARNING 'sboTerm' should be set on 'Parameter(x_cell, cell diameter)' factory.py:466
WARNING 'sboTerm' should be set on 'Parameter(Vol_e, external volume)' factory.py:466
WARNING 'sboTerm' should be set on 'Parameter(A_m, membrane area)' factory.py:466
WARNING 'name' should be set on 'InitialAssignment()' factory.py:441
WARNING 'sboTerm' should be set on 'InitialAssignment()' factory.py:466
WARNING 'name' should be set on 'InitialAssignment()' factory.py:441
WARNING 'sboTerm' should be set on 'InitialAssignment()' factory.py:466
WARNING 'name' should be set on 'Parameter(GLUT2_Vmax)' factory.py:441
WARNING 'sboTerm' should be set on 'Parameter(GLUT2_Vmax)' factory.py:466
WARNING 'name' should be set on 'Parameter(GLUT2_k_gal)' factory.py:441
WARNING 'sboTerm' should be set on 'Parameter(GLUT2_k_gal)' factory.py:466
WARNING 'name' should be set on 'Parameter(GLUT2_keq)' factory.py:441
WARNING 'sboTerm' should be set on 'Parameter(GLUT2_keq)' factory.py:466
WARNING 'name' should be set on 'Parameter(Vol_c)' factory.py:441
WARNING 'sboTerm' should be set on 'Parameter(Vol_c)' factory.py:466
WARNING 'name' should be set on 'InitialAssignment()' factory.py:441
WARNING 'sboTerm' should be set on 'InitialAssignment()' factory.py:466
────────────────────────────────────────────────── Validate SBML ──────────────────────────────────────────────────
file:///home/mkoenig/git/sbmlutils/docs_builder/notebooks/models/annotation_example.xml valid : TRUE check time (s) : 0.013
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────
<species metaid="meta_e__gal" sboTerm="SBO:0000247" id="e__gal" name="D-galactose" compartment="ext" initialConcentration="3" substanceUnits="mole" hasOnlySubstanceUnits="false" boundaryCondition="true" constant="false">
<annotation>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
<rdf:Description rdf:about="#meta_e__gal">
<bqbiol:is>
<rdf:Bag>
<rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
<rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
<rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
<rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
</rdf:Bag>
</bqbiol:is>
</rdf:Description>
</rdf:RDF>
</annotation>
</species>
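The annotation block printed above can also be read back into (subject, predicate, object) triples. The following is a minimal sketch using only the Python standard library; the function `extract_triples` is a hypothetical helper and not part of sbmlutils or libsbml, and the XML string is a shortened copy of the species output above with only the namespaces needed for parsing.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened copy of the species XML printed above.
SPECIES_XML = """\
<species metaid="meta_e__gal" sboTerm="SBO:0000247" id="e__gal" name="D-galactose">
  <annotation>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
      <rdf:Description rdf:about="#meta_e__gal">
        <bqbiol:is>
          <rdf:Bag>
            <rdf:li rdf:resource="http://identifiers.org/SBO:0000247"/>
            <rdf:li rdf:resource="http://identifiers.org/bigg.metabolite/gal"/>
            <rdf:li rdf:resource="http://identifiers.org/CHEBI:28061"/>
            <rdf:li rdf:resource="http://identifiers.org/vmhmetabolite/gal"/>
          </rdf:Bag>
        </bqbiol:is>
      </rdf:Description>
    </rdf:RDF>
  </annotation>
</species>
"""


def extract_triples(xml_text: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: collect (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.iter(f"{{{RDF}}}Description"):
        subject = description.get(f"{{{RDF}}}about")
        # Each child of rdf:Description is a qualifier element, e.g. bqbiol:is.
        for qualifier in description:
            predicate = qualifier.tag  # ElementTree form: '{namespace}localname'
            for li in qualifier.iter(f"{{{RDF}}}li"):
                triples.append((subject, predicate, li.get(f"{{{RDF}}}resource")))
    return triples


for triple in extract_triples(SPECIES_XML):
    print(triple)
```

Note that ElementTree reports the predicate with its full namespace in braces rather than with the `bqbiol:` prefix used in the XML.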
Annotate existing model
An alternative approach is to annotate existing models from external annotation files. For instance, we can define the annotations in an external file and then add them to the model based on identifier matching. The following annotations are applied to the model in ./annotations/demo.xml based on pattern matching. Annotations are written for the given sbml_type for all SBML identifiers which match the given pattern.
[4]:
from sbmlutils.metadata.annotator import ModelAnnotator
df = ModelAnnotator.read_annotations_df(BASE_DIR / 'annotations' / 'demo_annotations.xlsx', file_format="xlsx")
df
[4]:
| | pattern | sbml_type | annotation_type | qualifier | resource | name |
---|---|---|---|---|---|---|
0 | NaN | document | rdf | BQM_IS | sbo/SBO:0000293 | non-spatial continuous framework |
1 | ^demo_\d+$ | model | rdf | BQM_IS | go/GO:0008152 | metabolic process |
4 | e | compartment | rdf | BQB_IS | sbo/SBO:0000290 | physical compartment |
5 | e | compartment | rdf | BQB_IS | go/GO:0005615 | extracellular space |
6 | e | compartment | rdf | BQB_IS | fma/FMA:70022 | extracellular space |
8 | m | compartment | rdf | BQB_IS | sbo/SBO:0000290 | physical compartment |
9 | m | compartment | rdf | BQB_IS | go/GO:0005886 | plasma membrane |
10 | m | compartment | rdf | BQB_IS | fma/FMA:63841 | plasma membrane |
12 | c | compartment | rdf | BQB_IS | sbo/SBO:0000290 | physical compartment |
13 | c | compartment | rdf | BQB_IS | go/GO:0005623 | cell |
14 | c | compartment | rdf | BQB_IS | fma/FMA:68646 | cell |
17 | ^Km_\w+$ | parameter | rdf | BQB_IS | sbo/SBO:0000027 | Michaelis constant |
18 | ^Keq_\w+$ | parameter | rdf | BQB_IS | sbo/SBO:0000281 | equilibrium constant |
19 | ^Vmax_\w+$ | parameter | rdf | BQB_IS | sbo/SBO:0000186 | maximal velocity |
22 | ^\w{1}__A$ | species | rdf | BQB_IS | sbo/SBO:0000247 | simple chemical |
23 | ^\w{1}__B$ | species | rdf | BQB_IS | sbo/SBO:0000247 | simple chemical |
24 | ^\w{1}__C$ | species | rdf | BQB_IS | sbo/SBO:0000247 | simple chemical |
25 | ^\w{1}__\w+$ | species | formula | NaN | C6H12O6 | NaN |
26 | ^\w{1}__\w+$ | species | charge | NaN | 0 | NaN |
28 | ^b\w{1}$ | reaction | rdf | BQB_IS | sbo/SBO:0000185 | transport reaction |
29 | ^v\w{1}$ | reaction | rdf | BQB_IS | sbo/SBO:0000176 | biochemical reaction |
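The pattern column in the table above holds regular expressions that are matched against SBML identifiers of the given sbml_type. As a rough illustration of that idea, the sketch below selects the resources for an identifier from a few of the rows. It is not the sbmlutils implementation: the helper `resources_for`, the example identifiers, and the use of `re.fullmatch` are assumptions made for this example, and the matching rule used by the annotator may differ.

```python
import re
from typing import List, Tuple

# (pattern, sbml_type, resource) for a subset of the rows in the table.
ROWS: List[Tuple[str, str, str]] = [
    (r"^Km_\w+$", "parameter", "sbo/SBO:0000027"),    # Michaelis constant
    (r"^Vmax_\w+$", "parameter", "sbo/SBO:0000186"),  # maximal velocity
    (r"^\w{1}__A$", "species", "sbo/SBO:0000247"),    # simple chemical
    (r"^b\w{1}$", "reaction", "sbo/SBO:0000185"),     # transport reaction
]


def resources_for(sid: str, sbml_type: str, rows=ROWS) -> List[str]:
    """Hypothetical helper: resources whose row matches the id and type."""
    return [
        resource
        for pattern, row_type, resource in rows
        # Assumption: the whole identifier has to match the pattern.
        if row_type == sbml_type and re.fullmatch(pattern, sid)
    ]


# Hypothetical identifiers chosen to fit the patterns above.
for sid, sbml_type in [("Km_A", "parameter"), ("c__A", "species"), ("bA", "reaction")]:
    print(sid, sbml_type, resources_for(sid, sbml_type))
```

A row only applies when both the sbml_type and the pattern match, so the same identifier pattern can be reused for different component types without conflict.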
[5]:
from sbmlutils.metadata.annotator import annotate_sbml

# annotate the existing SBML model with the annotations from the
# spreadsheet and write the annotated model to a new file
doc = annotate_sbml(
    source=BASE_DIR / 'annotations' / 'demo.xml',
    annotations_path=BASE_DIR / 'annotations' / 'demo_annotations.xlsx',
    filepath=BASE_DIR / 'annotations' / 'demo_annotated.xml'
)
print(doc.getModel())
WARNING No SBML objects found matching SId annotation pattern: '^demo_\d+$' annotator.py:214
Model annotated: file:///home/mkoenig/git/sbmlutils/docs_builder/notebooks/annotations/demo_annotated.xml
<Model Koenig_demo_14 "Koenig_demo_14">