Slide1
Gene Ontology
John Pinney
j.pinney@imperial.ac.uk
Slide2
Gene annotation
Goal: transfer knowledge about the function of gene products from model organisms to other genomes
Slide3
Gene annotation
Problem: keyword systems differ between research communities
Slide4
Gene annotation
Solution: controlled vocabulary
Slide5
Ontology: a structured, controlled vocabulary
Slide6
Ontology: a collection of terms and their definitions and the logical relationships between them
Slide7
Gene Ontology (GO): a collection of terms and their definitions and the logical relationships between them, describing gene products
Slide8
nucleus
“A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell's chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent.”
GO:0005634
Slide9
“part of” relationships (diagram): nucleus is part of cell; nuclear membrane, nucleoplasm and nucleolus are each part of nucleus
Slide10
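“is a” relationships (diagram): pronucleus is a nucleus; nucleus is an intracellular membrane-bounded organelle; intracellular membrane-bounded organelle is an intracellular organelle and is also a membrane-bounded organelle
Slide11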
A term may have more than one parent term and more than one child term.
=> The gene ontology is not a tree
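The “is a” example for the nucleus already contains a term with two parents, which is what rules out a tree. A minimal Python sketch of this, assuming a hand-written dictionary of parent links (the names `IS_A` and `ancestors` are illustrative and do not come from any GO library):

```python
# Parent ("is a") links copied from the nucleus example; the dictionary
# layout and the function name are assumptions made for illustration.
IS_A = {
    "pronucleus": ["nucleus"],
    "nucleus": ["intracellular membrane-bounded organelle"],
    "intracellular membrane-bounded organelle": [
        "intracellular organelle",
        "membrane-bounded organelle",
    ],
    "intracellular organelle": [],
    "membrane-bounded organelle": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


# A term with more than one parent cannot be placed in a tree.
multi_parent = [term for term, parents in IS_A.items() if len(parents) > 1]
print(multi_parent)
print(sorted(ancestors("pronucleus")))
```

Following parent links upwards is also how an annotation to a specific term implies annotation to all of its more general ancestors.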
Slide12
The gene ontology has a structure known as a Directed Acyclic Graph (DAG).
Directed: relationships are not symmetrical
Acyclic: there are no directed loops
Graph: mathematical term for a network
Slide13
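The “no directed loops” property from the preceding slide can be checked mechanically. The sketch below is one possible way to do it, assuming relationships are stored as a child-to-parents dictionary; the function name `has_directed_loop` and the example links are illustrative only.

```python
def has_directed_loop(edges):
    """Return True if following child -> parent links can lead back to the start."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for parent in edges.get(node, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:
                return True  # reached a term that is already on the current path
            if parent_state == WHITE and visit(parent):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


# "part of" links in the spirit of the nucleus example: acceptable for a DAG.
part_of = {"nucleolus": ["nucleus"], "nucleus": ["cell"], "cell": []}
# A deliberately broken, made-up example containing a loop.
broken = {"a": ["b"], "b": ["a"]}

print(has_directed_loop(part_of), has_directed_loop(broken))
```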
GO is actually made up of 3 different ontologies:
cellular component
molecular function
biological process
Slide14
cellular component
“The part of a cell or its extracellular environment in which a gene product is located. A gene product may be located in one or more parts of a cell.”
Slide15
cellular component
examples:
cohesin core heterodimer
extracellular region
laminin-1 complex
replication fork
transcription factor complex
Slide16
molecular function
“Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.”
Slide17
molecular function
examples:
transcription factor binding
enzyme activator activity
3'-nucleotidase activity
metallopeptidase activity
hexokinase activity
Slide18
biological process
“Those processes specifically pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms. A process is a collection of molecular events with a defined beginning and end.”
Slide19
biological process
examples:
para-aminobenzoic acid biosynthetic process
protein localization
establishment of blood-nerve barrier
circadian rhythm
posterior midgut development
Slide20
geneontology.org
Slide21
geneontology.org
search and browse the ontologies
Slide22
geneontology.org
search and browse the ontologies
Slide23
geneontology.org
download ontologies
Slide24
geneontology.org
download mappings from other databases
enzyme functions (EC, KEGG, MetaCyc)
protein domains (Pfam, SMART, PRINTS, …)
other controlled vocabularies of functions (E. coli functions, MIPS FunCat)
Slide25
geneontology.org
download annotations for various genomes
Slide26
geneontology.org
download annotations for various genomes
NCBI_NP NP_354299.2 lolD GO:0043190 ISS "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)" taxon:176299 20070612 PAMGO_GAT
database: NCBI_NP
gene product ID: NP_354299.2
gene symbol: lolD
GO term ID: GO:0043190
evidence code: ISS
Slide27
evidence codes
Allow curators to indicate the type of evidence for each gene-term annotation.
experimental, e.g.
IMP: Inferred from mutant phenotype
IDA: Inferred from direct assay
computational, e.g.
ISS: Inferred from sequence similarity
IGC: Inferred from genome context
author statement, e.g.
TAS: Traceable author statement
Slide28
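The evidence codes on the preceding slide are often used to filter annotations, for example to keep only those with experimental support. A minimal sketch, assuming annotations are held as simple tuples; the lookup table covers only the five codes named on the slide (GO defines more), and the names `EVIDENCE_CATEGORY` and `keep_experimental` are illustrative.

```python
# Evidence codes and categories as listed on the slide; GO defines more
# codes than these five, so this table is deliberately incomplete.
EVIDENCE_CATEGORY = {
    "IMP": "experimental",
    "IDA": "experimental",
    "ISS": "computational",
    "IGC": "computational",
    "TAS": "author statement",
}


def keep_experimental(annotations):
    """Keep (gene, GO term, evidence code) tuples backed by experimental evidence."""
    return [a for a in annotations if EVIDENCE_CATEGORY.get(a[2]) == "experimental"]


# Only the lolD row comes from the slides; geneA and geneB are made-up placeholders.
annotations = [
    ("lolD", "GO:0043190", "ISS"),
    ("geneA", "GO:0005634", "IDA"),
    ("geneB", "GO:0005634", "TAS"),
]
experimental_only = keep_experimental(annotations)
print(experimental_only)
```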
geneontology.org
download annotations for various genomes
NCBI_NP NP_354299.2 lolD GO:0043190 ISS "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)" taxon:176299 20070612 PAMGO_GAT
database: NCBI_NP
gene product ID: NP_354299.2
gene symbol: lolD
GO term ID: GO:0043190
evidence code: ISS
description: "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)"
organism (taxon) ID: taxon:176299
date: 20070612
annotation project ID: PAMGO_GAT
Slide29
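The labelled record on the preceding slide can be read into named fields with a few lines of Python. This is a sketch under stated assumptions: it takes the record to be tab-separated and to hold exactly the nine fields shown on the slide. Annotation files distributed by geneontology.org carry additional columns that the slide omits, so a real parser would need the full column list. `Annotation` and `parse_annotation` are illustrative names.

```python
from collections import namedtuple

# Field names follow the labels on the slide; the tab separator and the
# nine-column layout are assumptions made for this example only.
Annotation = namedtuple(
    "Annotation",
    "database gene_product_id gene_symbol go_term_id evidence_code "
    "description taxon_id date project_id",
)


def parse_annotation(line):
    """Split one tab-separated record into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(Annotation._fields):
        raise ValueError(
            f"expected {len(Annotation._fields)} fields, got {len(fields)}"
        )
    return Annotation(*fields)


record = parse_annotation(
    "NCBI_NP\tNP_354299.2\tlolD\tGO:0043190\tISS\t"
    "ABC transporter, nucleotide binding/ATPase protein (lipoprotein)\t"
    "taxon:176299\t20070612\tPAMGO_GAT"
)
print(record.gene_symbol, record.go_term_id, record.evidence_code)
```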
geneontology.org
repository of analysis tools that use GO
search, edit and browse ontologies / annotations
software libraries
statistical analysis
text mining
protein interactions
enrichment analysis
Slide30
Enrichment analysis
Slide31
gene set, obtained from one of:
significant expression change in a microarray experiment
cluster from a protein interaction network
some other experiment / analysis
compared against: whole genome (annotated)
Which GO terms occur significantly more often than expected in this gene set?
Tools: BiNGO, GOstat, ArrayTrack
Slide32
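The enrichment question on the preceding slide is commonly answered with a hypergeometric test: given how many genes in the annotated genome carry a term, how likely is it to see at least the observed number in the gene set by chance? The sketch below is one such calculation and is not taken from BiNGO, GOstat or ArrayTrack, which may use different tests. The counts are made up and the function name `enrichment_p_value` is illustrative.

```python
from math import comb


def enrichment_p_value(genome_size, genome_with_term, set_size, set_with_term):
    """P(X >= set_with_term) when set_size genes are drawn without replacement."""
    upper = min(set_size, genome_with_term)
    total = comb(genome_size, set_size)
    tail = sum(
        comb(genome_with_term, k)
        * comb(genome_size - genome_with_term, set_size - k)
        for k in range(set_with_term, upper + 1)
    )
    return tail / total


# Made-up counts: 5000 annotated genes, 100 of which carry the term;
# the gene set holds 50 genes, 10 of which carry the term.
p = enrichment_p_value(5000, 100, 50, 10)
print(f"p = {p:.3g}")
```

Because one such test is run per GO term, the resulting p-values normally need a multiple-testing correction before any term is called significant.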
Advantages of GO
single set of terms to describe the function of gene products from all organisms.
DAG structure provides a logical framework to represent knowledge at whatever level of detail is available.
continually revised to reflect the state of current knowledge.
can quantify strength of relationships between terms (semantic similarity).
many statistical analysis tools available.
Slide33
Limitations of GO
GO is limited in scope: it does not cover
processes that are not normal functions of gene products (e.g. oncogenesis)
sequence attributes (e.g. introns/exons)
protein structures or interactions
evolution
gene expression
Slide34
Summary (1)
The gene ontology (GO) is a structured, controlled vocabulary to describe the function of gene products.
Terms in GO have logical relationships (“is a”, “part of”) with one another. Together these form a structure called a Directed Acyclic Graph (DAG).
GO is formed of 3 separate ontologies describing different aspects of gene function: cellular component, molecular function and biological process.
Slide35
Summary (2)
geneontology.org is the central resource for downloading ontology, annotation and mapping files.
evidence codes are used in annotations to show the experimental, computational or literature support for each function.
Slide36
Summary (3)
many software tools are available to support GO analysis of experimental data, including enrichment analysis by:
ArrayTrack (microarray expression data)
BiNGO (protein interaction clusters)
GOstat (any data in the form of gene sets)