13th June 2012, Rotterdam, Netherlands. Duncan Legge. Introduction to the Gene Ontology and GO Annotation Resources.
Slide1
EBI Bioinformatics Roadshow
13th June 2012
Rotterdam, Netherlands
Duncan Legge
Introduction to the Gene Ontology and GO Annotation Resources
Slide2
OUTLINE OF TUTORIAL:
PART I: Ontologies and the Gene Ontology (GO)
PART II: GO Annotations
How to access GO annotations
How scientists use GO annotations
Slide3
PART I: Gene Ontology
Slide4
What does an ontology provide?
1. Consistent terminology – controlled vocabulary.
2. Relationships between terms – hierarchy.
Slide5
Controlled vocabulary
Q: What is a cell?
A: It really depends who you ask!
Slide6
Different things can be described by the same name
Slide7
The same thing can be described by different names:
Glucose synthesis
Glucose biosynthesis
Glucose formation
Glucose anabolism
Gluconeogenesis
Slide8
Inconsistency in naming of biological concepts
Same name for different concepts
Different names for the same concept
Comparison is difficult, in particular across species or across databases
Just one reason why the Gene Ontology (GO) is needed…
Slide9
Why do we need GO?
Large datasets need to be interpreted quickly
Inconsistency in naming of biological concepts
Increasing amounts of biological data available
Increasing amounts of biological data to come
Slide10
Increasing amounts of biological data available
Search on mesoderm development… you get 9441 results!
Expansion of sequence information
Slide11
What is an ontology?
Dictionary: A branch of metaphysics concerned with the nature and relations of being (philosophy)
A formal representation of knowledge by a set of concepts within a domain and the relationships between those concepts (computer science)
Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.
1606
1700s
Slide12
What is an ontology?
More usefully: An ontology is the representation of something we know about. "Ontologies" consist of a representation of things that are detectable or directly observable, and the relationships between those things.
is part of
Slide13
What's in an Ontology?
Slide14
What is the Gene Ontology (GO)?
A way to capture biological knowledge in a written and computable form
Describes attributes of gene products (RNA and protein)
Slide15
http://www.geneontology.org
Reactome
E. Coli hub
Slide16
The scope of GO
What information might we want to capture about a gene product?
What does the gene product do?
Where does it act?
How does it act?
Slide17
Biological Process
what does a gene product do?
cell division
transcription
A commonly recognised series of events
Slide18
Cellular Component
where is a gene product located?
plasma membrane
mitochondrion
mitochondrial membrane
mitochondrial matrix
mitochondrial lumen
ribosome
large ribosomal subunit
small ribosomal subunit
Slide19
Molecular Function
how does a gene product act?
insulin binding
insulin receptor activity
glucose-6-phosphate isomerase activity
Slide20
Three separate ontologies or one large one?
GO was originally three completely independent hierarchies, with no relationships between them
As of 2009, GO has started making relationships between biological process and molecular function in the live ontology
Slide21
(Diagram: Function and Process terms linked by "is a" and "part of" relationships)
Slide22
GO IS:
species independent
covers normal processes
GO is NOT:
NO pathological/disease processes
NO experimental conditions
NO evolutionary relationships
NOT a nomenclature system
Slide23
Aims of the GO project
Edit the ontologies
Annotate gene products using ontology terms
Provide a public resource of data and tools
Slide24
Anatomy of a GO term
Unique identifier
Term name
Definition
Synonyms
Cross-references
Slide25
Ontology structure
Nodes = terms in the ontology
Edges = relationships between the concepts
GO is structured as a hierarchical directed acyclic graph (DAG)
Terms can have more than one parent and zero, one or more children
Terms are linked by relationships, which add to the meaning of the term
(Diagram: nodes joined by edges, running from less specific terms to more specific terms)
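The structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the term names ("root", "A", "B", "C") are a made-up toy graph, not real GO content, and the helper names `children_of` and `is_acyclic` are hypothetical.

```python
from collections import defaultdict

# Toy graph, not real GO content: each term maps to its parent terms.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # a term can have more than one parent
}


def children_of(parents):
    """Invert the child -> parents mapping into parent -> children."""
    children = defaultdict(list)
    for child, term_parents in parents.items():
        for parent in term_parents:
            children[parent].append(child)
    return dict(children)


def is_acyclic(parents):
    """Return True if following parent edges never leads back to the start."""
    state = {}  # term -> "visiting" or "done"

    def visit(term):
        if state.get(term) == "done":
            return True
        if state.get(term) == "visiting":
            return False  # we came back to a term still being explored
        state[term] = "visiting"
        ok = all(visit(parent) for parent in parents[term])
        state[term] = "done"
        return ok

    return all(visit(term) for term in parents)


print(children_of(parents))
print(is_acyclic(parents))
```

Storing the parents of each term, rather than the children, matches the way the ontology is usually read: from a specific term up towards less specific ones.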
Slide26
Relationships between GO terms
is_a
part_of
regulates
positively regulates
negatively regulates
has_part
Slide27
is_a
If A is a B, then A is a subtype of B
mitotic cell cycle is a cell cycle
lyase activity is a catalytic activity.
Transitive relationship: can infer up the graph
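"Inferring up the graph" can be shown with a small sketch. The first edge below comes from the slide; the second level ("cell cycle" is_a "cellular process") is added here only to show more than one step and should be read as an assumption of this example. The function name `ancestors` is hypothetical.

```python
# Child term -> terms it is_a subtype of.
IS_A = {
    "mitotic cell cycle": ["cell cycle"],  # example from the slide
    "cell cycle": ["cellular process"],    # extra level assumed for illustration
    "cellular process": [],
}


def ancestors(term, is_a):
    """Collect every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


# Because is_a is transitive, a mitotic cell cycle is also a cellular process.
print(ancestors("mitotic cell cycle", IS_A))
```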
Slide28
part_of
Necessarily part of
Wherever B exists, it is as part of A. But A does not necessarily have B as a part.
Transitive relationship (can infer up the graph)
(Diagram: B part_of A)
Slide29
regulates
One process directly affects another process or quality
Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B
(Diagram: B regulates A)
Slide30
has_part
Necessarily has part
Relationships are upside down compared to is_a and part_of
GO and GO Annotation, EBI Bioinformatics Roadshow. Düsseldorf. March 2011
Slide31
is_a complete
For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships: we call this being is_a complete
Important for reasoning over the ontology, and for ontology development
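A check for is_a completeness might look like the sketch below. The graph is a toy example (term "C" is imagined to have only a part_of parent, so it has no is_a edge), and the function names are hypothetical.

```python
# Toy is_a edges only, not real GO content: term -> is_a parents.
is_a = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": [],  # imagined to have only a part_of parent, so no is_a path to the root
}


def reaches_root(term, is_a, root):
    """Return True if `term` can reach `root` using is_a edges alone."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, []))
    return False


def terms_not_is_a_complete(is_a, root):
    """List the terms that have no complete is_a path to the root."""
    return sorted(term for term in is_a if not reaches_root(term, is_a, root))


print(terms_not_is_a_complete(is_a, "root"))
```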
Slide32
True path rule
Child terms inherit the meaning of all their parent terms.
Slide33
How is GO maintained?
GO editors and annotators work with experts to remodel specific areas of the ontology
Signaling
Kidney development
Transcription
Pathogenesis
Cell cycle
Deal with requests from the community
database curators, researchers, software developers
Some simple requests can be dealt with automatically
GO Consortium meetings for large changes
Mailing lists, conference calls, content workshops
Slide34
Requesting changes to the ontology
Public SourceForge (SF) tracker for term related issues
https://sourceforge.net/projects/geneontology/
Slide35
Why modify the GO?
GO reflects current knowledge of biology
Information from new organisms can make existing terms and arrangements incorrect
Not everything perfect from the outset
Improving definitions
Adding in synonyms and extra relationships
Slide36
Searching for GO terms
http://www.ebi.ac.uk/QuickGO/
http://amigo.geneontology.org
… there are more browsers available on the GO Tools page:
http://www.geneontology.org/GO.tools.browsers.shtml
The latest OBO Gene Ontology file can be downloaded from:
http://www.geneontology.org/ontology/gene_ontology.obo
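Once downloaded, the OBO file is plain text and can be read with a few lines of code. The sketch below is a minimal reader that handles only the `id`, `name`, `namespace` and `is_a` tags of `[Term]` stanzas; the sample text is a hand-written excerpt in OBO style, and `parse_obo_terms` is a hypothetical helper, not part of any GO tool.

```python
# Hand-written excerpt in OBO style, used instead of the full downloaded file.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: GO:0007049
name: cell cycle
namespace: biological_process

[Term]
id: GO:0000278
name: mitotic cell cycle
namespace: biological_process
is_a: GO:0007049 ! cell cycle
"""


def parse_obo_terms(text):
    """Return {term id: {"name", "namespace", "is_a"}} for [Term] stanzas."""
    terms = {}
    current = None  # the [Term] stanza being read, if any
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are kept.
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                terms[value] = current
            elif tag in ("name", "namespace"):
                current[tag] = value
            elif tag == "is_a":
                # Drop the trailing "! term name" comment.
                current["is_a"].append(value.split(" ! ")[0].strip())
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["GO:0000278"])
```

For real analyses an established OBO parser is likely a better choice, since the full format has many more tags than this sketch reads.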
Slide37
Exercise 1
Browsing the Gene Ontology using QuickGO
15 mins
Slide38
PART II: GO Annotation
Slide39
A GO annotation is…
A statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component
2. as determined by a particular type of evidence
3. as described in a particular reference
Example annotation:
Accession: P00505
Name: GOT2
GO ID: GO:0004069
GO term name: Aspartate transaminase activity
Reference: PMID:2731362
Evidence Code: IDA
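The three parts of an annotation map naturally onto a small record type. The sketch below uses the example values from this slide; the class name `Annotation`, its field names and the `sentence` method are choices made for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    accession: str  # gene product identifier
    symbol: str     # gene product name
    go_id: str      # GO term identifier
    go_term: str    # GO term name
    reference: str  # where the statement comes from
    evidence: str   # evidence code

    def sentence(self):
        """Spell the annotation out as the statement it represents."""
        return (
            f"{self.symbol} ({self.accession}) is annotated to "
            f"{self.go_term} ({self.go_id}), evidence {self.evidence}, "
            f"reference {self.reference}"
        )


example = Annotation(
    accession="P00505",
    symbol="GOT2",
    go_id="GO:0004069",
    go_term="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence="IDA",
)
print(example.sentence())
```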
Slide40
Evidence codes
IDA: enzyme assay
IPI: e.g. Y2H
http://www.geneontology.org/GO.evidence.shtml
review papers
subcategories of ISS
BLASTs, orthology comparison, HMMs
Slide41
GO evidence code decision tree
Slide42
GOA makes annotations using two methods
Electronic
Quick way of producing large numbers of annotations
Annotations are less detailed
Manual
Time-consuming process producing lower numbers of annotations
Annotations are very detailed and accurate
Slide43
Electronic annotation by GOA
1. Mapping of external concepts to GO terms
InterPro2GO (protein domains)
SPKW2GO (UniProt/Swiss-Prot keywords)
HAMAP2GO (Microbial protein annotation)
EC2GO (Enzyme Commission numbers)
SPSL2GO (Swiss-Prot subcellular locations)
Slide44
Electronic annotation by GOA
Aspartate transaminase activity ; GO:0004069
lipid transport ; GO:0006869
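A mapping entry of this kind can be split into its parts with simple string handling. The sketch below assumes a line layout of `external id > GO:term name ; GO:id`, with comment lines starting with `!`; that layout, the sample lines (including the placeholder identifier `EXAMPLE:0001`) and the function name are assumptions of this example, so check them against the mapping file you actually use.

```python
# Illustrative lines only; the layout is assumed, not taken from the slides.
SAMPLE_MAPPING = """\
! comment lines are assumed to start with an exclamation mark
EC:2.6.1.1 > GO:aspartate transaminase activity ; GO:0004069
EXAMPLE:0001 > GO:lipid transport ; GO:0006869
"""


def parse_mapping_line(line):
    """Split one mapping line into (external id, GO term name, GO id)."""
    external, go_part = line.split(" > ", 1)
    go_name, go_id = go_part.rsplit(" ; ", 1)
    go_name = go_name.strip()
    if go_name.startswith("GO:"):
        go_name = go_name[3:]  # drop the prefix in front of the term name
    return external.strip(), go_name, go_id.strip()


mappings = [
    parse_mapping_line(line)
    for line in SAMPLE_MAPPING.splitlines()
    if line and not line.startswith("!")
]
print(mappings)
```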
Slide45
Electronic annotation by GOA
2. Automatic transfer of annotations to orthologs
Slide46
Manual annotation by GOA
High-quality, specific annotations using:
Peer-reviewed papers
A range of evidence codes to categorize the types of evidence found in a paper
www.ebi.ac.uk/GOA
Slide47
Finding annotations in a paper
In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…
Highlighted in the text: "serine/threonine kinase activity"
Function: protein serine/threonine kinase activity GO:0004674
Highlighted in the text: "integral membrane protein"
Component: integral to plasma membrane GO:0005887
Highlighted in the text: "wound response"
Process: response to wounding GO:0009611
…for B. napus PERK1 protein (Q9ARH1)
PubMed ID: 12374299
Slide48
Qualifiers
Modify the interpretation of an annotation
NOT (protein is not associated with the GO term)
colocalizes_with (protein associates with complex but is not a bona fide member)
contributes_to (describes action of a complex of proteins)
Additional information
'With' column
Can include further information on the method being referenced, e.g. the protein accession of an interacting protein
Slide49
The NOT qualifier
NOT is used to make an explicit note that the gene product is not associated with the GO term
Also used to document conflicting claims in the literature
NOT can be used with ALL three gene ontologies
Slide50
In these cells, SIPP1 was mainly present in the nucleus, where it displayed a non-uniform, speckled distribution and appeared to be excluded from the nucleoli.
Highlighted in the text: "excluded from the nucleoli"
Slide51
The colocalizes_with qualifier
ONLY used with GO component ontology
Gene products that are transiently associated with an organelle or complex
Slide52
The colocalizes_with qualifier
Example (from Schizosaccharomyces pombe): Clp1 (Q9P7H1) relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the contractile ring (evidence from GFP fusion).
Slide53
The contributes_to qualifier
Where an individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the whole complex
contributes_to is not needed to annotate a catalytic subunit.
ONLY used with GO function ontology
Slide54
.. To test whether the protein complex consisting of PIG-A, PIG-H, PIG-C and hGPI1 has GlcNAc transferase activity in vitro…. …incubation of the radiolabeled donor of GlcNAc, UDP-[6-3H]GlcNAc, with lysates of JY5 cells transfected with GST-tagged PIG-A resulted in synthesis of GlcNAc-PI and its subsequent deacetylation to glucosaminyl phosphatidylinositol (GlcN-PI)
Highlighted in the text: "whether the protein complex has GlcNAc transferase activity"
Highlighted in the text: "resulted in synthesis of GlcNAc-PI and its subsequent deacetylation to glucosaminyl phosphatidylinositol (GlcN-PI)"
Slide55
WITH column
The with column provides supporting evidence for ISS, IPI, IGI and IC evidence codes
ISS: the accession of the aligned protein/ortholog
IPI: the accession of the interacting protein
IGI: the accession of the interacting gene
IC: the GO:ID for the inferred_from term
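The rule on this slide can be turned into a simple consistency check. The sketch below flags annotations whose evidence code is one of the four listed but whose 'with' value is empty; whether any given database enforces this is not stated in the slides, and the function name and the accession `UniProtKB:P12345` are placeholders for this example.

```python
# Evidence codes the slide lists as using the 'with' column,
# and what the slide says the column holds for each.
WITH_CONTENT = {
    "ISS": "accession of the aligned protein/ortholog",
    "IPI": "accession of the interacting protein",
    "IGI": "accession of the interacting gene",
    "IC": "GO ID of the inferred_from term",
}


def missing_with(evidence_code, with_value):
    """Return a message if a 'with' value is expected but empty, else None."""
    expected = WITH_CONTENT.get(evidence_code)
    if expected is not None and not with_value.strip():
        return f"{evidence_code} annotation lacks {expected}"
    return None


print(missing_with("IPI", ""))
print(missing_with("IPI", "UniProtKB:P12345"))  # placeholder accession
print(missing_with("IDA", ""))
```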
Slide56
How to access GO annotation data
Slide57
Where can you find annotations?
UniProtKB
Ensembl
Entrez Gene
Slide58
Gene Association Downloads
17-column files containing all information for each annotation
GO Consortium website
GOA website
Slide59GO browsers
Slide60GO Slims
Slide61GO slimsMany GO analysis tools use GO slims to give a broad overview of the datasetGO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO GO slims usually contain less-specialised GO terms
Slide62Slimming the GO using the
‘
true path rule
’
Many gene products are associated with a large number of descriptive, leaf GO nodes:
Slide63Slimming the GO using the
‘
true path rule
’
…however annotations can be mapped up to a smaller set of parent GO terms:
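Mapping up to a slim amounts to asking which slim terms lie among a term's ancestors. The sketch below shows the idea on a toy graph; the term names are invented, the function names are hypothetical, and real slimming tools may apply further rules (for example keeping only the most specific slim term).

```python
# Toy graph, not real GO content: term -> parent terms.
PARENTS = {
    "root": [],
    "broad_1": ["root"],
    "broad_2": ["root"],
    "mid": ["broad_1"],
    "leaf_x": ["mid"],
    "leaf_y": ["mid", "broad_2"],
}
SLIM = {"broad_1", "broad_2"}  # the cut-down set of less-specialised terms


def ancestors(term, parents):
    """Collect every term reachable from `term` by following parent edges."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def map_to_slim(term, parents, slim):
    """Return the slim terms that `term` is, or is a descendant of."""
    return ({term} | ancestors(term, parents)) & slim


print(map_to_slim("leaf_x", PARENTS, SLIM))
print(map_to_slim("leaf_y", PARENTS, SLIM))
```

This works because of the true path rule: an annotation to a leaf term is also a valid annotation to every one of its parents.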
Slide64
GO slims
Custom slims are available for download:
http://www.geneontology.org/GO.slims.shtml
Or you can make your own using:
QuickGO
http://www.ebi.ac.uk/QuickGO
AmiGO's GO slimmer
http://amigo.geneontology.org/cgi-bin/amigo/slimmer
Slide65
Just some things to be aware of….
The GO is continually changing
New terms created
Existing terms obsoleted
Re-structured
New annotations being created
ALWAYS use a current version of ontology and annotations
If publishing your analyses, please report the versions/dates you use:
http://www.geneontology.org/GO.cite.shtml
Differences in representation of GO terms may be due to biological phenomena, but may also be due to annotation bias or experimental assays
Often better to remove the 'NOT' annotations before doing any large-scale analysis, as they can skew the results
(Figure labels: ontology, annotation)
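Removing 'NOT' annotations before an analysis is a one-line filter once the qualifier is available. The sketch below assumes the qualifier is a string in which several values may be joined with `|`; the protein names are placeholders and the helper name is hypothetical.

```python
# Placeholder annotations; only the qualifier field matters here.
annotations = [
    {"protein": "protein_1", "go_id": "GO:0005634", "qualifier": ""},
    {"protein": "protein_1", "go_id": "GO:0005730", "qualifier": "NOT"},
    {"protein": "protein_2", "go_id": "GO:0004674", "qualifier": "contributes_to"},
    {"protein": "protein_3", "go_id": "GO:0004674", "qualifier": "NOT|contributes_to"},
]


def is_negated(qualifier):
    """Return True if NOT is one of the pipe-separated qualifier values."""
    return "NOT" in qualifier.split("|")


positive_only = [a for a in annotations if not is_negated(a["qualifier"])]
print(len(annotations), "annotations,", len(positive_only), "after removing NOT")
```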
Slide66
How scientists use the GO, and the tools they use for analysis
Slide67
Source of annotation
If you wanted to find out the role of a gene product manually, you'd have to read an awful lot of papers
But by using GO annotations, this work has already been done for you!
GO:0006915 : apoptosis
Slide68
How scientists use the GO
Find out what a gene product does or which genes are involved in a certain biological process/function
Analyse high-throughput genomic or proteomic datasets
Validation of experimental techniques
Get a broad overview of a proteome
Obtain functional information for novel gene products
Some examples…
Slide69
MicroArray data analysis
(Figure labels: time; control; attacked. Gene cluster labels: Puparial adhesion; Molting cycle; Hemocyanin; Defense response; Immune response; Response to stimulus; Toll regulated genes; JAK-STAT regulated genes; Amino acid catabolism; Lipid metabolism; Peptidase activity; Protein catabolism)
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
Slide70
Validation of experimental techniques
Rat liver plasma membrane isolation
(Cao et al., Journal of Proteome Research 2006)
Slide71
Analysis of high-throughput proteomic datasets
Characterisation of proteins interacting with ribosomal protein S19
(Orrù et al., Molecular and Cellular Proteomics 2007)
Slide72
Obtain functional information for novel gene products
MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS
SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD
FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG
ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN
FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTIMPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHSRDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIPDSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLAEGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRERADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDSTRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM
InterProScan
Slide73
Annotating novel sequences
Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence
Two tools currently available:
AmiGO BLAST (from GO Consortium)
http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi
searches the GO Consortium database
BLAST2GO (from Babelomics)
http://www.blast2go.org/
searches the NCBI database
Slide74
AmiGO BLAST
Exportin-T from Pongo abelii (Sumatran orangutan)
Slide75
Numerous Third Party Tools
Many tools exist that use GO to find common biological functions from a list of genes:
http://www.geneontology.org/GO.tools.microarray.shtml
Slide76
GO tools: enrichment analysis
Most of these tools work in a similar way:
input a gene list and a subset of 'interesting' genes
tool shows which GO categories have most interesting genes associated with them, i.e. which categories are 'enriched' for interesting genes
tool provides a statistical measure to determine whether enrichment is significant
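The slides do not say which statistic a given tool uses. One common choice is a one-sided hypergeometric test, sketched below with the standard library only; the function and parameter names are this example's own, and the numbers in the usage line are made up. For a gene list of size $N$ containing $K$ genes annotated to a category, and an interesting subset of size $n$ containing $k$ such genes, the p-value is the probability of drawing $k$ or more annotated genes by chance.

```python
from math import comb


def enrichment_p_value(population, population_hits, sample, sample_hits):
    """One-sided hypergeometric p-value: P(X >= sample_hits).

    population      -- genes in the full input list
    population_hits -- genes in the full list annotated to the GO category
    sample          -- genes in the 'interesting' subset
    sample_hits     -- interesting genes annotated to the GO category
    """
    total = comb(population, sample)  # all ways to pick the subset
    upper = min(population_hits, sample)
    tail = sum(
        comb(population_hits, i) * comb(population - population_hits, sample - i)
        for i in range(sample_hits, upper + 1)
    )
    return tail / total


# Made-up numbers: 20 genes, 5 in the category, 4 interesting, 3 of them in it.
print(enrichment_p_value(20, 5, 4, 3))
```

Real tools test many GO categories at once, so they usually also correct the p-values for multiple testing; that step is left out of this sketch.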
Slide77
Exercises
Searching for GO annotations in QuickGO
Exercise 2: using GO terms
Exercise 3: using a protein ID
Using QuickGO to create a tailored set of annotations
Exercise 4: Filtering
Exercise 5: Statistics
Map-up annotation using a GO slim
Exercise 6
Slide78
Thanks for listening