
EBI Bioinformatics Roadshow - PowerPoint Presentation



13th June 2012, Rotterdam, Netherlands. Duncan Legge. Introduction to the Gene Ontology and GO Annotation Resources. Outline of tutorial: Part I, Ontologies and the Gene Ontology (GO); Part II, GO Annotations.


Slide1

EBI Bioinformatics Roadshow, 13th June 2012, Rotterdam, Netherlands. Duncan Legge

Introduction to the Gene Ontology and GO Annotation Resources

Slide2

OUTLINE OF TUTORIAL:
PART I: Ontologies and the Gene Ontology (GO)
PART II: GO Annotations
How to access GO annotations
How scientists use GO annotations

Slide3

PART I: Gene Ontology

Slide4

What does an ontology provide?

1. Consistent terminology – controlled vocabulary.

2. Relationships between terms – hierarchy.

Slide5

Controlled vocabulary

Q: What is a cell?

A: It really depends who you ask!

Slide6

Different things can be described by the same name

Slide7

The same thing can be described by different names:
Glucose synthesis
Glucose biosynthesis
Glucose formation
Glucose anabolism
Gluconeogenesis
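A controlled vocabulary removes this ambiguity by resolving every name to a single term. The following Python sketch illustrates the idea. Its synonym table is hand-written from the names on this slide rather than loaded from GO, and it assumes all of them resolve to gluconeogenesis (GO:0006094).

```python
from typing import Optional

# Hand-written table for illustration: each name from the slide resolves to
# one canonical term, gluconeogenesis (GO:0006094).
SYNONYMS = {
    "glucose synthesis": "GO:0006094",
    "glucose biosynthesis": "GO:0006094",
    "glucose formation": "GO:0006094",
    "glucose anabolism": "GO:0006094",
    "gluconeogenesis": "GO:0006094",
}


def normalise(name: str) -> Optional[str]:
    """Return the canonical term ID for a free-text name, or None if it is unknown."""
    key = " ".join(name.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)


print(normalise("Glucose  Biosynthesis"))  # GO:0006094
print(normalise("glycolysis"))             # None
```

Once every name maps to the same identifier, data described with different names can be compared directly.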

Slide8

Inconsistency in naming of biological concepts:
Same name for different concepts
Different names for the same concept

Comparison is difficult – in particular across species or across databases

Just one reason why the Gene Ontology (GO) is needed…

Slide9

Why do we need GO?

Large datasets need to be interpreted quickly

Inconsistency in naming of biological concepts

Increasing amounts of biological data available

Increasing amounts of biological data to come

Slide10

Increasing amounts of biological data available
Search on mesoderm development… you get 9441 results!

Expansion of sequence information

Slide11

What is an ontology?
Dictionary: A branch of metaphysics concerned with the nature and relations of being (philosophy)
A formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts (computer science)
Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality.


Slide12

What is an ontology?
More usefully: an ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.

[Figure: example relationship, "is part of"]

Slide13

What’s in an Ontology?

Slide14

What is the Gene Ontology (GO)?
A way to capture biological knowledge in a written and computable form
Describes attributes of gene products (RNA and protein)

Slide15

http://www.geneontology.org

Reactome

E. Coli hub

Slide16

The scope of GO
What information might we want to capture about a gene product?
What does the gene product do?

Where does it act?

How does it act?

Slide17

Biological Process
what does a gene product do?
cell division

transcription

A commonly recognised series of events

Slide18

Cellular Component
where is a gene product located?

plasma membrane

mitochondrion

mitochondrial membrane

mitochondrial matrix

mitochondrial lumen

ribosome

large ribosomal subunit

small ribosomal subunit

Slide19

Molecular Function
how does a gene product act?
insulin binding
insulin receptor activity

glucose-6-phosphate isomerase activity

Slide20

Three separate ontologies or one large one?
GO was originally three completely independent hierarchies, with no relationships between them.
As of 2009, GO has started making relationships between biological process and molecular function terms in the live ontology.

Slide21

[Figure: two function terms and a process term, linked by part_of and is_a relationships]

Slide22

GO IS:

species independent

covers normal processes

GO is NOT:
NO pathological/disease processes
NO experimental conditions
NO evolutionary relationships
NOT a nomenclature system

Slide23

Aims of the GO project
Edit the ontologies
Annotate gene products using ontology terms
Provide a public resource of data and tools

Slide24

Anatomy of a GO term

Unique identifier

Term name

Definition

Synonyms

Cross-references
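The GO is distributed as an OBO flat file, and each of these fields appears as a tagged line in a [Term] stanza. The Python sketch below reads one stanza. The stanza is an abbreviated, illustrative example in OBO 1.2 style, so check the current gene_ontology.obo release for the exact wording. The parser is deliberately naive and does not handle escaped quotes, qualifiers or multi-stanza files.

```python
# Abbreviated, illustrative [Term] stanza in OBO 1.2 style; check the current
# gene_ontology.obo release for the exact wording of each field.
STANZA = """[Term]
id: GO:0006094
name: gluconeogenesis
namespace: biological_process
def: "The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol." []
synonym: "glucose biosynthesis" EXACT []
xref: MetaCyc:GLUCONEO-PWY
"""


def parse_term_stanza(text: str) -> dict:
    """Collect the identifier, name, definition, synonyms and cross-references of one term."""
    term = {"id": None, "name": None, "def": None, "synonyms": [], "xrefs": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("[") or ": " not in line:
            continue  # skip blank lines and the [Term] header
        tag, value = line.split(": ", 1)
        if tag in ("id", "name"):
            term[tag] = value
        elif tag == "def":
            term["def"] = value.split('"')[1]  # text between the first pair of quotes
        elif tag == "synonym":
            term["synonyms"].append(value.split('"')[1])
        elif tag == "xref":
            term["xrefs"].append(value)
    return term


term = parse_term_stanza(STANZA)
print(term["id"], term["name"], term["synonyms"])
```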

Slide25

Ontology structure
Nodes = terms in the ontology
Edges = relationships between the concepts
GO is structured as a hierarchical directed acyclic graph (DAG)
Terms can have more than one parent and zero, one or more children
Terms are linked by relationships, which add to the meaning of the term
[Figure: nodes joined by edges, running from less specific to more specific terms]
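One simple way to represent this structure is to store each term with its set of parents. This allows a term to have several parents, and a depth-first search confirms that no cycle exists. In the Python sketch below the term names are GO-like, but the edges are a simplified toy fragment and the edge types are ignored.

```python
from collections import defaultdict

# Toy fragment stored as child -> set of parents. The term names are GO-like,
# but the edges are simplified for illustration and the edge types are ignored.
PARENTS = {
    "cell cycle": set(),
    "cell division": set(),
    "mitotic cell cycle": {"cell cycle"},
    "cytokinesis": {"cell division"},
    "mitotic cytokinesis": {"cytokinesis", "mitotic cell cycle"},  # two parents
}


def children_of(parents):
    """Invert child -> parents into parent -> children."""
    children = defaultdict(set)
    for child, ps in parents.items():
        for p in ps:
            children[p].add(child)
    return children


def is_acyclic(parents):
    """Depth-first search: the graph is a DAG if no path leads back to a term already on the path."""
    on_path, done = set(), set()

    def visit(term):
        on_path.add(term)
        for p in parents.get(term, ()):
            if p in on_path:
                return False  # found a cycle
            if p not in done and not visit(p):
                return False
        on_path.discard(term)
        done.add(term)
        return True

    return all(t in done or visit(t) for t in parents)


print(sorted(PARENTS["mitotic cytokinesis"]))  # ['cytokinesis', 'mitotic cell cycle']
print(is_acyclic(PARENTS))                       # True
```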

Slide26

Relationships between GO terms

is_a

part_of

regulates

positively regulates

negatively regulates

has_part

Slide27

is_a
If A is_a B, then A is a subtype of B:
mitotic cell cycle is_a cell cycle
lyase activity is_a catalytic activity
Transitive relationship: can infer up the graph
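Because is_a is transitive, following is_a edges upwards gives every term that a term is a subtype of. The Python sketch below uses a toy fragment built from the examples on this slide. Its intermediate terms are simplified, and the real GO has more levels.

```python
# Toy is_a fragment built from the slide's examples; intermediate terms are
# simplified and the real GO contains more levels.
IS_A = {
    "mitotic cell cycle": {"cell cycle"},
    "cell cycle": {"cellular process"},
    "cellular process": {"biological_process"},
    "lyase activity": {"catalytic activity"},
    "catalytic activity": {"molecular_function"},
}


def is_a_ancestors(term, is_a=IS_A):
    """Everything `term` is a subtype of, found by following is_a edges transitively."""
    found, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(is_a_ancestors("mitotic cell cycle")))
# ['biological_process', 'cell cycle', 'cellular process']
```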

Slide28

part_of
Necessarily part of: wherever B exists, it is as part of A. But not every A has B as a part.
Transitive relationship (can infer up the graph)
[Figure: B part_of A]
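Inferring up the graph also works across mixed edges. An is_a edge followed by a part_of edge, or a part_of edge followed by an is_a edge, implies a part_of relationship. The Python sketch below applies these chaining rules for is_a and part_of until no new relationships appear. It leaves out regulates, and its cellular component edges are a simplified toy example.

```python
# Composition rules for chaining two edges (first edge, then second edge).
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}

# Toy edges (child, relation, parent); simplified for illustration.
EDGES = {
    ("mitochondrial matrix", "part_of", "mitochondrion"),
    ("mitochondrion", "is_a", "organelle"),
    ("mitochondrion", "part_of", "cytoplasm"),
}


def infer(edges):
    """Chain edges repeatedly and return only the newly inferred (child, relation, ancestor) triples."""
    inferred = set(edges)
    changed = True
    while changed:
        changed = False
        for a, r1, b in list(inferred):
            for b2, r2, c in list(inferred):
                rule = COMPOSE.get((r1, r2))
                if b == b2 and rule and (a, rule, c) not in inferred:
                    inferred.add((a, rule, c))
                    changed = True
    return inferred - set(edges)


print(sorted(infer(EDGES)))
# [('mitochondrial matrix', 'part_of', 'cytoplasm'), ('mitochondrial matrix', 'part_of', 'organelle')]
```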

Slide29

regulates
One process directly affects another process or quality
Necessarily regulates: if both A and B are present, B always regulates A, but A may not always be regulated by B
[Figure: B regulates A]

Slide30

has_part
Necessarily has part: if A has_part B, then A always has B as a part, but B need not exist as part of A
The relationship points in the opposite direction to is_a and part_of

GO and GO Annotation, EBI Bioinformatics Roadshow. Düsseldorf. March 2011

Slide31

is_a complete
For all terms in the ontology, you have to be able to reach the root through a complete path of is_a relationships:
we call this being is_a complete
important for reasoning over the ontology, and for ontology development
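The is_a-complete requirement can be checked mechanically by searching upwards from each term while following only is_a edges. In the Python sketch below, "nuclear envelope" is a hypothetical gap: it is given no is_a parent, as if it had only a part_of link. This is an illustration and makes no claim about the real GO.

```python
ROOTS = {"biological_process", "molecular_function", "cellular_component"}

# Toy fragment: is_a edges only. "nuclear envelope" is a hypothetical term that
# has only a part_of link here, so it has no is_a path to a root.
IS_A = {
    "nucleus": {"organelle"},
    "organelle": {"cellular_component"},
    "nuclear envelope": set(),
}


def terms_lacking_is_a_path(terms, is_a=IS_A, roots=ROOTS):
    """Return the terms that cannot reach a root by following is_a edges alone."""
    lacking = set()
    for term in terms:
        seen, stack, reached = {term}, [term], False
        while stack and not reached:
            current = stack.pop()
            reached = current in roots
            for parent in is_a.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        if not reached:
            lacking.add(term)
    return lacking


print(terms_lacking_is_a_path(["nucleus", "organelle", "nuclear envelope"]))
# {'nuclear envelope'}
```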

Slide32

True path rule

Child terms inherit the meaning of all their parent terms.
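A practical consequence is that a gene product annotated to a term is implicitly annotated to all of that term's ancestors. The Python sketch below applies this propagation. The hierarchy is a toy one, the gene products "geneA" and "geneB" are hypothetical, and the edge types are ignored.

```python
# Toy hierarchy (child -> parents); "geneA" and "geneB" are hypothetical gene products.
PARENTS = {
    "mitotic cell cycle": {"cell cycle"},
    "cell cycle": {"biological_process"},
    "response to wounding": {"response to stress"},
    "response to stress": {"biological_process"},
}


def ancestors(term, parents):
    """All terms above `term` in the graph."""
    found, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def propagate(annotations, parents):
    """Apply the true path rule: annotation to a term implies annotation to all its ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


direct = {"geneA": {"mitotic cell cycle"}, "geneB": {"response to wounding"}}
print(sorted(propagate(direct, PARENTS)["geneA"]))
# ['biological_process', 'cell cycle', 'mitotic cell cycle']
```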

Slide33

How is GO maintained?
GO editors and annotators work with experts to remodel specific areas of the ontology, e.g.:
Signaling
Kidney development
Transcription
Pathogenesis
Cell cycle
Deal with requests from the community: database curators, researchers, software developers
Some simple requests can be dealt with automatically
GO Consortium meetings for large changes
Mailing lists, conference calls, content workshops

Slide34

Requesting changes to the ontology
Public SourceForge (SF) tracker for term-related issues: https://sourceforge.net/projects/geneontology/

Slide35

Why modify the GO?
GO reflects current knowledge of biology
Information from new organisms can make existing terms and arrangements incorrect
Not everything perfect from the outset
Improving definitions
Adding in synonyms and extra relationships

Slide36

Searching for GO terms
http://www.ebi.ac.uk/QuickGO/
http://amigo.geneontology.org
… there are more browsers available on the GO Tools page: http://www.geneontology.org/GO.tools.browsers.shtml
The latest OBO Gene Ontology file can be downloaded from: http://www.geneontology.org/ontology/gene_ontology.obo

Slide37

Exercise 1
Browsing the Gene Ontology using QuickGO (15 mins)

Slide38

PART II: GO Annotation

Slide39

A GO annotation is…
A statement that a gene product:
1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component,
2. as determined by a particular type of evidence,
3. as described in a particular reference.

Accession | Name | GO ID | GO term name | Reference | Evidence code
P00505 | GOT2 | GO:0004069 | Aspartate transaminase activity | PMID:2731362 | IDA

Slide40

Evidence codes

IDA: e.g. enzyme assay

IPI: e.g. yeast two-hybrid (Y2H)

http://www.geneontology.org/GO.evidence.shtml

review papers

subcategories of ISS

BLASTs, orthology comparison, HMMs
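Evidence codes are often grouped so that a dataset can be restricted, for example, to experimentally supported annotations only. The Python sketch below assumes the grouping shown on the GO evidence code page and may need updating, because the list of codes changes over time. The second annotation in the usage example is hypothetical.

```python
# Grouping assumed from the GO evidence code documentation; check
# http://www.geneontology.org/GO.evidence.shtml for the current list.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "IBA", "IBD", "IKR", "IRD", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the group an evidence code belongs to."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise ValueError(f"unknown evidence code: {code}")


def keep_groups(annotations, groups):
    """Keep (gene product, GO ID, evidence code) tuples whose code falls in `groups`."""
    return [a for a in annotations if evidence_group(a[2]) in groups]


# The second (IEA) annotation is hypothetical.
anns = [("P00505", "GO:0004069", "IDA"), ("P00505", "GO:0004069", "IEA")]
print(keep_groups(anns, {"experimental"}))  # [('P00505', 'GO:0004069', 'IDA')]
```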

Slide41

GO evidence code decision tree

Slide42

GOA makes annotations using two methods:
Electronic
Quick way of producing large numbers of annotations
Annotations are less detailed
Manual
Time-consuming process producing lower numbers of annotations
Annotations are very detailed and accurate

Slide43

Electronic annotation by GOA
1. Mapping of external concepts to GO terms:
InterPro2GO (protein domains)
SPKW2GO (UniProt/Swiss-Prot keywords)
HAMAP2GO (Microbial protein annotation)
EC2GO (Enzyme Commission numbers)
SPSL2GO (Swiss-Prot subcellular locations)

Slide44

Electronic annotation by GOA
Example mappings: Aspartate transaminase activity ; GO:0004069 and lipid transport ; GO:0006869
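Mapping-based annotation amounts to a lookup. Each external identifier attached to a protein, such as an EC number or a keyword, is replaced by the GO terms it maps to, and the results are recorded with the IEA evidence code. The Python sketch below assumes the line layout `<external ID> [description] > GO:<term name> ; <GO ID>` for the mapping files. The two mapping lines are illustrative and should be checked against the real files.

```python
# Illustrative lines in the assumed external2go layout:
#   <external ID> [description] > GO:<term name> ; <GO ID>
# Check the real mapping files for exact term names and identifiers.
MAPPING_LINES = [
    "!comment lines start with an exclamation mark",
    "EC:2.6.1.1 > GO:aspartate transaminase activity ; GO:0004069",
    "SP_KW:KW-0445 Lipid transport > GO:lipid transport ; GO:0006869",
]


def parse_mapping(lines):
    """Build {external ID: [(GO ID, GO term name), ...]} from mapping lines."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, right = line.split(" > ", 1)
        external_id = left.split()[0]
        term_name, go_id = (part.strip() for part in right.rsplit(";", 1))
        if term_name.startswith("GO:"):
            term_name = term_name[len("GO:"):]
        mapping.setdefault(external_id, []).append((go_id, term_name))
    return mapping


def electronic_annotations(protein, external_ids, mapping):
    """Turn a protein's external IDs into (protein, GO ID, 'IEA') annotations."""
    return sorted({(protein, go_id, "IEA")
                   for ext in external_ids
                   for go_id, _ in mapping.get(ext, [])})


mapping = parse_mapping(MAPPING_LINES)
print(electronic_annotations("P00505", ["EC:2.6.1.1"], mapping))
# [('P00505', 'GO:0004069', 'IEA')]
```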

Slide45

Electronic annotation by GOA
2. Automatic transfer of annotations to orthologs
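The transfer step can be sketched as follows: annotations with experimental evidence are copied to each ortholog, marked IEA, and the source protein is recorded in the 'with' field. The Python sketch below rests on several assumptions. It is not GOA's actual pipeline, which has its own selection rules. The ortholog accession "ORTHOLOG_1" and the second source annotation are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Annotation(NamedTuple):
    protein: str
    go_id: str
    evidence: str
    with_from: str = ""  # supporting accession, as in the annotation 'with' column


# Assumption: only annotations with these experimental codes are projected.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_to_orthologs(annotations: List[Annotation],
                          orthologs: Dict[str, List[str]]) -> List[Annotation]:
    """Project experimentally supported annotations onto orthologous proteins."""
    projected = []
    for ann in annotations:
        if ann.evidence not in EXPERIMENTAL:
            continue  # never project an annotation that was itself inferred
        for target in orthologs.get(ann.protein, []):
            # The copy is electronic (IEA) and records where it came from.
            projected.append(Annotation(target, ann.go_id, "IEA", ann.protein))
    return projected


source = [Annotation("P00505", "GO:0004069", "IDA"),
          Annotation("P00505", "GO:0006869", "IEA")]  # second annotation is hypothetical
orthologs = {"P00505": ["ORTHOLOG_1"]}  # hypothetical ortholog accession
print(transfer_to_orthologs(source, orthologs))
```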

Slide46

Manual annotation by GOA
High-quality, specific annotations using:
Peer-reviewed papers
A range of evidence codes to categorize the types of evidence found in a paper
www.ebi.ac.uk/GOA

Slide47

Finding annotations in a paper

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

"wound response" → Process: response to wounding GO:0009611
"serine/threonine kinase activity" → Function: protein serine/threonine kinase activity GO:0004674
"integral membrane protein" → Component: integral to plasma membrane GO:0005887
…for B. napus PERK1 protein (Q9ARH1)

PubMed ID: 12374299

Slide48

Qualifiers
Modify the interpretation of an annotation:
NOT (protein is not associated with the GO term)
colocalizes_with (protein associates with complex but is not a bona fide member)
contributes_to (describes action of a complex of proteins)

'With' column
Additional information: can include further information on the method being referenced, e.g. the protein accession of an interacting protein

Slide49

The NOT qualifier
NOT is used to make an explicit note that the gene product is not associated with the GO term
Also used to document conflicting claims in the literature
NOT can be used with ALL three gene ontologies

Slide50

In these cells, SIPP1 was mainly present in the nucleus, where it displayed a non-uniform, speckled distribution and appeared to be excluded from the nucleoli.

excluded from the nucleoli

Slide51

The colocalizes_with qualifier
ONLY used with GO component ontology

Gene products that are transiently associated with an organelle or complex

Slide52

The colocalizes_with qualifier

Example (from Schizosaccharomyces pombe): Clp1 (Q9P7H1) relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the contractile ring (evidence from GFP fusion).

Slide53

The contributes_to qualifier
Used where an individual gene product that is part of a complex is annotated to terms that describe the action (function or process) of the whole complex
contributes_to is not needed to annotate a catalytic subunit
ONLY used with GO function ontology
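Together, the qualifier rules on these slides can be written as a simple consistency check. colocalizes_with is allowed only with the component aspect (C), contributes_to only with the function aspect (F), and NOT with all three aspects. The Python sketch below assumes that the qualifier column may hold several qualifiers separated by "|", as in GAF files.

```python
# Qualifier -> the only aspect it may be used with (F = function, P = process, C = component),
# following the rules on these slides; NOT is allowed with all three.
ALLOWED_ASPECT = {"colocalizes_with": "C", "contributes_to": "F"}


def qualifier_errors(qualifier_field, aspect):
    """Check a pipe-separated qualifier field (e.g. 'NOT|colocalizes_with') against an aspect."""
    errors = []
    for q in filter(None, qualifier_field.split("|")):
        if q == "NOT":
            continue
        allowed = ALLOWED_ASPECT.get(q)
        if allowed is None:
            errors.append(f"unknown qualifier: {q}")
        elif allowed != aspect:
            errors.append(f"{q} is only allowed with aspect {allowed}, not {aspect}")
    return errors


print(qualifier_errors("colocalizes_with", "F"))
# ['colocalizes_with is only allowed with aspect C, not F']
```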

Slide54

…To test whether the protein complex consisting of PIG-A, PIG-H, PIG-C and hGPI1 has GlcNAc transferase activity in vitro… …incubation of the radiolabeled donor of GlcNAc, UDP-[6-3H]GlcNAc, with lysates of JY5 cells transfected with GST-tagged PIG-A resulted in synthesis of GlcNAc-PI and its subsequent deacetylation to glucosaminyl phosphatidylinositol (GlcN-PI)

Slide55

WITH column
The with column provides supporting evidence for ISS, IPI, IGI and IC evidence codes:
ISS: the accession of the aligned protein/ortholog
IPI: the accession of the interacting protein
IGI: the accession of the interacting gene
IC: the GO ID of the term from which the annotation was inferred

Slide56

How to access GO annotation data

Slide57

Where can you find annotations?

UniProtKB

Ensembl

Entrez Gene

Slide58

Gene Association Downloads
17-column files containing all information for each annotation

GO Consortium website

GOA website
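The 17 columns are those of the GAF 2.0 format, in which each annotation is one tab-separated line and header lines start with "!". The Python sketch below turns each line into a dictionary keyed by column name. The example row is hypothetical: it is built from the P00505/GOT2 annotation shown earlier, and its date and assigned-by values are placeholders.

```python
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
    "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Yield one dict per annotation row of a GAF 2.0 file; '!' lines are header comments."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(GAF2_COLUMNS):
            raise ValueError(f"expected {len(GAF2_COLUMNS)} columns, got {len(fields)}")
        yield dict(zip(GAF2_COLUMNS, fields))


# Hypothetical row built from the P00505/GOT2 example; date and assigned_by are placeholders.
ROW = "\t".join([
    "UniProtKB", "P00505", "GOT2", "", "GO:0004069", "PMID:2731362", "IDA", "",
    "F", "Aspartate aminotransferase, mitochondrial", "", "protein",
    "taxon:9606", "20120613", "UniProt", "", "",
]) + "\n"

for record in parse_gaf(["!gaf-version: 2.0\n", ROW]):
    print(record["db_object_id"], record["go_id"], record["evidence_code"])
# P00505 GO:0004069 IDA
```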

Slide59

GO browsers

Slide60

GO Slims

Slide61

GO slims
Many GO analysis tools use GO slims to give a broad overview of the dataset
GO slims are cut-down versions of the GO and contain a subset of the terms in the whole GO
GO slims usually contain less-specialised GO terms

Slide62

Slimming the GO using the true path rule
Many gene products are associated with a large number of descriptive, leaf GO nodes:

Slide63

Slimming the GO using the true path rule
…however, annotations can be mapped up to a smaller set of parent GO terms:
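Mapping up works by replacing each specific annotated term with the slim terms found among its ancestors. The Python sketch below uses a toy hierarchy and a hypothetical two-term slim. It keeps every slim ancestor, whereas some tools instead keep only the most specific slim terms.

```python
# Toy hierarchy (child -> parents) and a hypothetical two-term slim.
PARENTS = {
    "mitotic cytokinesis": {"cytokinesis"},
    "cytokinesis": {"cell division"},
    "cell division": {"cellular process"},
    "response to wounding": {"response to stress"},
    "response to stress": {"response to stimulus"},
}
SLIM = {"cell division", "response to stimulus"}


def ancestors_or_self(term, parents):
    """The term plus every term above it."""
    found, stack = {term}, [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


def map_to_slim(terms, slim, parents):
    """Map specific annotated terms up to the slim terms they fall under."""
    return {s for t in terms for s in ancestors_or_self(t, parents) & slim}


print(sorted(map_to_slim({"mitotic cytokinesis", "response to wounding"}, SLIM, PARENTS)))
# ['cell division', 'response to stimulus']
```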

Slide64

GO slims
Custom slims are available for download: http://www.geneontology.org/GO.slims.shtml
Or you can make your own using:
QuickGO http://www.ebi.ac.uk/QuickGO
AmiGO's GO slimmer http://amigo.geneontology.org/cgi-bin/amigo/slimmer

Slide65

Just some things to be aware of…
The GO is continually changing:
New terms created
Existing terms obsoleted
Re-structured
New annotations being created
ALWAYS use a current version of the ontology and annotations
If publishing your analyses, please report the versions/dates you use: http://www.geneontology.org/GO.cite.shtml
Differences in representation of GO terms may be due to biological phenomena, but may also be due to annotation bias or experimental assays
Often better to remove the ‘NOT’ annotations before doing any large-scale analysis, as they can skew the results
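Removing NOT annotations before analysis only requires inspecting the qualifier field. The Python sketch below assumes GAF-like records with a "|"-separated 'qualifier' field. The two records are hypothetical and modelled on the SIPP1 nucleolus example.

```python
def drop_not_annotations(records):
    """Split annotation records into (kept, dropped) by the NOT qualifier.

    Each record is a dict with a pipe-separated 'qualifier' field, as in a GAF file."""
    kept, dropped = [], []
    for r in records:
        qualifiers = r.get("qualifier", "").split("|")
        (dropped if "NOT" in qualifiers else kept).append(r)
    return kept, dropped


# Hypothetical records for one protein, modelled on the SIPP1 example.
records = [
    {"go_id": "GO:0005634", "qualifier": ""},     # nucleus
    {"go_id": "GO:0005730", "qualifier": "NOT"},  # NOT nucleolus
]
kept, dropped = drop_not_annotations(records)
print([r["go_id"] for r in kept], len(dropped))  # ['GO:0005634'] 1
```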

Slide66

How scientists use the GO, and the tools they use for analysis

Slide67

Source of annotation
If you wanted to find out the role of a gene product manually, you’d have to read an awful lot of papers
But by using GO annotations, this work has already been done for you!

GO:0006915 : apoptosis

Slide68

How scientists use the GO
Find out what a gene product does, or which genes are involved in a certain biological process/function
Analyse high-throughput genomic or proteomic datasets
Validation of experimental techniques
Get a broad overview of a proteome
Obtain functional information for novel gene products

Some examples…

Slide69

MicroArray data analysis
[Figure: time course, attacked vs control; GO categories shown: puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll-regulated genes, JAK-STAT-regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism]
Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Slide70

Validation of experimental techniques
Rat liver plasma membrane isolation (Cao et al., Journal of Proteome Research 2006)

Slide71

Analysis of high-throughput proteomic datasets
Characterisation of proteins interacting with ribosomal protein S19 (Orrù et al., Molecular and Cellular Proteomics 2007)

Slide72

Obtain functional information for novel gene products

MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTS

SELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGD

FKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTG

ELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRN

FVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTIMPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHSRDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIPDSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLAEGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRERADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDSTRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM

InterProScan

Slide73

Annotating novel sequences
Can use BLAST queries to find similar sequences with GO annotation, which can be transferred to the new sequence
Two tools currently available:
AmiGO BLAST (from GO Consortium), http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi, searches the GO Consortium database
BLAST2GO (from Babelomics), http://www.blast2go.org/, searches the NCBI database
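The principle behind these tools can be sketched by filtering BLAST hits and collecting the GO terms of the hits that pass. The Python sketch below assumes BLAST+ tabular output (-outfmt 6) with its default 12 columns. The E-value and identity cut-offs are arbitrary, and the hits and annotations are hypothetical. It does not reproduce how AmiGO BLAST or Blast2GO actually score annotations.

```python
# Column order of BLAST+ tabular output (-outfmt 6) with the default fields.
BLAST_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_go_terms(blast_lines, hit_annotations, max_evalue=1e-20, min_identity=40.0):
    """Collect GO terms of hits passing (arbitrary, illustrative) E-value and identity cut-offs.

    Returns {GO ID: set of supporting hit IDs}, so each suggestion can be traced
    back to the similar sequence it came from."""
    candidates = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(BLAST_FIELDS, line.rstrip("\n").split("\t")))
        if float(hit["evalue"]) <= max_evalue and float(hit["pident"]) >= min_identity:
            for go_id in hit_annotations.get(hit["sseqid"], ()):
                candidates.setdefault(go_id, set()).add(hit["sseqid"])
    return candidates


# Hypothetical hits and annotations.
lines = [
    "query1\thitA\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-80\t500",
    "query1\thitB\t25.0\t100\t75\t2\t1\t100\t5\t104\t0.5\t30",
]
annotations = {"hitA": {"GO:0004674"}, "hitB": {"GO:0005887"}}
print(candidate_go_terms(lines, annotations))  # {'GO:0004674': {'hitA'}}
```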

Slide74

AmiGO BLAST
Example: Exportin-T from Pongo abelii (Sumatran orangutan)

Slide75

Numerous Third Party Tools
Many tools exist that use GO to find common biological functions from a list of genes:
http://www.geneontology.org/GO.tools.microarray.shtml

Slide76

GO tools: enrichment analysis
Most of these tools work in a similar way:
input a gene list and a subset of ‘interesting’ genes
the tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
the tool provides a statistical measure to determine whether enrichment is significant
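A common choice of statistical measure is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). It asks how likely it is to see at least k annotated genes among n interesting genes if they were drawn at random from a population of N genes, K of which carry the term. The Python sketch below uses this test; individual tools may use other tests. Because many GO terms are tested at once, tools also correct the p-values for multiple testing, for example with Bonferroni or false discovery rate methods, and this sketch omits that step. The gene data are hypothetical.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric test.

    N genes in the population, K of them annotated to the GO term;
    n 'interesting' genes, k of them annotated. Returns P(X >= k)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def term_p_value(population, study, annotations, go_id):
    """Count annotated genes in the population and the study list, then test the term."""
    K = sum(go_id in annotations.get(g, ()) for g in population)
    k = sum(go_id in annotations.get(g, ()) for g in study)
    return enrichment_p_value(len(population), K, len(study), k)


# Hypothetical data: 4 genes, 2 annotated to the term, both annotated genes in the study list.
annotations = {"g1": {"GO:0006955"}, "g2": {"GO:0006955"}}
p = term_p_value(["g1", "g2", "g3", "g4"], ["g1", "g2"], annotations, "GO:0006955")
print(round(p, 4))  # 0.1667
```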

Slide77

Exercises
Searching for GO annotations in QuickGO
Exercise 2: using GO terms
Exercise 3: using a protein ID
Using QuickGO to create a tailored set of annotations
Exercise 4: Filtering
Exercise 5: Statistics
Exercise 6: Map-up annotation using a GO slim

Slide78

Thanks for listening