Genome Annotation - PowerPoint Presentation

myesha-ticknor
Uploaded On 2016-04-10

Presentation Transcript

Slide1

Genome Annotation

BCB 660

October 20, 2011

Slide2

From Carson Holt

Slide3

Annotations

Automated

Ab initio (based on genomic sequence alone)

Involves comparisons to known proteins (BLAST similarity)

Sequence motifs such as start/stop codons, intron/exon boundaries

Evidence-based (ESTs)

Involves alignment of experimental EST (cDNA) data to a gene prediction
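
To make the "sequence motifs such as start/stop codons" idea concrete, here is a minimal Python sketch that scans the forward strand for open reading frames bounded by a start codon (ATG) and a stop codon (TAA, TAG, TGA). It is a toy illustration under simplifying assumptions (standard genetic code, no introns, forward strand only). The names find_orfs and min_codons are made up for this example, and real ab initio predictors rely on trained statistical models rather than a plain scan like this.

```python
# Toy ab initio signal scan: illustrative only, not how a real gene predictor works.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based with an exclusive end. min_codons counts
    every codon in the ORF, including the start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close this ORF and look for the next start
    return sorted(orfs)


example = "CCATGAAATTTTAGGG"
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

A scan like this finds many spurious ORFs in real genomes, which is one reason evidence such as EST alignments is combined with sequence-only prediction.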

Manual

Manual curation of genes predicted automatically

Check gene structure, presence of conserved domains, match of ESTs to gene prediction

Align to related genes/proteins and look for oddities (missing exons, early stop codons, etc.).

Annotation can then be manually edited

May also involve assigning function (based on sequence similarity, conserved domains) via Gene Ontology

Structural: exons, introns, UTRs, splice forms, etc.

Functional: process a gene is involved in (metabolism), molecular function (hydrolase), location of expression (expressed in the mitochondria), etc.

Slide4

Classic strategy

Combine ab initio and evidence-based gene predictors to come up with a consensus predicted gene set

Ask community to pitch in and manually annotate as many genes as possible

Leads to great variability in the quality of different genome annotations, and often to many versions of official gene sets

Slide5

NGS and the future of genome annotation

In 2010, 1300 eukaryotic genome projects were underway

-- assuming 10,000 genes per genome, that means 13,000,000 new annotations will be needed

-- quality control and maintenance become an issue

Some organizations are dedicated to genome annotation (e.g. ENSEMBL and VectorBase), but handling 1300 genomes will not be feasible

Need for high-quality, automated annotation pipelines that are easy to use by small research groups without extensive bioinformatics expertise

Slide6

MAKER Pipeline:

Especially effective for Emerging Eukaryote Model Organisms

Incorporates ab initio and evidence-based gene predictors

Gene predictions are run a first time

Then a small subset of the genome assembly is used to train gene predictors (building genome-specific HMMs)

Then trained gene predictors are run again on whole genome

** Really nice if you don’t have a basis to start from (e.g. de novo gene prediction)

Slide7

What does MAKER do?

* Identifies and masks out repeat elements

* Aligns ESTs to the genome

* Aligns proteins to the genome

* Produces ab initio gene predictions

* Synthesizes these data into final annotations

* Produces evidence-based quality values for downstream annotation management

Slide8

MAKER Steps involved

1. Compute phase

RepeatMasker

BLAST

Exonerate

SNAP (and other gene predictors)

2. Filter/cluster phase

Identify/remove marginal predictions and alignments based on quality scores/cutoffs, etc

Cluster to identify overlapping alignments/predictions, to remove redundancy and assess weight of evidence

3. Polish

Realigns BLAST hits to obtain greater precision at exon boundaries (Exonerate)

4. Synthesis

Collect evidence for each annotation, using EST evidence

Evidence scores plus sequences (genomic, EST, coding, intron) are passed to SNAP

SNAP then uses this evidence to retrain and alter its internal HMM

5. Annotate

Post-processing of SNAP prediction, recombine with evidence to generate complete annotations

Output is a GFF3 annotation that can be imported into genome browsers

Slide9
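
Looking back at step 2 of the MAKER steps (filter/cluster), the idea of dropping marginal hits and then grouping overlapping alignments and predictions can be sketched in a few lines of Python. This is a simplified illustration, not MAKER's actual implementation: the Hit record, its score field, the min_score cutoff, and the example coordinates are all hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int    # inclusive start coordinate on the genome
    end: int      # inclusive end coordinate
    score: float  # hypothetical quality score; higher is better
    source: str   # which tool produced the hit, e.g. "blastx", "est", "snap"


def filter_and_cluster(hits: List[Hit], min_score: float) -> List[List[Hit]]:
    """Drop hits scoring below min_score, then group hits whose intervals overlap."""
    kept = sorted((h for h in hits if h.score >= min_score), key=lambda h: h.start)
    clusters: List[List[Hit]] = []
    cluster_end = -1  # rightmost coordinate covered by the current cluster
    for hit in kept:
        if clusters and hit.start <= cluster_end:
            clusters[-1].append(hit)
            cluster_end = max(cluster_end, hit.end)
        else:
            clusters.append([hit])
            cluster_end = hit.end
    return clusters


# Made-up hits for illustration only.
hits = [
    Hit(100, 500, 90.0, "blastx"),
    Hit(450, 900, 75.0, "est"),
    Hit(480, 520, 10.0, "blastx"),  # marginal hit, removed by the cutoff
    Hit(2000, 2600, 88.0, "snap"),
]
clusters = filter_and_cluster(hits, min_score=50.0)
for cluster in clusters:
    sources = sorted({h.source for h in cluster})
    print(cluster[0].start, max(h.end for h in cluster), sources)
```

The number of independent sources in a cluster gives a rough sense of the "weight of evidence" behind a locus. A single low-scoring hit on its own is the kind of marginal prediction the filter is meant to remove.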

Inputs to MAKER

Genomic sequence

Config files

External executables

Sequence database locations

Compute parameters

Sequence database files (choice of these turns out to be extremely important)

Transposons file (default plus known organism-specific)

RepeatMasker database file (organism-specific, optional)

Proteins file (known proteins from related organisms you want to align to the genome)

ESTs/mRNAs file (the evidence)

Slide10

MAKER Output (Apollo browser)
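
Since the pipeline's output is a GFF3 file, a small Python sketch of reading such a file may help show what a genome browser consumes. It relies only on the standard nine tab-separated GFF3 columns (seqid, source, type, start, end, score, strand, phase, attributes). The function name parse_gff3_line and the example records are invented for illustration and are not real MAKER output.

```python
from collections import Counter


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    # Column 9 holds semicolon-separated key=value pairs such as ID and Parent.
    attributes = dict(pair.split("=", 1) for pair in attrs.split(";") if "=" in pair)
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Made-up records: one gene with one transcript and two exons.
example = "\n".join([
    "##gff-version 3",
    "scaffold_1\tmaker\tgene\t1000\t5000\t.\t+\t.\tID=gene1;Name=gene1",
    "scaffold_1\tmaker\tmRNA\t1000\t5000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_1\tmaker\texon\t1000\t1500\t.\t+\t.\tID=exon1;Parent=mrna1",
    "scaffold_1\tmaker\texon\t4000\t5000\t.\t+\t.\tID=exon2;Parent=mrna1",
])
features = [f for f in map(parse_gff3_line, example.splitlines()) if f]
type_counts = Counter(f["type"] for f in features)
print(dict(type_counts))
```

The Parent attribute is what links exons to their transcript and transcripts to their gene. This sketch assumes the file has no embedded FASTA section; a file that includes one would need the reader to stop at the `##FASTA` directive.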