Slide1
Basics of Genome Annotation
Daniel Standage
Biology Department
Indiana University
Slide2
An-no-ta-tion
\ˌa-nə-ˈtā-shən\
A critical or explanatory note or body of notes added to a text
The act of annotating
http://dictionary.reference.com/browse/annotation?s=t
Slide3
Slide4
Slide5
Genome annotation
Slide6
Genome annotation
Slide7
Genome annotation
Information itself (e.g., this gene encodes a cytochrome P450 protein, with exons at…)
Annotation process (operational definition)
Data management: formatting, storage, distribution, representation
Slide8
Methods for gene finding
Ab initio gene prediction
Gene prediction by spliced alignment
Slide9
Ab initio gene prediction
Ab initio: “from first principles”
Requires only a genomic sequence
Uses statistical model of genome composition to identify most probable location of start/stop codons, splice sites
Popular implementations: Augustus, GeneMark, SNAP
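To make the idea of scoring a sequence "from first principles" concrete, here is a deliberately tiny sketch. It is not how Augustus, GeneMark, or SNAP work; those tools use far richer probabilistic models that also handle splice sites. The sketch only finds forward-strand open reading frames and ranks them by a log-odds codon score. The function names and the codon frequency table are hypothetical and chosen for illustration.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Yield (start, end) of forward-strand ORFs: ATG ... in-frame stop.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None


def coding_score(seq, codon_freqs):
    """Sum log-odds of each codon: coding model vs. uniform background (1/64)."""
    score = 0.0
    for i in range(0, len(seq) - 2, 3):
        # Small floor so codons missing from the (hypothetical) table do not crash.
        freq = codon_freqs.get(seq[i:i + 3], 1e-4)
        score += math.log(freq / (1 / 64))
    return score


def best_orf(seq, codon_freqs):
    """Return the highest-scoring ORF, or None if the sequence has no ORF."""
    seq = seq.upper()
    orfs = list(find_orfs(seq))
    if not orfs:
        return None
    return max(orfs, key=lambda se: coding_score(seq[se[0]:se[1]], codon_freqs))
```

Calling `best_orf(sequence, codon_freqs)` with a codon table estimated from known genes of the same species is the toy analogue of training a gene finder on species-specific parameters.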
Slide10
Ab initio gene prediction
Slide11
Prediction by spliced alignment
Utilizes experimental (transcript) and/or homology (reference proteins) data
Spliced alignment of sequences reveals gene structure: matches = exons, gaps = introns
Popular implementations: GeneSeqer, Exonerate, GenomeThreader
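The "matches = exons, gaps = introns" rule can be sketched in a few lines. The example assumes the alignment has already been computed and is given as a list of matched genomic blocks in 1-based, inclusive coordinates; the function name and input format are hypothetical and do not correspond to the output of GeneSeqer, Exonerate, or GenomeThreader.

```python
def gene_structure_from_blocks(blocks):
    """Derive exons and introns from the matched blocks of a spliced alignment.

    blocks: iterable of (start, end) genomic coordinates, 1-based inclusive,
    one per aligned (matched) segment of the transcript or protein.
    Returns (exons, introns), both sorted by genomic position.
    """
    exons = sorted(blocks)
    introns = []
    # Every gap between two consecutive matched blocks is reported as an intron.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return exons, introns
```

For instance, blocks at 100-200, 301-400, and 551-700 imply introns at 201-300 and 401-550. A real aligner additionally checks splice-site signals at the gap boundaries, which this sketch ignores.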
Slide12
Comparison of prediction methods
Ab initio | Spliced alignment
Does not require extrinsic evidence | Requires transcript and/or protein sequences
Does not benefit from additional transcript data | Accuracy improves with additional transcript data
More likely to recover complete gene structures | More likely to recover accurate internal exon/intron structure
Slide13
Issues with gene prediction
Accuracy (best methods achieve ≈80% at exon level)
Parameters matter (species-specific codon usage)
Comparison and assessment
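As a rough illustration of what an exon-level accuracy figure measures, the sketch below counts a predicted exon as correct only when both of its boundaries match a reference exon exactly. This is one common convention, assumed here for illustration; the function name is hypothetical, and the second value (precision) is often reported as "specificity" in the gene prediction literature.

```python
def exon_level_accuracy(predicted, reference):
    """Return (sensitivity, precision) for exact-match exon comparison.

    predicted, reference: iterables of (start, end) exon coordinates.
    sensitivity = correct exons / reference exons
    precision   = correct exons / predicted exons
    """
    pred, ref = set(predicted), set(reference)
    correct = pred & ref
    sensitivity = len(correct) / len(ref) if ref else 0.0
    precision = len(correct) / len(pred) if pred else 0.0
    return sensitivity, precision
```

With five reference exons and five predictions, one of which has a boundary off by a single base, both values come out at 0.8: a single misplaced splice site costs a whole exon under this measure.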
Slide14
Recurring theme in genomics
Once I have a result, how do I assess its reliability?
How do I compare it to alternative results?
Slide15
Recurring theme in genomics
"Why, when you only had one result, did you think that was the correct one?"
Slide16
Slide17
Manual annotation
Visually inspect gene predictions, spliced alignments
Determine reliable consensus gene structure
Available software
Apollo: http://apollo.berkeleybop.org
yrGATE: http://goblinx.soic.indiana.edu/src/yrGATE
Slide18
Slide19
“Combiner” tools
Maker: http://www.yandell-lab.org/software/maker.html
EVidenceModeler: http://evidencemodeler.sourceforge.net
Slide20
Evaluating annotations
Comparison
ParsEval [1]: http://standage.github.io/AEGeAn
Quality assessment
Annotation Edit Distance [2] (Maker)
GAEVAL (PlantGDB)
[1] Standage and Brendel (2012) BMC Bioinformatics, 13:187.
[2] Eilbeck et al. (2009) BMC Bioinformatics, 10:67.
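For a feel of how Annotation Edit Distance behaves, here is a minimal nucleotide-level sketch. It assumes the commonly cited definition AED = 1 - (SN + SP) / 2, where SN is the fraction of reference bases covered by the annotation and SP is the fraction of annotation bases covered by the reference. Maker's own calculation compares an annotation against its supporting evidence and may differ in detail; the function names and the exon-list input are hypothetical.

```python
def covered_positions(exons):
    """Return the set of genomic positions covered by (start, end) exons.

    Coordinates are 1-based and inclusive.
    """
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def annotation_edit_distance(annotation_exons, reference_exons):
    """AED in [0, 1]: 0 means identical coverage, 1 means no shared bases."""
    annotated = covered_positions(annotation_exons)
    reference = covered_positions(reference_exons)
    if not annotated or not reference:
        return 1.0  # assumption: an empty side is treated as maximally distant
    shared = len(annotated & reference)
    sensitivity = shared / len(reference)
    specificity = shared / len(annotated)
    return 1.0 - (sensitivity + specificity) / 2.0
```

An annotation spanning 1-100 compared with a reference spanning 51-150 shares half of its bases in each direction and scores 0.5. Building explicit position sets keeps the sketch short but is only practical for gene-sized regions.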
Slide21
Recommendations / Considerations
Automated annotation
Manual refinement
Assessment and filtering for particular analyses
Be very skeptical
Remember: no “one true” assembly / annotation
Slide22
xGDBvm
Pre-installed on iPlant cloud (free for academics!)
Search for xGDBvm image
Includes an EVM pipeline for automated annotation
Includes yrGATE for manual annotation
Visualization, search, access control
More info: http://goblinx.soic.indiana.edu
Slide23
xGDBvm demo
Polistes dominula example