Lecture 3 Gene Finding and Sequence Annotation Objectives of this lecture Introduce you to basic concepts and approaches of gene finding Show you differences between gene prediction for prokaryotic and eukaryotic genomes ID: 917454
Slide1
Gene Finding and Sequence Annotation
Lecture 3. Gene Finding and Sequence Annotation
Slide2Objectives of this lecture
Introduce you to basic concepts and approaches of gene finding
Show you differences between gene prediction for prokaryotic and eukaryotic genomes
Show you which sequence features can be used to identify genes
Introduce you to gene finding methods
Briefly discuss the evaluation of gene finding methods
This lecture will familiarize you with several important concepts of gene prediction, which will help you to recognize some important pitfalls and to make an informed choice of software for a specific application.
Slide3Gene Prediction: Computational Challenge
>Genomics DNA…….. atgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg
Where is the gene?
Slide4
Gene identification (or finding, or prediction, or annotation) is about finding the location and structure of genes on (full) genomic DNA sequences.
This is generally a complicated process, which can be facilitated by data obtained from sequencing, gene expression and proteomics experiments, because these provide a first source of information about the genes that are expressed and thus must be present on the genome.
Slide5Gene prediction
Expression data may facilitate gene prediction
Genomics, Transcriptomics, Proteomics and Metabolomics
Slide6
Why Gene Prediction/finding/searching?
With the advent of next generation sequencing it has become fairly easy to generate full genome sequences. The real challenge is the annotation of these sequences (see next slide), i.e., providing a full description of the genome that lists all genes and other structures on the genome.
Slide7
Genome (annotation) projects
According to National Center for Biotechnology Information (NCBI; February 2012; http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html)
Slide8
Protein Coding Genes in Genome!
Look for ORF (Open Reading Frame)
(begins with start codon, ends with stop codon, no internal stops!)
long (usually > 60-100 aa)
If homologous to “known” protein more likely
Look for basal signals
Transcription, splicing, translation
Look for regulatory signals
Depends on organism
Prokaryotes vs Eukaryotes
Vertebrate vs fungi
Slide9Why and How Annotation?
The increase in the number of whole-genome sequences makes it necessary
These are analyzed to identify protein-coding genes AND other genetic elements
Often some experimental data are available to assist in this task
E.g., previously characterized genes, gene products, ESTs
Sequences of genes and products (from other organisms) can be aligned to identify translated regions
Set of genes from alignment only will be incomplete
Features such as repeat and control sequences will be missing
Therefore, computational methods have been developed to characterize genes and other features: ANNOTATION
Slide10Prediction of genes & Genome annotation
Use and development of computational approaches to accurately predict gene structure and annotate genomes
Ultimate goal: near 100% accuracy.
Reduce amount of experimental verification work.
Genome sequencing
Slide11Gene prediction in prokaryotic genomes is much simpler than for eukaryotic genomes
Eukaryotes:
Genome: 10Mbp-670Gbp
Human: 3Gbp, 1% protein coding
Many repetitive sequences
Gene: exon structure
Prokaryotes:
Genome: 0.5-10Mbp
>90% protein coding
Few repetitive sequences
Gene: single contiguous stretch
Slide12
Gene prediction methods
There exist several classes of gene prediction methods:
> Some methods are based on homology. Homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs). In gene identification you can compare known DNA/mRNA sequences to a newly obtained genome sequence to obtain information about the location of a gene (and its structure) on the genome.
> Other methods are ‘ab initio’. These methods don’t use existing experimental data (e.g., sequence data as in homology searching) but apply algorithms to identify gene signals in the DNA which may indicate the presence of a gene, or they determine the composition (gene content) of a piece of DNA, which may also give clues about the existence of a gene in a particular region of DNA.
Slide13Categories of gene prediction programs
Gene prediction methods
Ab initio (intrinsic methods: without reference to known sequences)
Gene signals
start/stop codons
intron splice signals
transcription factor binding sites
ribosomal binding sites
poly-adenylation sites
Gene content
statistical description of coding regions
difference between coding and non-coding regions
Homology (extrinsic methods: with reference to known sequences)
translated DNA matches known protein sequence
exons of genomic DNA match a sequenced cDNA
Slide14Protein-coding gene prediction in prokaryotes
Note: we won’t look at the prediction of non-protein coding genes in this lecture
The interaction of components of the transcription/translation machinery with the nucleotide sequence, and constraints imposed on protein-coding nt-sequences, have resulted in distinct features that can be used to identify genes
Slide15Gene annotation in prokaryotes
Prokaryotes stack multiple genes together for expression (“operons”)
[Figure: an operon consisting of a promoter, Gene 1, Gene 2, ..., Gene N and a terminator. RNA polymerase transcribes the operon into a single mRNA (5’ to 3’), which is translated into separate polypeptides, each running from N-terminus to C-terminus.]
Slide16Gene annotation in prokaryotes
Gene structure of prokaryotes
[Figure: a prokaryotic gene with transcription start, ribosomal binding site, translation start (start codon ATG), coding region, translation stop (stop codon TAA, TAG or TGA) and ρ-independent transcription termination signal.]
Identification of these sequence features helps to identify the gene
ρ-independent (rho-independent) transcription termination: causes the transcribed mRNA to form a hairpin, which terminates transcription
Slide17
Readings:
For prokaryotes we can determine the open reading frame from the DNA sequence (and from the mRNA sequence). The ORF is the part of the sequence that codes for the protein. The ORF starts with an ATG (start codon) and ends with a stop codon (see next slide). Every triplet of nucleotides (codon) is translated to its corresponding amino acid according to the genetic code table (see next slide). In this example we observe an “ATG” in the middle of the sequence. This is not a start codon; it is even divided over two neighboring codons.
Slide18Gene annotation in prokaryotes
Genetic code: translation of codons to amino acids
[Figure: genetic code table with 64 codons, including synonymous codons.]
DNA > RNA: ATG > AUG
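To make the codon-by-codon translation concrete, here is a minimal Python sketch. It assumes the standard genetic code; the names CODON_TABLE and translate are chosen for illustration only and are not taken from the lecture. RNA input is accepted by mapping U to T first.

```python
from itertools import product

# Standard genetic code, written in the usual compact form:
# codons are enumerated with the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA or RNA sequence from its first base until a stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":  # stop codon: translation ends here
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGACAGATTACAGATTACAGATTACAGGATAG"))
```

Because translation simply walks in steps of three from the first base, an ATG that straddles two neighboring codons is never read as a start codon.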
Slide19Gene Prediction: Computational Challenge
>Genomics DNA…….. atgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggct
atgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatcctgcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcggctatgctaatgaatggtcttgggatttaccttggaatgctaagctgggatccgatgacaatgcatgcggctatgctaatgaatggtcttgggatttaccttggaatatgctaatgcatgcggctatgctaagctgggaatgcatgcggctatgctaagctgggatccgatgacaatgcatgcggctatgctaatgcatgcggctatgcaagctgggatccgatgactatgctaagctgcggctatgctaatgcatgcggctatgctaagctcatgcgg
Gene!
Slide20Microbial Gene Finding
Microbial genomes tend to be gene rich (80%-90% of the sequence is coding)
The most reliable method – homology searches (e.g. using BLAST and/or FASTA)
Major problem – finding genes without known homologues.
Slide21Open Reading Frame
An Open Reading Frame (ORF) is a sequence of codons which starts with a start codon, ends with a stop codon and has no stop codons in between.
Searching for ORFs – consider all 6 possible reading frames: 3 forward and 3 reverse
Is the ORF a coding sequence?
Must be long enough (roughly 300 bp or more)
Should have average amino-acid composition specific for the given organism.
Should have codon usage specific for the given organism.
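The ORF definition and the six-frame search can be sketched as follows. This is an illustrative sketch only, not the algorithm of any particular gene finder: the function names, the returned tuple layout and the min_codons threshold are assumptions, and only ATG is treated as a start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Scan all six reading frames for ORFs (ATG ... stop, no internal stop).

    Returns tuples (strand, frame, start, end, orf). Coordinates are 0-based
    on the scanned strand; min_codons counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first ATG after the previous stop opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs


print(find_orfs("ATGACAGATTACAGATTACAGATTACAGGATAG", min_codons=5))
```

The length filter implements the "long enough" criterion; composition and codon usage would need additional checks that are not part of this sketch.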
Slide22Gene annotation in prokaryotes
Open Reading Frames (ORF): 6 reading frames
[Figure: an ORF (open reading frame) in the sequence ATGACAGATTACAGATTACAGATTACAGGATAG, shown in Frame 1, Frame 2 and Frame 3, with the transcription start, start codon (ATG) and stop codon (TAG) marked.]
Next slide for detail
Slide23Gene annotation in prokaryotes
Six reading frames in a DNA sequence
Reading:
Every piece of DNA/mRNA has 6 possible reading frames that can potentially encode a protein: 3 in the sense direction and 3 in the anti-sense direction. To identify the open reading frame (starting with an ATG and ending with a stop codon) we must in principle inspect each of these 6 reading frames. The ORF with the largest number of codons is often the correct one.
GACGTCTGCTTTGGAGAACTACATCAACCGGACTGTGGCTGTTATTACTTCTGATGGCAGAATGATTGTG
CTGCAGACGAAACCTCTTGATGTAGTTGGCCTGACACCGACAATAATGAAGACTACCGTCTTACTAACAC
[Figure: the double-stranded sequence above split into codons in each of the six reading frames, with start and stop codons highlighted.]
Stop codons: TAA, TAG, TGA
Start codon: ATG
Slide24
Reading frame
A reading frame refers to one of three possible ways of reading a nucleotide sequence.
Let's say we have a stretch of 15 DNA base pairs: acttagccgggacta
You can start translating the DNA from the first letter, 'a', which would be referred to as the first reading frame.
Or you can start reading from the second letter, 'c', which is the second reading frame.
Or you can start reading from the third letter, 't', which is the third reading frame.
The reading frame affects which protein is made.
[Figure: the 15-base sequence translated in the three forward reading frames; upper case letters represent the amino acids encoded by each codon.]
The illustration shows three reading frames. However, there are actually six reading frames: three on the positive strand, and three (which are read in the reverse direction) on the negative strand.
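As a small illustration of the six reading frames, the following Python sketch splits the 15-base example into codons for each frame. The frame labels (+1 to -3) and function names are assumptions made for this example; incomplete codons at the end of a frame are dropped.

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_codons(seq, offset):
    """Split seq into complete codons, starting at the given offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Return the codons of all six reading frames, keyed by '+1' ... '-3'."""
    frames = {}
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = split_codons(strand, offset)
    return frames


for name, codons in six_frames("acttagccgggacta").items():
    print(name, " ".join(codons))
```

Note that frame +1 of this example contains the stop codon tag as its second codon, which shows how strongly the choice of frame affects the translation.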
Problems:
There will be many "ORFs" occurring by chance
Some will be short - how do we know which are true?
Introns make this useless in eukaryotic DNA
Slide26Gene annotation in prokaryotes
Finding ORFs
Many more ORFs than genes
In E. coli one finds 6500 ORFs while there are 4290 genes.
In random DNA, one stop codon every 64/3 ≈ 21 codons on average.
Average protein is ~300 codons long. => search for long ORFs.
Problem
Short genes
[Figure: an open reading frame, from ATG to TGA, on a genomic sequence.]
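The arithmetic behind "search for long ORFs" can be checked with a few lines of Python. The sketch assumes random DNA in which all four bases are equally likely and independent, so that 3 of the 64 codons are stop codons; real genomes deviate from this, so the numbers are only indicative. The names P_STOP and prob_no_stop are illustrative.

```python
# Probability that a random codon is one of the 3 stop codons (out of 64).
P_STOP = 3 / 64

# Expected spacing between stop codons in one reading frame: 64/3 codons.
expected_spacing = 1 / P_STOP


def prob_no_stop(n_codons):
    """Probability that n_codons consecutive random codons contain no stop codon."""
    return (1 - P_STOP) ** n_codons


print(f"expected spacing between stops: {expected_spacing:.1f} codons")
for n in (21, 100, 300):
    print(f"P(no stop in {n} codons) = {prob_no_stop(n):.2e}")
```

Under these assumptions a stop-free stretch of 300 codons is extremely unlikely to arise by chance, whereas stretches of a few dozen codons are common, which is why short genes are hard to separate from chance ORFs.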
Slide27Gene annotation in prokaryotes
Basic statistics (base statistics)
Codon frequency can be used as a gene prediction feature
[Figure: codon usage comparison, annotated with "clear difference" and "similar codon usage". Figure from: Zvelebil M, Baum JO (2008) Chapter 10, Gene Detection and Genome Annotation, in Understanding Bioinformatics, Garland Science, New York]
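A minimal sketch of how codon frequencies could be computed and compared is given below. The distance measure (sum of absolute frequency differences) is an assumption chosen for simplicity, not the statistic used in the cited figure or in any specific gene finder; the function names are illustrative.

```python
from collections import Counter


def codon_usage(coding_seq):
    """Relative frequency of each codon in a sequence read in frame 1."""
    seq = coding_seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def usage_distance(usage_a, usage_b):
    """Sum of absolute frequency differences: 0 (identical) to 2 (disjoint)."""
    codons = set(usage_a) | set(usage_b)
    return sum(abs(usage_a.get(c, 0.0) - usage_b.get(c, 0.0)) for c in codons)


reference = codon_usage("ATGAAAAAATAG")
print(reference)
print(usage_distance(reference, codon_usage("ATGAAAGGGTAG")))
```

In practice the reference usage would be estimated from known genes of the organism, and a candidate ORF whose usage is close to the reference is more likely to be coding.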
Slide28Gene annotation in prokaryotes
Ribosomal binding site: Shine-Dalgarno sequence
The ribosome binding site for bacterial translation.
In Escherichia coli, the ribosome binding site has the consensus sequence: 5′-AGGAGGU-3′
Location: between 3 and 10 nucleotides upstream of the initiation codon.
[Figure: mRNA, 5’ to 3’, with the ribosome binding site (AGGAGGU) located 3-10 nucleotides upstream of the initiation codon (AUG).]
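The spacing rule can be turned into a simple search, sketched below in Python. It looks for an exact match to the consensus followed by a spacer of 3 to 10 nucleotides and then ATG. Real ribosome binding sites are degenerate, so exact matching is a deliberate simplification, and the function name and return format are assumptions for this example.

```python
import re

# Consensus in DNA form (U -> T), a spacer of 3-10 nt, then the start codon.
# The lookahead lets overlapping candidate sites be reported as well.
SD_PATTERN = re.compile(r"(?=(AGGAGGT)([ACGT]{3,10}?)ATG)")


def find_rbs_starts(seq):
    """Return (sd_start, atg_start, spacer_length) for each consensus site
    that is followed by an ATG 3-10 nucleotides downstream (0-based)."""
    dna = seq.upper().replace("U", "T")
    hits = []
    for m in SD_PATTERN.finditer(dna):
        hits.append((m.start(1), m.end(2), m.end(2) - m.end(1)))
    return hits


print(find_rbs_starts("CCAGGAGGTAACCATGAAA"))
```

A start codon supported by such a site is a stronger candidate for a true translation start than an ATG without one.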
Slide29Gene annotation in prokaryotes
Sequence homology (mRNA-Protein)
[Figure: (BLAST) alignment of an mRNA (or protein) sequence to an uncharacterized genome provides evidence for the presence of a gene.]
Readings:
Sequence homology is a powerful method to detect genes in a genome. However, it assumes that an mRNA sequence is available, which could have been obtained in other (transcriptomics) experiments.
An mRNA is the transcript of an expressed gene. Thus, if we are able to align the mRNA to the genome, then we know the location of the gene. Since the mRNA does not contain introns while the gene on the DNA may contain introns, the alignment can even provide information about the intron-exon structure of the gene.
Note that if we have a protein sequence, we can first translate it back into an mRNA sequence and use this mRNA sequence in a homology search. Because most amino acids are encoded by several synonymous codons, this back-translation is not unique.
Slide30Alignment of ESTs against a genome
mRNA / EST sequences from GenBank (NCBI)
Alignments of these sequences to the genome (UCSC)
[Figure: alignments of mRNAs/ESTs against genomic DNA. An intron is present in the DNA and thus missing in the mRNA, so you will see a ‘gapped’ alignment.]
An EST (Expressed Sequence Tag) is a short sub-sequence of a cDNA sequence. ESTs may be used to identify gene transcripts, and are instrumental in gene discovery and gene sequence determination.
EST2Genome is one of the programs that align ESTs (small parts of mRNA sequences) to a genome sequence.
Slide31Alignment of ESTs against a genome
[Figure: EST alignments on the + strand and - strand of the DNA.]
Assign orientation (polyA signal/tail, exon boundaries, annotation)
After alignment you must determine the correct strand on which the gene is located. Sometimes this is straightforward. If not, you can use information about polyA signal/tail, exon/intron structure or other annotation.
Slide32Alignment of ESTs against a genome
[Figure: EST alignments on the + strand and - strand of the DNA, grouped into 3 genes.]
Determine overlap: 3 genes
Overlapping alignments are considered to belong to the same gene and can be grouped to obtain a more complete ‘model’ of the gene.
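Grouping overlapping alignments can be sketched as interval merging per strand. The representation of an alignment as a (strand, start, end) tuple and the function name are hypothetical choices for this example; real pipelines also use exon structure and other annotation.

```python
def group_alignments(alignments):
    """Merge overlapping alignments on the same strand into gene models.

    alignments: iterable of (strand, start, end) tuples with start <= end.
    Returns a list of (strand, start, end) tuples, one per grouped gene.
    """
    genes = []
    for strand in ("+", "-"):
        spans = sorted((s, e) for st, s, e in alignments if st == strand)
        current = None
        for start, end in spans:
            if current is not None and start <= current[1]:
                # Overlaps the gene being built: extend it.
                current[1] = max(current[1], end)
            else:
                if current is not None:
                    genes.append((strand, current[0], current[1]))
                current = [start, end]
        if current is not None:
            genes.append((strand, current[0], current[1]))
    return genes


ests = [("+", 100, 200), ("+", 150, 300), ("-", 120, 180), ("+", 500, 600)]
print(group_alignments(ests))
```

Alignments on opposite strands are never merged, even if their coordinates overlap, which is why the strand has to be assigned first.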
Slide33Gene annotation in prokaryotes
Algorithms for Gene Detection in prokaryotes
Some of the programs available
GeneMark
GeneMark.hmm
GLIMMER
EcoParse
ORPHEUS
Prodigal
Many programs for gene identification are available. You don’t have to memorize all these programs for the examination.
Slide34Eukaryotic gene detection
Many principles of prokaryotic gene detection apply to eukaryotes
Similar base statistics
Equivalent transcription, translation start/stop signals
However, much larger genome sizes
Require approaches with far lower rates of false positives
Gene density is less
Junk DNA / repetitive sequences
Crucial difference: introns
splice sites do not have very strong signals
Slide35Gene annotation in eukaryotes
Introns, exons and splice sites
Exons in eukaryotes are more difficult to recognize
Smaller
Variable number
Final exon may not contain coding sequence
Exons are delimited by (variable) splice signals, and not by start/stop codons as for prokaryotes
[Figure: length distributions of prokaryotic genes and of eukaryotic exons and introns. Eukaryotic exon lengths are much smaller than prokaryotic gene lengths, and there is large variation in exon (and intron) lengths in eukaryotes.]
Slide36Gene annotation in eukaryotes
GC content
[Figure from Lander (2001) Nature. Panel a: GC content, higher GC content in genes. Panel b: GC vs. gene density, more genes in GC-rich areas.]
Explanation:
The percentage of GC in the genome is a rough indication of the presence of genes.
a) The percentage of GC for genes (red bars) is higher than for other parts of the genome (blue bars).
b) The percentage of GC correlates with gene density.
Thus, GC content gives a first indication but tells you nothing about the precise location of a gene or its structure.
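GC content along a sequence is usually computed in sliding windows. The sketch below shows the idea; the window and step sizes are arbitrary illustrative defaults, not values taken from the lecture or from Lander (2001).

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """GC fraction per window; returns a list of (window_start, gc_fraction)."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_windows("AAAAGGGGATATGCGC", window=4, step=4))
```

Windows with a high GC fraction would only flag regions worth a closer look; they do not give gene boundaries.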
Slide37Gene annotation in eukaryotes
Complexity
Finding genes in eukaryotes is difficult due to variation in gene structure
Average vertebrate gene is 30kb long, of which the coding sequence is only about 1kb
Average coding region consists of 6 exons of about 150bp
BUT
Dystrophin: 2.4Mb long
Blood coagulation factor VIII: 26 exons (69bp to 3106bp)
Intron 22 produces 2 transcripts unrelated to this gene.
Gene finding algorithms are often capable of detecting an ‘average’ gene. However, genes that somehow deviate in length, structure, etc. can be missed by gene finding programs.
Slide38Gene annotation in eukaryotes
Eukaryotic genome structure
[Figure: a stretch of DNA containing Gene A and Gene B, annotated with the elements below.]
CpG island (higher G+C content, gene marker)
Tandemly repeated DNA elements
Dispersed repeats (SINEs (e.g., Alu), LINEs)
Slide39Gene annotation in eukaryotes
Eukaryotic genome structure
[Figure: DNA with Gene A and Gene B, showing regulatory sequences (e.g., enhancers), promoter elements, the transcription start site, exons, introns and the transcription end site. RNA polymerase II transcribes the DNA into pre-mRNA.]
Slide40Gene annotation in eukaryotes
Eukaryotic genome structure
[Figure: splicing converts the pre-mRNA into mRNA with a 5' UTR, a coding sequence, a 3' UTR and a poly-A tail (AAAAAAAAAAAAAAAAAAAA); translation of the codons in the coding sequence yields the protein.]
Slide41Gene annotation in eukaryotes
Exon – Intron structure
[Figure: alternating exons and introns with the splice sites marked.]
Donor: (C,A)AG/GT(A,G)AGT
Acceptor: CAG/G
Branch point signal: CT(G,A)A(C,T) (10-50bp upstream from acceptor)
Readings:
The boundaries between exons and introns are characterized by certain sequence features.
An exon will start with a G and end with an AG. An intron will start with a GT and end with a CAG.
The full sequence feature of the exon/intron boundary is (C,A)AG/GT(A,G)AGT. This means that the last 3 nucleotides of an exon are CAG or AAG and the first 6 nucleotides of the intron are GTAAGT or GTGAGT.
Note that these are all very short sequences which may also occur by chance in a DNA sequence and which may mislead gene finding programs.
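The donor consensus (C,A)AG/GT(A,G)AGT can be written as a regular expression, as in the Python sketch below. Exact consensus matching is an illustrative simplification (real splice sites vary, and gene finders typically score them statistically); the function name is an assumption for this example.

```python
import re

# Last 3 exon bases [CA]AG, followed by the first 6 intron bases GT[AG]AGT.
# The lookahead allows overlapping candidate sites to be found.
DONOR_PATTERN = re.compile(r"(?=([CA]AGGT[AG]AGT))")


def find_donor_sites(seq):
    """Return 0-based positions of the first intron base (the G of GT)
    for every exact match to the donor consensus."""
    dna = seq.upper()
    return [m.start(1) + 3 for m in DONOR_PATTERN.finditer(dna)]


example = "TTCAGGTAAGTCC"
for pos in find_donor_sites(example):
    print(pos, example[:pos] + "/" + example[pos:])
```

Running such a scan on a long random sequence would also report matches, which is the point of the note above: a 9-base pattern with two ambiguous positions occurs by chance often enough to produce false positives.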
Slide42Gene annotation in eukaryotes
Polyadenylation signal
Eukaryotic mRNAs are polyadenylated, i.e., have up to 250 A’s added to their 3’ end after transcription terminates (T)
Signals:
[Figure: polyadenylation signals.]
The polyA signal is another example of a signal (sequence feature) that signals the end of transcription.
For detail: http://themedicalbiochemistrypage.org/rna.php#processing
Slide43Gene annotation in eukaryotes
Anatomy of a Eukaryotic Gene
[Figure: the structure of a human gene, with the elements below marked.]
TATA box
CAAT box (http://en.wikipedia.org/wiki/CAAT_box)
Cis-regulatory elements: may be located thousands of bases away; regulatory TFs bind.
Pol II, basal TFs bind.
The structure of a human gene. It is the task of gene finding algorithms to elucidate this structure.
Slide44Gene annotation in eukaryotes
Promoter sequences and binding sites for transcription factors
Further differences between prokaryotic and eukaryotic gene structures:
Sequence signals in upstream regions are much more variable in eukaryotes
Both in position and composition
Control of gene expression is more complex in eukaryotes
Can be affected by many molecules binding the DNA in the gene region
This leads to many more potential promoter binding sites
These binding sites may be spread over a much larger region (several thousand bases)
Strict control of gene expression
Some genes are known to be poorly expressed because high levels would be damaging (e.g., genes for growth factors)
Such genes sometimes lack the TATA box characteristic of promoters.
This complicates the identification of such genes
Slide45Methods to detect eukaryotic gene signals
Promoters
Transcription start/stop signals
e.g. TATA box (30% of genes don’t have a TATA box)
e.g. polyA signal
Translation start/stop signals
no defined ribosome-binding site in eukaryotic genes
Slide46Methods to predict the intron/exon structure
ORF identification methods for prokaryotes don’t work
If exons are long enough then base statistics can be used.
Signals for splice sites are not well defined
Initial/terminal exons also contain non-coding sequence
Slide47Complete Eukaryotic gene models
Programs that use and combine all features of a gene to make a prediction about the complete gene structure (=model)
E.g., GenScan
Slide48Beyond gene prediction
Functional annotation
determine the function of a predicted gene
Genome comparison
use other organisms to refine gene model
Use of experimental data to evaluate gene model
e.g. gene expression
Slide49
Gene identification programs based on comparison with related genome sequences:
TWAIN
TWINSCAN
Ab initio gene identification programs, including those which use homologous gene sequences:
GAZE
The GeneMark set of programs
Genie
GenomeScan
GenScan
GLIMMER, GlimmerM and GlimmerHMM
GrailEXP
ORPHEUS
Wise2, including GeneWise
Identifying tRNA genes:
tRNAscan-SE program and web server
Promoter prediction programs:
CorePromoter
Exon prediction programs:
FirstEF
JTEF
MZEF
Splice site prediction programs:
GeneSplicer
SplicePredictor
Genome annotation visualization programs:
Apollo
Artemis and Artemis Comparison Tool (ACT)
VISTA
Slide51
Web Servers:
The following web sites provide on-line access to gene annotation tools:
Analysis and annotation tool (AAT)
FirstEF
FGENES family of programs
FunSiteP
GAP2, NAP and other DNA alignment programs
GeneBuilder
GeneSplicer
GeneWalker
GeneWise (part of the Wise2 suite)
GenScan
GrailEXP
HMMGene
McPromoter
NetPlantGene
NNPP
ProScan