Slide1
Searching for Transcription Start Sites in Drosophila
09/2020
Wilson Leung
Slide2
Outline
Transcription start sites (TSS) annotation goals
Promoter architecture in D. melanogaster
Define TSS positions and TSS search regions in D. ananassae
Slide3
GEP Annotation Projects
GEP Publications
Motif Project
Informant Species
Expanded F Project
D. busckii
D. hydei
D. navojoa
D. arizonae
D. obscura
D. serrata
D. suzukii
D. grimshawi
D. virilis
D. mojavensis
D. willistoni
D. miranda
D. pseudoobscura
D. persimilis
D. bipectinata
D. ananassae
D. kikkawai
D. ficusphila
D. rhopaloa
D. elegans
D. takahashii
D. biarmipes
D. eugracilis
D. yakuba
D. erecta
D. melanogaster
D. simulans
D. sechellia
Tree scale: 0.1
The Pathways Project annotates genes from 27 Drosophila species
F Element Projects
The Parasitoid Wasp Project annotates genes from 4 wasp species
Slide4
Research goals for the Drosophila Muller F element motif project
Identify conserved regulatory motifs that enable F element genes to be expressed in a heterochromatic environment
Riddle NC, et al. PLoS Genet. 2012 Sep;8(9):e1002954.
Slide5
D. melanogaster
D. simulans
D. sechellia
D. yakuba
D. erecta
D. ficusphila
D. eugracilis
D. biarmipes
D. takahashii
D. elegans
D. rhopaloa
D. kikkawai
D. ananassae
D. bipectinata
Use comparative genomics to identify factors that enable gene expression on the expanded F element
[Figure labels: D. melanogaster: 4, chr4, HP1a, Pericentric heterochromatin; D. ananassae: 4L, 4R, HP1a]
Slide6
D. ananassae and D. melanogaster F element genes show a similar range of expression levels
Chen Z-X, et al. Genome Res. 24:1209-1223
[Figure panels: D. ananassae, Adult Females, Adult Males; axes: CAI (Codon Bias), Expression Levels (rlog); LOESS Regression Line; legend: D. mel: F (modENCODE), D. ana: F (modENCODE)]
Slide7
Transcription factor binding sites (motifs) are short and degenerate
Position   1   2   3   4   5   6   7   8   9  10
A          4   6  12   0   3   1  13   5   7   9
C         22  23   6  64   0  70   0  54  13  33
G          9  11  30   3  14   0   2   7  11  10
T         36  31  23   4  54   0  56   5  40  19
http://jaspar.genereg.net/matrix/MA0205.1/
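The link between the count matrix and the sequence logo on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the JASPAR implementation: the counts are as read from the slide's matrix (every column sums to 71), the function names are hypothetical, and the information content is computed as 2 minus the Shannon entropy with no small-sample correction, so the values may differ from the logo's y-axis.

```python
import math

# Count matrix as read from the slide (JASPAR MA0205.1); each column sums to 71.
PFM = {
    "A": [4, 6, 12, 0, 3, 1, 13, 5, 7, 9],
    "C": [22, 23, 6, 64, 0, 70, 0, 54, 13, 33],
    "G": [9, 11, 30, 3, 14, 0, 2, 7, 11, 10],
    "T": [36, 31, 23, 4, 54, 0, 56, 5, 40, 19],
}


def information_content(pfm):
    """Per-position information content in bits (2 - Shannon entropy).

    Assumption: uniform background and no small-sample correction.
    """
    width = len(pfm["A"])
    bits = []
    for i in range(width):
        counts = [pfm[base][i] for base in "ACGT"]
        total = sum(counts)
        entropy = 0.0
        for n in counts:
            if n > 0:
                p = n / total
                entropy -= p * math.log2(p)
        bits.append(2.0 - entropy)
    return bits


def consensus(pfm):
    """Most frequent base at each position (ties resolved in A, C, G, T order)."""
    width = len(pfm["A"])
    return "".join(max("ACGT", key=lambda b: pfm[b][i]) for i in range(width))


if __name__ == "__main__":
    print("Consensus:", consensus(PFM))
    for pos, ic in enumerate(information_content(PFM), start=1):
        print(f"position {pos:2d}: {ic:.2f} bits")
```

Positions with low information content are the "degenerate" ones: several bases are tolerated there, which is one reason short motifs match so many places in a genome.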
[Sequence logo: positions 1-10 (x-axis), bits 0.0-1.5 (y-axis)]
Slide8
Futility theorem: difficult to find functional motif instances in the entire genome
Proposed by Wasserman WW and Sandelin A. Nat Rev Genet. 2004 Apr;5(4):276-87
Most motif instances in the genome are false positives
Multiple testing correction reduces the power to detect biologically interesting motifs as the search space increases
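The effect of search space on false positives can be illustrated with a back-of-the-envelope calculation. The sketch below is not FIMO's statistical model: it assumes random sequence with equal base frequencies and an exact (non-degenerate) motif, and the function names and search-space sizes are hypothetical values chosen only for illustration.

```python
def expected_chance_matches(motif_length, search_space_bp, both_strands=True):
    """Expected number of exact matches to one fixed k-mer in random sequence.

    Assumption: each base occurs independently with probability 0.25.
    """
    positions = max(search_space_bp - motif_length + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_length


def bonferroni_threshold(alpha, search_space_bp, both_strands=True):
    """Per-position p-value cutoff after Bonferroni correction."""
    tests = search_space_bp * (2 if both_strands else 1)
    return alpha / tests


if __name__ == "__main__":
    # Hypothetical search spaces: a set of promoter windows versus a whole genome.
    for label, size_bp in [("promoter windows", 100_000),
                           ("whole genome", 100_000_000)]:
        hits = expected_chance_matches(6, size_bp)
        cutoff = bonferroni_threshold(0.05, size_bp)
        print(f"{label}: ~{hits:.0f} chance 6-mer matches, "
              f"Bonferroni cutoff {cutoff:.1e}")
```

Restricting the search to TSS search regions shrinks both numbers, which is the motivation for the annotation goals later in the deck.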
FIMO Tutorial: http://meme-suite.org/doc/fimo-tutorial.html
Slide9
TSS of F element genes show lower levels of H3K9me3 and HP1a
[Figure tracks: H3K9me2, H3K9me3, H3K36me3, HP1a, POF, Su(var)3-9, PolII]
Riddle NC, et al. PLoS Genet. 2012 Sep;8(9):e1002954.
Slide10
Three strategies for motif finding
Multiple genes in a single species
Genes with common expression pattern
Sequences associated with ChIP-Seq peaks
Single gene in multiple species
Phylogenetic footprinting
Multiple genes in multiple species
Compare multiple sequence alignment profiles of multiple genes (Magma)
Slide11
Motif finding using multiple genes within a single species
Sequences surrounding TSS
Predicted motif instances
Position   1   2   3   4   5   6   7   8   9  10
A          4   6  12   0   3   1  13   5   7   9
C         22  23   6  64   0  70   0  54  13  33
G          9  11  30   3  14   0   2   7  11  10
T         36  31  23   4  54   0  56   5  40  19
[Sequence logo: positions 1-10 (x-axis), bits 0.0-2.0 (y-axis)]
Trl: FlyReg_DNaseI
Slide12
Motif finding using single gene in multiple species
[Genome browser figure, D. mel: chr4; tracks: Genes, PhyloP, phastCons, Conserved Elements, Multiple Sequence Alignment]
Slide13
Motif finding using multiple genes in multiple species (PhyloNet)
Based on Figure 1 from Wang T and Stormo GD. PNAS 2005 Nov 29;102(48):17400-5.
[Figure labels: Promoter sequences, Conserved motifs]
1. Identify conserved regions (profiles) in whole genome multiple sequence alignments
2. Identify multiple genes in the genome with similar alignment profiles
Slide14
Magma: Multiple Aligner of Genomic Multiple Alignments
Key features of Magma:
Runs ~70x faster than PhyloNet
Analyzes multiple sequence alignments with gaps
Uses a set-covering approach to minimize redundancy in discovered motifs
Ihuegbu NE, Stormo GD, Buhler J. J Comput Biol. 2012 Feb;19(2):139-47.
Computationally tractable to analyze conserved motifs in multiple eukaryotic genomes
Slide15
Drosophila species used to identify conserved regulatory motifs on the F element
D. melanogaster
D. simulans
D. sechellia
D. yakuba
D. erecta
D. ficusphila
D. eugracilis
D. biarmipes
D. takahashii
D. elegans
D. rhopaloa
D. kikkawai
D. ananassae
D. bipectinata
D. pseudoobscura
D. persimilis
D. willistoni
D. mojavensis
D. virilis
D. grimshawi
Species sequenced by modENCODE
Species          Substitutions per neutral site
D. ficusphila    0.80
D. eugracilis    0.76
D. biarmipes     0.70
D. takahashii    0.65
D. elegans       0.72
D. rhopaloa      0.66
D. kikkawai      0.89
D. bipectinata   0.99
Data from Table 1 of the modENCODE comparative genomics white paper
F element motif project
Slide16
TSS annotation goals for the F element project
Research goal:
Identify motifs that enable Muller F element genes to function within a heterochromatic environment
Annotation goals:
Define search regions enriched in regulatory motifs
Define precise location of TSS if possible
Define search regions where TSS could be found
Document the evidence used to support the TSS annotations
Slide17
Strategies for utilizing the GEP TSS annotations
Use TSS annotations to anchor the whole-genome multiple sequence alignments
Multiple sequence alignment is hard (NP-hard)
Example: anchored multiple alignment with DIALIGN
Align orthologous promoters to identify conserved regulatory motifs
Examples: EvoPrinterHD, Pro-Coffee
De novo and discriminatory motif discovery
Analyze TSS search regions to identify over-represented motifs
Examples: MEME Suite, HOMER
Slide18
Challenges with TSS annotations
Fewer constraints on untranslated regions (UTRs)
UTRs evolve more quickly than coding regions
Open reading frames and compatible phases of donor and acceptor sites do not apply to UTRs
Most genes on the expanded F element have large introns
Low percent identity (~50-70%) between the target genome and the D. melanogaster UTRs
Most ab initio gene finders do not predict UTRs
Cannot use RNA-Seq data to precisely define the TSS
Slide19
TSS annotation workflow
Identify the ortholog
Note the gene structure in D. melanogaster
Annotate the coding exons
Classify the type of core promoter in D. melanogaster
Annotate the initial transcribed exon
Define TSS positions or TSS search regions
Slide20
RNA Polymerase II core promoter
Initiator motif (Inr) contains the TSS
TFIID binds to the TATA box and Inr to initiate the assembly of the pre-initiation complex (PIC)
Juven-Gershon T and Kadonaga JT. Dev Biol. 2010 Mar 15;339(2):225-9.
Slide21
Core promoter motifs can affect gene expression levels
Juven-Gershon T, et al. Nat Methods. 2006 Nov;3(11):917-22.
[Figure label: SCP1]
Slide22
High turnover of core promoter motifs between D. erecta and D. ananassae
Rach EA et al. Genome Biol. 2009;10(7):R73.
Slide23
Peaked versus broad promoters
Kadonaga JT. Wiley Interdiscip Rev Dev Biol. 2012 Jan-Feb;1(1):40-51.
Peaked promoter (Single strong TSS)
Broad promoter (Multiple weak TSS)
50-300 bp
Slide24
RNA-Seq biases introduced by library construction
cDNA fragmentation
Strong bias at the 3’ end
RNA fragmentation
More uniform coverage
Misses the 5’ and 3’ ends of the transcript
Wang Z, et al. Nat Rev Genet. 2009 Jan;10(1):57-63.
[Figure axes: Gene Span (5’ to 3’), RNA-Seq Read Count]
Slide25
Techniques for finding TSS
Identify the 5’ cap at the beginning of the mRNA
Cap Analysis of Gene Expression (CAGE)
RNA Ligase Mediated Rapid Amplification of cDNA Ends (RLM-RACE)
Cap-trapped Expressed Sequence Tags (5’ ESTs)
More information on these techniques:
Takahashi H, et al. CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol Biol. 2012 786:181-200.
Sandelin A, et al. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet. 2007 Jun;8(6):424-36.
Slide26
Promoter architecture in Drosophila
Classify core promoter based on the Shape Index (SI)
Determined by the distribution of CAGE and 5’ RLM-RACE reads
Shape index is a continuum
Most promoters in D. melanogaster contain multiple TSS
Median width = 162 bp
~70% of vertebrate genes have broad promoters
Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92.
Slide27
Genes with peaked promoters show stronger spatial and tissue specificity
46% of genes with broad promoters are expressed in all stages of embryonic development
19% of genes with peaked promoters are expressed in all stages
Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92.
Slide28
Resources for classifying the type of core promoter in D. melanogaster
Only a subset of the modENCODE data are available through FlyBase
D. melanogaster GEP UCSC Genome Browser [Aug. 2014 (BDGP Release 6) assembly]
FlyBase gene annotations (release 6.34)
modENCODE TSS (Celniker) annotations
DNase I hypersensitive sites (DHS)
CAGE and RAMPAGE datasets
9-state and 16-state chromatin models
Transcription factor binding site (TFBS) HOT spots
Slide29
9-state chromatin model
Kharchenko PV, et al. Nature. 2011 Mar 24;471(7339):480-5.
Slide30
DNase I Hypersensitive Sites (DHS) correspond to accessible regions
Aasland R and Stewart AF. Methods Mol Biol. 1999;119:355-62.
Ho JW, et al. Nature. 2014 Aug 28; 512(7515):449-52.
Slide31
High turnover of DNase I Hypersensitive Sites (DHS) between human and mouse
Vierstra J et al. Science. 2014 Nov 21;346(6212):1007-12.
Slide32
Three common types of core promoters
Haberle V and Stark A. Nat Rev Mol Cell Biol. 2018 Oct;19(10):621-637.
Slide33
modENCODE TSS annotations
Two sets of modENCODE TSS predictions
TSS (Celniker)
Most recent dataset produced by modENCODE
Available on the GEP UCSC Genome Browser
TSS (Embryonic)
Older dataset available from FlyBase GBrowse
Use TSS (Celniker) dataset as the primary evidence
Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92
Slide34
Classify the D. melanogaster core promoter based on TSS (Celniker) annotations and DHS positions
Consider DHS positions within a 300bp window surrounding the start of the D. melanogaster transcript
TSS classification       # Annotated TSS   # DHS positions
Peaked                   1                 0
Peaked                   0                 1
Peaked                   1                 1
Intermediate             ≤ 1               > 1
Intermediate             > 1               ≤ 1
Broad                    > 1               > 1
Insufficient evidence    0                 0
Slide35
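The classification rules from the table (number of annotated TSS and number of DHS positions within the 300bp window) can be sketched as a small function. This is an illustrative reading of the table, not an official GEP tool; the function name and argument names are hypothetical.

```python
def classify_core_promoter(num_annotated_tss, num_dhs_positions):
    """Classify a D. melanogaster core promoter from two counts.

    num_annotated_tss: TSS (Celniker) annotations near the transcript start.
    num_dhs_positions: DHS positions within the 300bp window.
    """
    if num_annotated_tss < 0 or num_dhs_positions < 0:
        raise ValueError("counts must be non-negative")
    if num_annotated_tss == 0 and num_dhs_positions == 0:
        return "Insufficient evidence"
    if num_annotated_tss <= 1 and num_dhs_positions <= 1:
        return "Peaked"
    if num_annotated_tss > 1 and num_dhs_positions > 1:
        return "Broad"
    # One count is at most 1 and the other is greater than 1.
    return "Intermediate"


if __name__ == "__main__":
    for tss, dhs in [(1, 0), (0, 1), (1, 1), (1, 3), (2, 1), (3, 2), (0, 0)]:
        print(tss, dhs, classify_core_promoter(tss, dhs))
```

Writing the rules this way makes the edge cases explicit: a single line of evidence (one TSS or one DHS position) is still enough for a "Peaked" call, while disagreement between the two counts yields "Intermediate".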
DEMO: Classify the core promoter of D. melanogaster Rad23
Slide36
[Genome browser figure, chr4; tracks: BG3 9-state, S2 9-state, CG2316-RB, CG2316-RD, CG2316-RC, CG2316-RA, CG2316-RG, CG2316-RH, BG3 DHS, S2 DHS, Kc DHS, TSS (Celniker)]
Additional DHS data from different stages of embryonic development
DHS data produced by the BDTNP project
Evidence tracks:
Detected DHS Positions (Embryos)
DHS Read Density (Embryos)
Thomas S, et al. Genome Biol. 2011;12(5):R43.
[Figure labels: Stage 5, Stage 9, Stage 10, Stage 11, Stage 14]
Slide37
Standardize analysis of MachiBase and modENCODE CAGE data using CAGEr
Bioconductor package developed by RIKEN
Map datasets against release 6 assembly
37 modENCODE CAGE samples; 7 MachiBase samples
Define TSS and promoters for each sample
Define consensus promoters across all samples
Haberle V, et al. Nucleic Acids Res. 2015 Apr 30;43(8):e51.
Slide38
TSS classifications based on CAGEr
[Figure panels and tracks: Peaked (FlyBase Genes, modENCODE CAGE Peaks, modENCODE CAGE (Plus)); Intermediate (FlyBase Genes, modENCODE CAGE Peaks, modENCODE CAGE (Plus)); Broad (modENCODE CAGE Peaks, modENCODE CAGE (Minus), FlyBase Genes)]
Slide39
Benefits of RAMPAGE
RAMPAGE = RNA Annotation and Mapping of Promoters for Analysis of Gene Expression
CAGE only allows sequencing of short sequence tags (~27 bp) near the 5’ cap
Ambiguous read mapping to large parts of the genome
RAMPAGE produces long paired-end reads instead of short sequence tags
Developed novel algorithm to identify TSS clusters
Used paired-end information during peak calling
Used Cufflinks to produce partial transcript models
Batut P and Gingeras TR. Curr Protoc Mol Biol. 2013 Nov 11;104:Unit 25B.11.
Slide40
RAMPAGE results on the GEP UCSC Genome Browser
Lifted RAMPAGE results from release 5 to release 6
Results from 36 developmental stages
Combined TSS peak call from all samples
Available under the “Expression and Regulation” section
Slide41
Changes in the dominant TSS of Rad23 across different developmental stages
[Figure labels: Stages of Development, Adult females, Rad23, CAGE Tag Clusters]
Slide42
New Drosophila RAMPAGE datasets
D. melanogaster
RAMPAGE data from second biological replicate
Lifted results from release 5 to the release 6 assembly
“Combined RAMPAGE TSS (Replicate 2) (R5)” track
RAMPAGE data from three other Drosophila species
D. erecta, D. ananassae, and D. pseudoobscura
Samples collected at 1-hour intervals throughout embryonic development (1-hour to 23-hour embryos)
“Combined RAMPAGE TSS” track
Batut PJ and Gingeras TR. Elife. 2017 Dec 20;6.
Slide43
Evidence for TSS annotations (in general order of importance)
Experimental data
RAMPAGE
ATAC-Seq
RNA-Seq
Conservation
Type of TSS (peaked/intermediate/broad) in D. melanogaster
Sequence similarity to initial exon in D. melanogaster
Sequence similarity to other Drosophila species (ROAST)
Core promoter motifs
Inr, TATA box, etc.
Slide44
Evidence tracks for TSS annotations in the D. ananassae GEP UCSC Genome Browser
Combined RAMPAGE TSS
Eye discs ATAC-Seq
Histone modifications ChIP-Seq (H3K4me2)
TSS predictions from Gnomon, N-SCAN PASA-EST, and Augustus
RNA-Seq data from multiple stages and tissues
Combined splice junction predictions
Slide45
TSRchitect analysis results for the Drosophila RAMPAGE datasets
D. melanogaster RAMPAGE evidence tracks:
TSRchitect Combined RAMPAGE TSS (Replicate 1)
TSRchitect Combined RAMPAGE TSS (Replicate 2)
RAMPAGE data from three other Drosophila species
D. erecta, D. ananassae, and D. pseudoobscura
Samples collected at 1-hour intervals throughout embryonic development (1-hour to 23-hour embryos)
“Combined RAMPAGE TSS” track
Batut PJ and Gingeras TR. Elife. 2017 Dec 20;6.
Slide46
Calculate promoter Shape Index (SI) with TSRchitect
https://bioconductor.org/packages/release/bioc/html/TSRchitect.html
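A Shape Index calculation can be sketched in plain Python. This is not the TSRchitect API. It assumes the definition SI = 2 + Σ p_i log2(p_i), where p_i is the fraction of 5’ read counts at position i of the promoter, which is how the shape index is usually attributed to Hoskins et al. (2011); the cutoff of -1 between peaked and broad promoters is likewise an assumption, so check the TSRchitect documentation for the exact calculation and thresholds it applies.

```python
import math


def shape_index(tss_read_counts):
    """Shape Index for one promoter from per-position 5' read counts.

    Assumed definition: SI = 2 + sum(p_i * log2(p_i)), so a promoter with all
    reads at a single position has SI = 2, and SI falls as reads spread out.
    """
    total = sum(tss_read_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    si = 2.0
    for count in tss_read_counts:
        if count > 0:
            p = count / total
            si += p * math.log2(p)
    return si


def promoter_shape(si, cutoff=-1.0):
    """Label a promoter from its SI (the cutoff of -1 is an assumption)."""
    return "peaked" if si > cutoff else "broad"


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    single_site = [120]
    spread_out = [5] * 16
    for counts in (single_site, spread_out):
        si = shape_index(counts)
        print(f"SI = {si:.2f} ({promoter_shape(si)})")
```

Because the index is continuous, promoters near the cutoff are better described as intermediate than forced into one class.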
[Genome browser figure; tracks: ATAC-Seq, Adult Females, RNA-Seq, D. mel Proteins, GeMoMa, Combined RAMPAGE; promoter class: Peaked]
Slide47
ATAC-Seq: Assay for Transposase-Accessible Chromatin with high-throughput SEQuencing
Buenrostro JD, et al. Curr Protoc Mol Biol. 2015 Jan 5;109:21.29.1-21.29.9.
Slide48
ATAC-Seq data for D. ananassae and 10 other Drosophila species
Jacobs J et al. Nat Genet. 2018 Jul;50(7):1011-1020.
[Genome browser figure, Rad23; tracks: RepeatMasker, Adult Females, Adult Males, Embryos, RNA-Seq, regtools, ATAC-Seq, D. mel Proteins, GeMoMa]
Slide49
Determine the gene structure in D. melanogaster
FlyBase: GBrowse
[Figure labels: UTR, CDS]
Gene Record Finder: Transcript Details
Slide50
Identify the initial transcribed exon using NCBI blastn
Retrieve the sequences of the initial exons from the Transcript Details tab of the Gene Record Finder
Use placement of the flanking exons to reduce the size of the search region if possible
Increase sensitivity of nucleotide searches
Change Program Selection to blastn
Change Word size to 7
Change Match/Mismatch Scores to +1, -1
Change Gap Costs to Existence: 2, Extension: 1
Slide51
Optimize alignment parameters based on expected levels of conservation
Derive alignment scores using information theory
Relative entropy of target and background frequencies
Match +2, Mismatch -3 optimized for 90% identity
Match +1, Mismatch -1 optimized for 75% identity
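The link between match/mismatch scores and the expected percent identity can be checked numerically. The sketch below assumes equal background base frequencies (0.25 each) and treats the scores as log-odds ratios with an unknown scale factor lambda, which is solved from the condition that the implied target frequencies sum to 1. The function name is hypothetical, and the result for +2/-3 comes out near 89%, consistent with the rounded value on the slide.

```python
import math


def implied_identity(match, mismatch, tol=1e-12):
    """Fraction identity for which (match, mismatch) scores are log-odds optimal.

    Assumption: uniform base frequencies. Solves for lambda > 0 in
        0.25 * exp(lambda * match) + 0.75 * exp(lambda * mismatch) = 1
    and returns the total target frequency of matching pairs.
    """
    if match <= 0 or mismatch >= 0:
        raise ValueError("need match > 0 and mismatch < 0")
    if 0.25 * match + 0.75 * mismatch >= 0:
        raise ValueError("expected score per position must be negative")

    def f(lam):
        return 0.25 * math.exp(lam * match) + 0.75 * math.exp(lam * mismatch) - 1.0

    lo, hi = 1e-6, 1.0  # f(lo) < 0 because the expected score is negative
    while f(hi) <= 0:
        hi *= 2.0
    while hi - lo > tol:  # bisection on the single positive root
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2.0
    return 0.25 * math.exp(lam * match)


if __name__ == "__main__":
    for match, mismatch in [(2, -3), (1, -1)]:
        pct = 100 * implied_identity(match, mismatch)
        print(f"+{match}/{mismatch}: about {pct:.0f}% identity")
```

Since UTRs in the target species are expected to be only ~50-70% identical to D. melanogaster, the +1/-1 scheme is the closer fit of the two.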
States DJ, et al. Methods. 1991 3:66-70.
Slide52
Perform blastn search of Rad23:1 against the D. ananassae contig
D. melanogaster exon Rad23:1
Limit the search region via Subject subrange: 88000 to 90000
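The slides use the NCBI web interface, but the same settings can be expressed for the command-line BLAST+ `blastn` program. The sketch below only builds the command; it assumes BLAST+ is installed if you choose to run it, and the FASTA file names are hypothetical placeholders. The parameter values (word size 7, +1/-1 scores, gap costs 2/1, subject subrange 88000-90000) are the ones listed in the slides.

```python
import shlex


def build_blastn_command(query_fasta, subject_fasta, subject_range=None):
    """Build a BLAST+ blastn command with the sensitive settings from the slides."""
    cmd = [
        "blastn",
        "-task", "blastn",        # Program Selection: blastn (not megablast)
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "7",
        "-reward", "1",           # match score
        "-penalty", "-1",         # mismatch score
        "-gapopen", "2",          # gap existence cost
        "-gapextend", "1",        # gap extension cost
    ]
    if subject_range is not None:
        start, stop = subject_range
        cmd += ["-subject_loc", f"{start}-{stop}"]  # Subject subrange
    return cmd


if __name__ == "__main__":
    # Hypothetical file names for the Rad23:1 exon and the D. ananassae contig.
    command = build_blastn_command("Rad23_1.fasta", "contig20.fasta", (88000, 90000))
    print(shlex.join(command))
    # To run it: subprocess.run(command, check=True)
```

Restricting the subject range keeps the short word size from producing a flood of spurious hits elsewhere on the contig.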
D. ananassae contig20
Slide53
Extrapolate TSS position based on blastn alignment of the initial transcribed exon
Assume the length of the initial transcribed exon is conserved between D. melanogaster and D. ananassae
blastn: D. mel: Rad23:1 (Query) vs. contig20 (Sbjct)
Query start: 128
Extrapolate TSS position: 89047 - 127 = 88,920
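The extrapolation above is simple coordinate arithmetic: the first 127 bases of the query exon are not covered by the alignment, so the TSS is placed 127 bases upstream of the subject position where the alignment begins. A minimal sketch follows; the function name is hypothetical, and the minus-strand branch is an added assumption for genes on the reverse strand.

```python
def extrapolate_tss(subject_start, query_start, strand="+"):
    """Estimate the TSS from the start of a blastn alignment.

    subject_start: subject coordinate aligned to query_start.
    query_start: 1-based position in the query exon where the alignment begins.
    Assumes the length of the initial exon is conserved between species.
    """
    unaligned_bases = query_start - 1
    if strand == "+":
        return subject_start - unaligned_bases
    if strand == "-":
        # Upstream is toward larger coordinates on the minus strand.
        return subject_start + unaligned_bases
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Values from the Rad23:1 alignment against contig20.
    print(extrapolate_tss(89047, 128))
```

The estimate is only as good as the length-conservation assumption, which is why it is used to define a search region rather than an exact position.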
[Figure label: Start codon]
Slide54
TSS annotation for Rad23 in D. ananassae
TSS position: 89,003
Position with highest combined RAMPAGE read density
Narrow TSS search region: 88,981-89,060
Peaked promoter with an SI value of -0.82
Wide TSS search region: 88,776-89,158
MACS2 peak from ATAC-Seq of eye-antennal discs
Encompasses the TSS position inferred from blastn search against Rad23:1
Slide55
TSS annotation summary
Most of the D. melanogaster core promoters have multiple TSS
Classify the type of promoter (peaked/intermediate/broad) based on the transcriptome evidence from D. melanogaster
Use multiple lines of evidence to infer the TSS region in D. ananassae
RAMPAGE
ATAC-Seq
H3K4me2 ChIP-Seq
RNA-Seq coverage
blastn (change search parameters)
Slide56
Questions?
[Figure labels: PhyloP scores (Under negative selection, Fast-evolving), PhyloP, Conserved Elements, phastCons]
Slide57
Slide58
Structure of a typical mRNA
Pesole G, et al. Genome Biology. 2002: 3(3) reviews0004.1-reviews0004.10.
Slide59
Phylogenetic tree based on the analysis of 13 Type IIB restriction endonucleases
Simulate restriction digests of 21 genomes
DNA fragments range from 21-33 bp in size
Calculate distance between two genomes based on number of shared fragments
D. melanogaster
D. simulans
D. sechellia
D. yakuba
D. erecta
D. ficusphila
D. eugracilis
D. biarmipes
D. takahashii
D. elegans
D. rhopaloa
D. kikkawai
D. ananassae
D. bipectinata
D. santomea
D. persimilis
D. willistoni
D. mojavensis
D. virilis
D. grimshawi
D. pseudoobscura
Seetharam AS and Stuart GW. PeerJ. 2013 Dec 23;1:e226.
Slide60
RAMPAGE protocol
Slide61
Signals in the FlyBase RAMPAGE and MachiBase TSS tracks are off by one base
https://wiki.flybase.org/wiki/FlyBase:GBrowse_Tracks#Aligned_Evidence
“FlyBase:GBrowse Tracks” page on the FlyBase Wiki
Slide62
Conservation tracks on the D. melanogaster GEP UCSC Genome Browser
Whole genome alignments of multiple Drosophila species
Drosophila Chain/Net composite track
Generate multiple sequence alignments from these pairwise alignments (ROAST)
Identify conserved regions from ROAST alignments
PhastCons: identify conserved elements
PhyloP: measure level of selection at each nucleotide
Multiz alignment of 27 insect species available on the official UCSC Genome Browser
Aug. 2014 (BDGP Release 6 + ISO1 MT/dm6) assembly
Slide63
Improve the multiple sequence alignments for Drosophila
Use the most recent version of the RefSeq assemblies
UCSC uses draft assemblies instead of the CAF1 assemblies
Species-specific repeat masking
Use more sensitive alignment tools
LAST, ROAST
Manual optimizations of alignment parameters
Slide64
Alignment of 27 insects on the official UCSC Genome Browser
[Genome browser figure, chr4; track groups: Drosophila species, Other insects]
Slide65
Alignment of 28 Drosophila species on the GEP UCSC Genome Browser
[Genome browser figure, chr4; track group: Drosophila species]
Slide66
Genome alignments of 27 Drosophila species against D. melanogaster
Drosophila Chain/Net:
Slide67
Use the conservation tracks to identify regions under selection
[Figure labels: PhyloP scores (Under negative selection, Fast-evolving), PhyloP, Conserved Elements, phastCons]
Slide68
Examine the ROAST alignments to identify the orthologous TSS regions
Slide69
Use core promoter motifs to support TSS annotations
Some sequence motifs are enriched in the region (~300 bp) surrounding the TSS
Some motifs (e.g., Inr, TATA) are well-characterized
Other motifs are identified based on computational analysis
Presence of core promoter motifs can be used to support the TSS annotations
Inr motif (TCAKTY) overlaps with the TSS (-2 to +4)
Absence of core promoter motifs is a negative result
Most D. melanogaster TSS do not contain the Inr motif
Slide70
Use UCSC Genome Browser Short Match to find Drosophila core promoter motifs
Ohler U, et al. Genome Biol. 2002; 3(12):RESEARCH0087.
[Figure labels: TATA box, Initiator (Inr)]
Available under “Projects” “Annotation Resources” “Core Promoter Motifs” on the GEP web site: http://gander.wustl.edu/~wilson/core_promoter_motifs.html
Slide71
Core Promoter Motifs tracks
Show core promoter motif matches for each contig
Separated by strand
Visualize matches to different core promoter motifs
Use UCSC Table Browser (or other means) to export the list of motif matches within the search region
Slide72
DEMO: Use the Inr motif to support the TSS position of Rad23
Slide73
RNA PolII ChIP-Seq tracks
Show regions that are enriched in RNA Polymerase II compared to input DNA
[Genome browser tracks: Gene Models, RNA PolII Peaks, RNA PolII Enrichment, RNA-Seq]
Slide74
Using RNA-Seq and RNA PolII ChIP-Seq data to define the TSS search region
[Genome browser tracks: Narrow TSS search region, D. mel Transcripts, RNA PolII Peaks, RNA PolII Enrichment, RNA-Seq]
Slide75
Define the TSS search region based on blastn alignment to the initial transcribed exon
If the blastn alignment satisfies these criteria:
Has an E-value less than 1e-5
Covers more than half of the initial transcribed exon
Requires extrapolation of less than 150bp to estimate the TSS position
Is in congruence with other evidence tracks
Examples: RNA-Seq read coverage, RNA PolII ChIP-Seq data
Define TSS search region as ±300bp surrounding the TSS position estimated from the blastn alignment
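The criteria above can be written as a single check. This is a sketch of the decision rule as stated on the slide, not code from the GEP modules; the function and argument names are hypothetical, and the agreement with other evidence tracks is passed in as a manual yes/no judgment because it cannot be computed from the alignment alone.

```python
def tss_search_region_from_blastn(evalue, aligned_query_bases, exon_length,
                                  extrapolation_bp, estimated_tss,
                                  consistent_with_other_evidence, flank=300):
    """Return (start, end) of the TSS search region, or None if any criterion fails.

    aligned_query_bases: number of bases of the initial exon covered by the alignment.
    extrapolation_bp: distance extrapolated to reach the estimated TSS.
    """
    meets_criteria = (
        evalue < 1e-5
        and aligned_query_bases > exon_length / 2
        and extrapolation_bp < 150
        and consistent_with_other_evidence
    )
    if not meets_criteria:
        return None
    # Coordinates are 1-based, so the region cannot start before position 1.
    return (max(1, estimated_tss - flank), estimated_tss + flank)


if __name__ == "__main__":
    # Hypothetical alignment statistics; only the TSS estimate and the 127 bp
    # extrapolation echo the Rad23 example earlier in the deck.
    region = tss_search_region_from_blastn(
        evalue=1e-20, aligned_query_bases=173, exon_length=300,
        extrapolation_bp=127, estimated_tss=88920,
        consistent_with_other_evidence=True,
    )
    print(region)
```

Returning None when a criterion fails makes it explicit that the search region must then come from other evidence, such as RNA-Seq or RNA PolII ChIP-Seq data.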
See Module TSS4 for details
Slide76
TSS annotation for Rad23
TSS position: 28,936
Conservation with D. melanogaster
blastn search of initial exon
“D. mel Transcripts” track
Location of the Inr motif
TSS search region: 28,716-28,936
Enrichment of RNA PolII upstream of the TSS position
RNA-Seq read coverage upstream of the TSS position
Search region defined by the extent of the RNA PolII peak
Slide77
Additional TSS annotation resources
The D. melanogaster gene annotations are the primary source of evidence
Resources that could be useful if the D. melanogaster evidence is ambiguous
Multiple sequence alignments of 28 Drosophila species
PhastCons and PhyloP conservation scores
Genome browsers for 27 Drosophila species
RNA Pol II ChIP-Seq data
CAGE data for D. pseudoobscura
RAMPAGE data for D. erecta, D. ananassae, and D. pseudoobscura
ATAC-Seq data for 10 Drosophila species
RNA-Seq coverage, splice junctions, assembled transcripts
Gnomon, Augustus, and N-SCAN PASA-EST gene predictions
Slide78
TSS annotation summary
Most of the D. melanogaster core promoters have multiple TSS
Classify the type of promoter (peaked/intermediate/broad) based on the transcriptome evidence from D. melanogaster
Define search regions that might contain TSS
Use multiple lines of evidence to infer the TSS region
Identify initial exon
RNA-Seq coverage
blastn (change search parameters)
Distribution of core promoter motifs (e.g., Inr)
RNA PolII ChIP-Seq peaks
Maintain conservation compared to D. melanogaster