Uploaded On 2019-02-01

Slide1

Searching for Transcription Start Sites in Drosophila

09/2020

Wilson Leung

Slide2

Outline

Transcription start sites (TSS) annotation goals

Promoter architecture in D. melanogaster

Define TSS positions and TSS search regions in D. ananassae

Slide3

GEP Annotation Projects

GEP Publications

Motif Project

Informant Species

Expanded F Project

D. busckii

D. hydei

D. navojoa

D. arizonae

D. obscura

D. serrata

D. suzukii

D. grimshawi

D. virilis

D. mojavensis

D. willistoni

D. miranda

D. pseudoobscura

D. persimilis

D. bipectinata

D. ananassae

D. kikkawai

D. ficusphila

D. rhopaloa

D. elegans

D. takahashii

D. biarmipes

D. eugracilis

D. yakuba

D. erecta

D. melanogaster

D. simulans

D. sechellia

Tree scale: 0.1

The Pathways Project annotates genes from 27 Drosophila species

F Element Projects

The Parasitoid Wasp Project annotates genes from 4 wasp species

Slide4

Research goals for the Drosophila Muller F element motif project

Identify conserved, regulatory motifs that enable F element genes to be expressed in a heterochromatic environment

Riddle NC, et al. PLoS Genet. 2012 Sep;8(9):e1002954.

Slide5

D. melanogaster

D. simulans

D. sechellia

D. yakuba

D. erecta

D. ficusphila

D. eugracilis

D. biarmipes

D. takahashii

D. elegans

D. rhopaloa

D. kikkawai

D. ananassae

D. bipectinata

Use comparative genomics to identify factors that enable gene expression on the expanded F element

[Figure labels: D. melanogaster chr4 (HP1a, pericentric heterochromatin); D. ananassae 4L and 4R (HP1a)]

Slide6

D. ananassae and D. melanogaster F element genes show similar range of expression levels

Chen Z-X, et al. Genome Res. 24:1209-1223

[Figure: axes CAI (Codon Bias) and Expression Levels (rlog), with a LOESS regression line; panel labels: D. ananassae, Adult Females, Adult Males; legend: D. mel: F (modENCODE), D. ana: F (modENCODE)]

Slide7

Transcription factor binding sites (motifs) are short and degenerate

Position    1   2   3   4   5   6   7   8   9  10
A           4   6  12   0   3   1  13   5   7   9
C          22  23   6  64   0  70   0  54  13  33
G           9  11  30   3  14   0   2   7  11  10
T          36  31  23   4  54   0  56   5  40  19

http://jaspar.genereg.net/matrix/MA0205.1/
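As a rough illustration of what "degenerate" means here, the counts on this slide can be turned into per-position information content (the quantity plotted on the y-axis of a sequence logo). This is a minimal sketch that assumes a uniform background and applies no small-sample correction, so the values need not match the JASPAR logo exactly. The function name is illustrative.

```python
import math

# Counts as read from the matrix on this slide (rows A, C, G, T; positions 1-10).
PFM = {
    "A": [4, 6, 12, 0, 3, 1, 13, 5, 7, 9],
    "C": [22, 23, 6, 64, 0, 70, 0, 54, 13, 33],
    "G": [9, 11, 30, 3, 14, 0, 2, 7, 11, 10],
    "T": [36, 31, 23, 4, 54, 0, 56, 5, 40, 19],
}


def information_content(pfm):
    """Return bits per position: 2 minus the Shannon entropy of each column."""
    width = len(pfm["A"])
    bits = []
    for i in range(width):
        counts = [pfm[base][i] for base in "ACGT"]
        total = sum(counts)
        # Zero counts contribute nothing to the entropy.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    for position, value in enumerate(information_content(PFM), start=1):
        print(f"position {position}: {value:.2f} bits")
```

Positions dominated by a single base approach 2 bits, while positions where several bases are common contribute little, which is why a short motif like this one matches many sites by chance.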

[Sequence logo y-axis: Bits (0.0 to 1.5)]

Slide8

Futility theorem: difficult to find functional motif instances in the entire genome

Proposed by Wasserman WW and Sandelin A. Nat Rev Genet. 2004 Apr;5(4):276-87

Most motif instances in the genome are false positives

Multiple testing correction reduces the power to detect biologically interesting motifs as the search space increases
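The effect of search-space size can be sketched with simple arithmetic. The example below assumes every position on both strands is an independent test, and it uses a hypothetical per-position p-value cutoff of 1e-4 and hypothetical region sizes; none of these numbers come from the slides.

```python
def expected_chance_matches(search_space_bp, p_value_cutoff):
    """Expected number of chance hits when scanning both strands of a region."""
    return 2 * search_space_bp * p_value_cutoff


def bonferroni_threshold(search_space_bp, alpha=0.05):
    """Per-position p-value needed to keep the family-wise error rate at alpha."""
    return alpha / (2 * search_space_bp)


if __name__ == "__main__":
    # Hypothetical sizes: a 600 bp promoter window versus a 1 Mb region.
    for size in (600, 1_000_000):
        print(
            f"{size} bp: ~{expected_chance_matches(size, 1e-4):.2f} chance hits, "
            f"Bonferroni cutoff {bonferroni_threshold(size):.2e}"
        )
```

Restricting the scan to a small window around the TSS keeps the expected number of chance hits low and the corrected threshold attainable, which is the motivation for defining TSS search regions.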

FIMO Tutorial: http://meme-suite.org/doc/fimo-tutorial.html

Slide9

TSS of F element genes show lower levels of H3K9me3 and HP1a

[Figure panels: H3K9me2, H3K9me3, H3K36me3, HP1a, POF, Su(var)3-9, PolII]

Riddle NC, et al. PLoS Genet. 2012 Sep;8(9):e1002954.

Slide10

Three strategies for motif finding

Multiple genes in a single species

Genes with common expression pattern

Sequences associated with ChIP-Seq peaks

Single gene in multiple species

Phylogenetic footprinting

Multiple genes in multiple species

Compare multiple sequence alignment profiles of multiple genes (Magma)

Slide11

Motif finding using multiple genes within a single species

Sequences surrounding TSS

Predicted motif instances

Position    1   2   3   4   5   6   7   8   9  10
A           4   6  12   0   3   1  13   5   7   9
C          22  23   6  64   0  70   0  54  13  33
G           9  11  30   3  14   0   2   7  11  10
T          36  31  23   4  54   0  56   5  40  19

[Sequence logo: Trl: FlyReg_DNaseI; y-axis Bits (0.0 to 2.0)]

Slide12

Motif finding using single gene in multiple species

[Genome browser figure, D. mel: chr4; tracks: Genes, PhyloP, phastCons, Conserved Elements, Multiple Sequence Alignment]

Slide13

Motif finding using multiple genes in multiple species (PhyloNet)

Based on Figure 1 from Wang T and Stormo GD. PNAS 2005 Nov 29;102(48):17400-5.

[Figure labels: Promoter sequences, Conserved motifs]

1. Identify conserved regions (profiles) in whole genome multiple sequence alignments

2. Identify multiple genes in the genome with similar alignment profiles

Slide14

Magma: Multiple Aligner of Genomic Multiple Alignments

Key features of Magma:

Runs ~70x faster than PhyloNet

Analyze multiple sequence alignments with gaps

Use set-covering approach to minimize redundancy in discovered motifs

Computationally tractable to analyze conserved motifs in multiple eukaryotic genomes

Ihuegbu NE, Stormo GD, Buhler J. J Comput Biol. 2012 Feb;19(2):139-47.

Slide15

Drosophila species used to identify conserved regulatory motifs on the F element

D. melanogaster

D. simulans

D. sechellia

D. yakuba

D. erecta

D. ficusphila

D. eugracilis

D. biarmipes

D. takahashii

D. elegans

D. rhopaloa

D. kikkawai

D. ananassae

D. bipectinata

D. pseudoobscura

D. persimilis

D. willistoni

D. mojavensis

D. virilis

D. grimshawi

Species sequenced by modENCODE

Species          Substitutions per neutral site
D. ficusphila    0.80
D. eugracilis    0.76
D. biarmipes     0.70
D. takahashii    0.65
D. elegans       0.72
D. rhopaloa      0.66
D. kikkawai      0.89
D. bipectinata   0.99

Data from Table 1 of the modENCODE comparative genomics white paper

F element motif project

Slide16

TSS annotation goals for the F element project

Research goal:

Identify motifs that enable Muller F element genes to function within a heterochromatic environment

Annotation goals:

Define search regions enriched in regulatory motifs

Define precise location of TSS if possible

Define search regions where TSS could be found

Document the evidence used to support the TSS annotations

Slide17

Strategies for utilizing the GEP TSS annotations

Use TSS annotations to anchor the whole-genome multiple sequence alignments

Multiple sequence alignment is hard (NP-hard)

Example: anchored multiple alignment with DIALIGN

Align orthologous promoters to identify conserved regulatory motifs

Examples: EvoPrinterHD, Pro-Coffee

De novo and discriminatory motif discovery

Analyze TSS search regions to identify over-represented motifs

Examples: MEME Suite, HOMER

Slide18

Challenges with TSS annotations

Fewer constraints on untranslated regions (UTRs)

UTRs evolve more quickly than coding regions

Open reading frames, compatible phases of donor and acceptor sites do not apply to UTRs

Most genes on the expanded F element have large introns

Low percent identity (~50-70%) between the target genome and the D. melanogaster UTRs

Most ab initio gene finders do not predict UTRs

Cannot use RNA-Seq data to precisely define the TSS

Slide19

TSS annotation workflow

Identify the ortholog

Note the gene structure in D. melanogaster

Annotate the coding exons

Classify the type of core promoter in D. melanogaster

Annotate the initial transcribed exon

Define TSS positions or TSS search regions

Slide20

RNA Polymerase II core promoter

Initiator motif (Inr) contains the TSS

TFIID binds to the TATA box and Inr to initiate the assembly of the pre-initiation complex (PIC)

Juven-Gershon T and Kadonaga JT. Dev Biol. 2010 Mar 15;339(2):225-9.

Slide21

Core promoter motifs can affect gene expression levels

Juven-Gershon T, et al. Nat Methods. 2006 Nov;3(11):917-22.

[Figure label: SCP1]

Slide22

High turnover of core promoter motifs between D. erecta and D. ananassae

Rach EA et al. Genome Biol. 2009;10(7):R73.

Slide23

Peaked versus broad promoters

Kadonaga JT. Wiley Interdiscip Rev Dev Biol. 2012 Jan-Feb;1(1):40-51.

Peaked promoter (Single strong TSS)

Broad promoter (Multiple weak TSS)

50-300 bp

Slide24

RNA-Seq biases introduced by library construction

cDNA fragmentation

Strong bias at the 3’ end

RNA fragmentation

More uniform coverage

Miss the 5’ and 3’ ends of the transcript

Wang Z, et al. Nat Rev Genet. 2009 Jan;10(1):57-63.

[Figure axes: Gene Span (5’ to 3’), RNA-Seq Read Count]

Slide25

Techniques for finding TSS

Identify the 5’ cap at the beginning of the mRNA

Cap Analysis of Gene Expression (CAGE)

RNA Ligase Mediated Rapid Amplification of cDNA Ends (RLM-RACE)

Cap-trapped Expressed Sequence Tags (5’ ESTs)

More information on these techniques:

Takahashi H, et al. CAGE (cap analysis of gene expression): a protocol for the detection of promoter and transcriptional networks. Methods Mol Biol. 2012 786:181-200.

Sandelin A, et al. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev Genet. 2007 Jun;8(6):424-36.

Slide26

Promoter architecture in Drosophila

Classify core promoter based on the Shape Index (SI)

Determined by the distribution of CAGE and 5’ RLM-RACE reads

Shape index is a continuum

Most promoters in D. melanogaster contain multiple TSS

Median width = 162 bp

~70% of vertebrate genes have broad promoters

Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92.

Slide27

Genes with peaked promoters show stronger spatial and tissue specificity

46% of genes with broad promoters are expressed in all stages of embryonic development

19% of genes with peaked promoters are expressed in all stages

Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92.

Slide28

Resources for classifying the type of core promoter in D. melanogaster

Only a subset of the modENCODE data are available through FlyBase

D. melanogaster GEP UCSC Genome Browser [Aug. 2014 (BDGP Release 6) assembly]

FlyBase gene annotations (release 6.34)

modENCODE TSS (Celniker) annotations

DNase I hypersensitive sites (DHS)

CAGE and RAMPAGE datasets

9-state and 16-state chromatin models

Transcription factor binding site (TFBS) HOT spots

Slide29

9-state chromatin model

Kharchenko PV, et al. Nature. 2011 Mar 24;471(7339):480-5.

Slide30

DNaseI Hypersensitive Sites (DHS) correspond to accessible regions

Aasland R and Stewart AF. Methods Mol Biol. 1999;119:355-62.

Ho JW, et al. Nature. 2014 Aug 28; 512(7515):449-52.

Slide31

High turnover of DNase I Hypersensitive Sites (DHS) between human and mouse

Vierstra J et al. Science. 2014 Nov 21;346(6212):1007-12.

Slide32

Three common types of core promoters

Haberle V and Stark A. Nat Rev Mol Cell Biol. 2018 Oct;19(10):621-637.

Slide33

modENCODE TSS annotations

Two sets of modENCODE TSS predictions

TSS (Celniker)

Most recent dataset produced by modENCODE

Available on the GEP UCSC Genome Browser

TSS (Embryonic)

Older dataset available from FlyBase GBrowse

Use TSS (Celniker) dataset as the primary evidence

Hoskins RA, et al. Genome Res. 2011 Feb;21(2):182-92

Slide34

Classify the D. melanogaster core promoter based on TSS (Celniker) annotations and DHS positions

Consider DHS positions within a 300bp window surrounding the start of the D. melanogaster transcript

TSS classification      # Annotated TSS   # DHS positions
Peaked                  1                 0
Peaked                  0                 1
Peaked                  1                 1
Intermediate            ≤ 1               > 1
Intermediate            > 1               ≤ 1
Broad                   > 1               > 1
Insufficient evidence   0                 0

Slide35
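The classification table from the previous slide can be sketched as a small decision function. This assumes the table rows read as (annotated TSS, DHS positions) pairs of (1, 0), (0, 1), (1, 1) for peaked; (≤ 1, > 1) and (> 1, ≤ 1) for intermediate; (> 1, > 1) for broad; and (0, 0) for insufficient evidence. The function name is illustrative.

```python
def classify_core_promoter(n_annotated_tss, n_dhs_positions):
    """Classify a core promoter from counts of annotated TSS and DHS positions.

    Counts are taken within the window surrounding the start of the
    D. melanogaster transcript.
    """
    if n_annotated_tss < 0 or n_dhs_positions < 0:
        raise ValueError("counts must be non-negative")
    if n_annotated_tss == 0 and n_dhs_positions == 0:
        return "Insufficient evidence"
    if n_annotated_tss > 1 and n_dhs_positions > 1:
        return "Broad"
    if n_annotated_tss <= 1 and n_dhs_positions <= 1:
        return "Peaked"
    # Exactly one of the two counts is greater than 1.
    return "Intermediate"


if __name__ == "__main__":
    for tss, dhs in [(1, 1), (0, 2), (3, 1), (2, 4), (0, 0)]:
        print(tss, dhs, classify_core_promoter(tss, dhs))
```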

DEMO: Classify the core promoter of D. melanogaster Rad23

Slide36

[Genome browser figure, chr4; tracks: BG3 9-state, S2 9-state, CG2316-RB, CG2316-RD, CG2316-RC, CG2316-RA, CG2316-RG, CG2316-RH, BG3 DHS, S2 DHS, Kc DHS, TSS (Celniker); embryonic DHS tracks: Stage 5, Stage 9, Stage 10, Stage 11, Stage 14]

Additional DHS data from different stages of embryonic development

DHS data produced by the BDTNP project

Evidence tracks:

Detected DHS Positions (Embryos)

DHS Read Density (Embryos)

Thomas S, et al. Genome Biol. 2011;12(5):R43.

Slide37

Standardize analysis of MachiBase and modENCODE CAGE data using CAGEr

Bioconductor package developed by RIKEN

Map datasets against release 6 assembly

37 modENCODE CAGE samples; 7 MachiBase samples

Define TSS and promoters for each sample

Define consensus promoters across all samples

Haberle V, et al. Nucleic Acids Res. 2015 Apr 30;43(8):e51.

Slide38

TSS classifications based on CAGEr

[Figure panel: Peaked; tracks: FlyBase Genes, modENCODE CAGE Peaks, modENCODE CAGE (Plus)]

[Figure panel: Intermediate; tracks: FlyBase Genes, modENCODE CAGE Peaks, modENCODE CAGE (Plus)]

[Figure panel: Broad; tracks: modENCODE CAGE Peaks, modENCODE CAGE (Minus), FlyBase Genes]

Slide39

Benefits of RAMPAGE

RAMPAGE = RNA Annotation and Mapping of Promoters for Analysis of Gene Expression

CAGE only allows sequencing of short sequence tags (~27 bp) near the 5’ cap

Ambiguous read mapping to large parts of the genome

RAMPAGE produces long paired-end reads instead of short sequence tags

Developed novel algorithm to identify TSS clusters

Used paired-end information during peak calling

Used Cufflinks to produce partial transcript models

Batut P and Gingeras TR. Curr Protoc Mol Biol. 2013 Nov 11;104:Unit 25B.11.

Slide40

RAMPAGE results on the GEP UCSC Genome Browser

Lifted RAMPAGE results from release 5 to release 6

Results from 36 developmental stages

Combined TSS peak call from all samples

Available under the “Expression and Regulation” section

Slide41

Changes in the dominant TSS of Rad23 across different developmental stages

[Figure labels: Stages of Development, Adult females, Rad23, CAGE Tag Clusters]

Slide42

New Drosophila RAMPAGE datasets

D. melanogaster RAMPAGE data from second biological replicate

Lifted results from release 5 to the release 6 assembly

“Combined RAMPAGE TSS (Replicate 2) (R5)” track

RAMPAGE data from three other Drosophila species

D. erecta, D. ananassae, and D. pseudoobscura

Samples collected at 1-hour intervals throughout embryonic development (1-hour to 23-hour embryos)

“Combined RAMPAGE TSS” track

Batut PJ and Gingeras TR. Elife. 2017 Dec 20;6.

Slide43

Evidence for TSS annotations (in general order of importance)

Experimental data

RAMPAGE

ATAC-Seq

RNA-Seq

Conservation

Type of TSS (peaked/intermediate/broad) in D. melanogaster

Sequence similarity to initial exon in D. melanogaster

Sequence similarity to other Drosophila species (ROAST)

Core promoter motifs

Inr, TATA box, etc.

Slide44

Evidence tracks for TSS annotations in the D. ananassae GEP UCSC Genome Browser

Combined RAMPAGE TSS

Eye discs ATAC-Seq

Histone modifications ChIP-Seq (H3K4me2)

TSS predictions from Gnomon, N-SCAN PASA-EST, and Augustus

RNA-Seq data from multiple stages and tissues

Combined splice junction predictions

Slide45

TSRchitect analysis results for the Drosophila RAMPAGE datasets

D. melanogaster RAMPAGE evidence tracks:

TSRchitect Combined RAMPAGE TSS (Replicate 1)

TSRchitect Combined RAMPAGE TSS (Replicate 2)

RAMPAGE data from three other Drosophila species

D. erecta, D. ananassae, and D. pseudoobscura

Samples collected at 1-hour intervals throughout embryonic development (1-hour to 23-hour embryos)

“Combined RAMPAGE TSS” track

Batut PJ and Gingeras TR. Elife. 2017 Dec 20;6.

Slide46

Calculate promoter Shape Index (SI) with TSRchitect

https://bioconductor.org/packages/release/bioc/html/TSRchitect.html
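For intuition, the Shape Index can be sketched in a few lines. This assumes the definition attributed to Hoskins et al. (2011), SI = 2 + Σ p_i log2(p_i), where p_i is the fraction of 5’ reads that start at position i of the promoter, with values above -1 treated as peaked. It is an illustration only, not TSRchitect's code, and the read counts below are hypothetical.

```python
import math


def shape_index(read_counts):
    """Compute SI = 2 + sum(p_i * log2(p_i)) from per-position 5' read counts."""
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    # Positions with zero reads contribute nothing to the sum.
    return 2.0 + sum(
        (c / total) * math.log2(c / total) for c in read_counts if c > 0
    )


def promoter_shape(read_counts, threshold=-1.0):
    """Label a promoter as peaked when SI exceeds the assumed threshold."""
    return "peaked" if shape_index(read_counts) > threshold else "broad"


if __name__ == "__main__":
    print(shape_index([100]))             # all reads at one position
    print(shape_index([25, 25, 25, 25]))  # reads spread over four positions
    print(promoter_shape([5] * 16))       # reads spread over sixteen positions
```

A single dominant start site gives the maximum value of 2, and the index decreases as reads spread over more positions, which is why the SI is described as a continuum.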

[Genome browser figure; tracks: ATAC-Seq, Adult Females RNA-Seq, D. mel Proteins, GeMoMa, Combined RAMPAGE; label: Peaked]

Slide47

ATAC-Seq: Assay for Transposase-Accessible Chromatin with high-throughput SEQuencing

Buenrostro JD, et al. Curr Protoc Mol Biol. 2015 Jan 5;109:21.29.1-21.29.9.

Slide48

ATAC-Seq data for D. ananassae and 10 other Drosophila species

Jacobs J et al. Nat Genet. 2018 Jul;50(7):1011-1020.

[Genome browser figure, Rad23; tracks: RepeatMasker, RNA-Seq (Adult Females, Adult Males, Embryos), regtools, ATAC-Seq, D. mel Proteins, GeMoMa]

Slide49

Determine the gene structure in D. melanogaster

FlyBase: GBrowse

[Figure labels: UTR, CDS]

Gene Record Finder: Transcript Details

Slide50

Identify the initial transcribed exon using NCBI blastn

Retrieve the sequences of the initial exons from the Transcript Details tab of the Gene Record Finder

Use placement of the flanking exons to reduce the size of the search region if possible

Increase sensitivity of nucleotide searches

Change Program Selection to blastn

Change Word size to 7

Change Match/Mismatch Scores to +1, -1

Change Gap Costs to Existence: 2, Extension: 1

Slide51
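The slides describe these settings for the NCBI web interface. As a hedged sketch, the same parameters could be passed to a local NCBI BLAST+ `blastn` run using its standard options; the FASTA file names below are hypothetical, and the snippet only assembles the command without running it.

```python
def build_blastn_command(query_fasta, subject_fasta):
    """Assemble a blastn command line mirroring the sensitive web settings."""
    return [
        "blastn",
        "-task", "blastn",    # Program Selection: blastn rather than megablast
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "7",    # Word size: 7
        "-reward", "1",       # Match score: +1
        "-penalty", "-1",     # Mismatch score: -1
        "-gapopen", "2",      # Gap existence cost: 2
        "-gapextend", "1",    # Gap extension cost: 1
    ]


if __name__ == "__main__":
    # Hypothetical file names for the initial exon and the target contig.
    command = build_blastn_command("Rad23_exon1.fasta", "contig20.fasta")
    print(" ".join(command))
```

The resulting list can be handed to `subprocess.run` on a machine where BLAST+ is installed.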

Optimize alignment parameters based on expected levels of conservation

Derive alignment scores using information theory

Relative entropy of target and background frequencies

Match +2, Mismatch -3 optimized for 90% identity

Match +1, Mismatch -1 optimized for 75% identity
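The link between percent identity and match/mismatch scores can be worked through with log-odds arithmetic. This sketch assumes uniform base composition (each base at 0.25) and that all mismatches are equally likely; under those assumptions the score for an aligned pair is log(target frequency / background frequency), and only the ratio of mismatch to match score matters.

```python
import math


def log_odds_scores(identity):
    """Return (match, mismatch) log-odds scores in nats for a target identity.

    Target frequency of a specific identical pair is identity / 4, and of a
    specific mismatched pair is (1 - identity) / 12. The background frequency
    of any pair is 1 / 16.
    """
    match = math.log((identity / 4) / (1 / 16))
    mismatch = math.log(((1 - identity) / 12) / (1 / 16))
    return match, mismatch


def mismatch_to_match_ratio(identity):
    match, mismatch = log_odds_scores(identity)
    return mismatch / match


if __name__ == "__main__":
    for identity in (0.75, 0.90):
        print(f"{identity:.0%} identity: ratio {mismatch_to_match_ratio(identity):.2f}")
```

At 75% identity the ratio is -1, matching +1/-1. At 90% identity it is about -1.57, close to the -1.5 ratio of +2/-3.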

States DJ, et al. Methods. 1991 3:66-70.

Slide52

Perform blastn search of Rad23:1 against the D. ananassae contig

Limit the search region via Subject subrange

[Screenshot labels: D. melanogaster exon Rad23:1; D. ananassae contig20; subrange 88000 to 90000]

Slide53

Extrapolate TSS position based on blastn alignment of the initial transcribed exon

Assume the length of the initial transcribed exon is conserved between D. melanogaster and D. ananassae

blastn: D. mel: Rad23:1 (Query) vs. contig20 (Sbjct)

Query start: 128

Extrapolate TSS position: 89047-127 = 88,920
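The extrapolation above is a one-line calculation. The sketch below assumes that 89047 is the subject coordinate aligned to query position 128 and that the alignment is on the plus strand (the slide's subtraction implies this); the minus-strand branch is an added assumption for completeness.

```python
def extrapolate_tss(query_start, subject_start, strand="+"):
    """Estimate the TSS by extending the alignment back to query position 1.

    Assumes the unaligned 5' portion of the exon has the same length in
    both species.
    """
    missing_bases = query_start - 1
    if strand == "+":
        return subject_start - missing_bases
    if strand == "-":
        return subject_start + missing_bases
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    # Values from the Rad23:1 versus contig20 alignment on this slide.
    print(extrapolate_tss(query_start=128, subject_start=89047))
```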

[Figure label: Start codon]

Slide54

TSS annotation for Rad23 in D. ananassae

TSS position: 89,003

Position with highest combined RAMPAGE read density

Narrow TSS search region: 88,981-89,060

Peaked promoter with an SI value of -0.82

Wide TSS search region: 88,776-89,158

MACS2 peak from ATAC-Seq of eye-antennal discs

Encapsulated the TSS position inferred from blastn search against Rad23:1

Slide55

TSS annotation summary

Most of the D. melanogaster core promoters have multiple TSS

Classify the type of promoter (peaked/intermediate/broad) based on the transcriptome evidence from D. melanogaster

Use multiple lines of evidence to infer the TSS region in D. ananassae

RAMPAGE

ATAC-Seq

H3K4me2 ChIP-Seq

RNA-Seq coverage

blastn (change search parameters)

Slide56

Questions?

[Figure labels: PhyloP scores (Under negative selection, Fast-evolving); PhyloP; Conserved Elements; phastCons]

Slide57

Slide58

Structure of a typical mRNA

Pesole G, et al. Genome Biology. 2002;3(3):reviews0004.1-reviews0004.10.

Slide59

Phylogenetic tree based on the analysis of 13 Type IIB restriction endonucleases

Simulate restriction digests of 21 genomes

DNA fragments range from 21-33 bp in size

Calculate distance between two genomes based on number of shared fragments

D. melanogaster

D. simulans

D. sechellia

D. yakuba

D. erecta

D. ficusphila

D. eugracilis

D. biarmipes

D. takahashii

D. elegans

D. rhopaloa

D. kikkawai

D. ananassae

D. bipectinata

D. santomea

D. persimilis

D. willistoni

D. mojavensis

D. virilis

D. grimshawi

D. pseudoobscura

Seetharam AS and Stuart GW. PeerJ. 2013 Dec 23;1:e226.

Slide60

RAMPAGE protocol

Slide61

Signals in the FlyBase RAMPAGE and MachiBase TSS tracks are off by one base

https://wiki.flybase.org/wiki/FlyBase:GBrowse_Tracks#Aligned_Evidence

“FlyBase:GBrowse Tracks” page on the FlyBase Wiki

Slide62

Conservation tracks on the D. melanogaster GEP UCSC Genome Browser

Whole genome alignments of multiple Drosophila species

Drosophila Chain/Net composite track

Generate multiple sequence alignments from these pairwise alignments (ROAST)

Identify conserved regions from ROAST alignments

PhastCons: identify conserved elements

PhyloP: measure level of selection at each nucleotide

Multiz alignment of 27 insect species available on the official UCSC Genome Browser

Aug. 2014 (BDGP Release 6 + ISO1 MT/dm6) assembly

Slide63

Improve the multiple sequence alignments for Drosophila

Use the most recent version of the RefSeq assemblies

UCSC uses draft assemblies instead of the CAF1 assemblies

Species-specific repeat masking

Use more sensitive alignment tools

LAST, ROAST

Manual optimizations of alignment parameters

Slide64

Alignment of 27 insects on the official UCSC Genome Browser

[Genome browser figure, chr4; track groups: Drosophila species, Other insects]

Slide65

Alignment of 28 Drosophila species on the GEP UCSC Genome Browser

[Genome browser figure, chr4; track group: Drosophila species]

Slide66

Genome alignments of 27 Drosophila species against D. melanogaster

[Figure label: Drosophila Chain/Net]

Slide67

Use the conservation tracks to identify regions under selection

[Genome browser figure; tracks: PhyloP, Conserved Elements, phastCons; PhyloP scores legend: Under negative selection, Fast-evolving]

Slide68

Examine the ROAST alignments to identify the orthologous TSS regions

Slide69

Use core promoter motifs to support TSS annotations

Some sequence motifs are enriched in the region (~300 bp) surrounding the TSS

Some motifs (e.g., Inr, TATA) are well-characterized

Other motifs are identified based on computational analysis

Presence of core promoter motifs can be used to support the TSS annotations

Inr motif (TCAKTY) overlaps with the TSS (-2 to +4)

Absence of core promoter motifs is a negative result

Most D. melanogaster TSS do not contain the Inr motif

Slide70
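A search for the Inr consensus can be sketched with a regular expression. This assumes the standard IUPAC codes (K = G or T, Y = C or T) and that, with the motif spanning -2 to +4 and no position 0, the A of TCAKTY sits at +1. It is an illustration in plain Python, not the Short Match tool, and the test sequence is made up.

```python
import re

# IUPAC codes needed for the Inr consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "K": "[GT]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_inr(seq, motif="TCAKTY"):
    """Return sorted (motif_start, strand, putative_tss) tuples, 1-based.

    motif_start is the leftmost forward-strand coordinate of the match and
    putative_tss is the coordinate of the motif's A (assumed to be +1).
    """
    seq = seq.upper()
    n = len(seq)
    # A lookahead allows overlapping matches to be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = []
    for m in pattern.finditer(seq):
        hits.append((m.start() + 1, "+", m.start() + 3))
    for m in pattern.finditer(reverse_complement(seq)):
        s = m.start()
        # Convert reverse-complement indices back to forward coordinates.
        hits.append((n - s - len(motif) + 1, "-", n - s - 2))
    return sorted(hits)


if __name__ == "__main__":
    print(find_inr("GGTCAGTCGG"))  # made-up sequence with one plus-strand match
```

Because the motif is short and degenerate, matches are expected by chance, so a hit is supporting evidence for a TSS position rather than proof.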

Use UCSC Genome Browser Short Match to find Drosophila core promoter motifs

Ohler U, et al. Genome Biol. 2002; 3(12):RESEARCH0087.

TATA box

Initiator (Inr)

Available under “Projects” > “Annotation Resources” > “Core Promoter Motifs” on the GEP web site: http://gander.wustl.edu/~wilson/core_promoter_motifs.html

Slide71

Core Promoter Motifs tracks

Show core promoter motif matches for each contig

Separated by strand

Visualize matches to different core promoter motifs

Use UCSC Table Browser (or other means) to export the list of motif matches within the search region

Slide72

DEMO: Use the Inr motif to support the TSS position of Rad23

Slide73

RNA PolII ChIP-Seq tracks

Show regions that are enriched in RNA Polymerase II compared to input DNA

[Genome browser tracks: Gene Models, RNA PolII Peaks, RNA PolII Enrichment, RNA-Seq]

Slide74

Using RNA-Seq and RNA PolII ChIP-Seq data to define the TSS search region

[Genome browser figure; tracks: D. mel Transcripts, RNA PolII Peaks, RNA PolII Enrichment, RNA-Seq; annotation: Narrow TSS search region]

Slide75

Define the TSS search region based on blastn alignment to the initial transcribed exon

If the blastn alignment satisfies these criteria:

Has an E-value less than 1e-5

Covers more than half of the initial transcribed exon

Requires extrapolation of less than 150bp to estimate the TSS position

Is in congruence with other evidence tracks

Examples: RNA-Seq read coverage, RNA PolII ChIP-Seq data

Define TSS search region as ±300bp surrounding the TSS position estimated from the blastn alignment
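The criteria above can be expressed as a small check. This is a minimal sketch: the parameter names are illustrative, congruence with other evidence tracks is passed in as a manual yes/no judgment, and the example values in the demo are hypothetical.

```python
def tss_search_region_from_blastn(
    evalue,
    aligned_exon_bases,
    exon_length,
    extrapolation_bp,
    estimated_tss,
    congruent_with_other_evidence,
    flank_bp=300,
):
    """Return (start, end) of the TSS search region, or None if criteria fail."""
    criteria_met = (
        evalue < 1e-5
        and aligned_exon_bases > exon_length / 2
        and extrapolation_bp < 150
        and congruent_with_other_evidence
    )
    if not criteria_met:
        return None
    # Keep the start within the sequence (coordinates are 1-based).
    return max(1, estimated_tss - flank_bp), estimated_tss + flank_bp


if __name__ == "__main__":
    # Hypothetical alignment: 200 of 300 exon bases aligned, 127 bp extrapolated.
    print(tss_search_region_from_blastn(1e-10, 200, 300, 127, 88920, True))
```

Returning `None` when any criterion fails makes it explicit that the blastn alignment alone should not define the search region in that case.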

See Module TSS4 for details

Slide76

TSS annotation for Rad23

TSS position: 28,936

Conservation with D. melanogaster

blastn search of initial exon

“D. mel Transcripts” track

Location of the Inr motif

TSS search region: 28,716-28,936

Enrichment of RNA PolII upstream of the TSS position

RNA-Seq read coverage upstream of the TSS position

Search region defined by the extent of the RNA PolII peak

Slide77

Additional TSS annotation resources

The D. melanogaster gene annotations are the primary source of evidence

Resources that could be useful if the D. melanogaster evidence is ambiguous

Multiple sequence alignments of 28 Drosophila species

PhastCons and PhyloP conservation scores

Genome browsers for 27 Drosophila species

RNA Pol II ChIP-Seq data

CAGE data for D. pseudoobscura

RAMPAGE data for D. erecta, D. ananassae, and D. pseudoobscura

ATAC-Seq data for 10 Drosophila species

RNA-Seq coverage, splice junctions, assembled transcripts

Gnomon, Augustus, and N-SCAN PASA-EST gene predictions

Slide78

TSS annotation summary

Most of the D. melanogaster core promoters have multiple TSS

Classify the type of promoter (peaked/intermediate/broad) based on the transcriptome evidence from D. melanogaster

Define search regions that might contain TSS

Use multiple lines of evidence to infer the TSS region

Identify initial exon

RNA-Seq coverage

blastn (change search parameters)

Distribution of core promoter motifs (e.g., Inr)

RNA PolII ChIP-Seq peaks

Maintain conservation compared to D. melanogaster