Bob Edgar and Arend Sidow - PowerPoint Presentation

Uploaded On 2016-03-30


Presentation Transcript

Bob Edgar and Arend Sidow
Stanford University

Evolver

Motivation
- Genomics algorithms
  - Whole-genome alignment
  - Ancestral reconstruction
- Accuracy unknown
- Simulation required
- No realistic whole-genome simulator

Evolver
- Sequence evolution simulator
- Whole mammalian genome
- Mutations
  - All length scales
  - Single-base substitutions ... to chromosome fission and fusion
- Constraint
  - Gene model and non-coding elements

Mutations
- Substitute
- Delete
- Copy
  - Tandem or non-tandem
- Expand/contract simple repeat array
- Move
- Invert
- Insert
  - Random sequence
  - Mobile element
  - Retroposed pseudo-gene

Length-dependent rates
[Figure: bar chart of rate (y-axis) against event length (x-axis, lengths 1 to 8).]
- Missing values computed by linear interpolation
- Any number of (Length, Rate) pairs given as input
- Total rate = sum of bar heights
- Zero rate if L > maximum given length
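A minimal Python sketch of the length-dependent rate model, assuming integer event lengths and that lengths below the smallest given length also get rate zero (the slide only states the upper cutoff). The function names are illustrative, not Evolver's.

```python
import random


def rate_table(pairs):
    """Expand sparse (length, rate) pairs into a per-length rate table.

    Lengths between two given points are filled in by linear interpolation.
    Lengths outside the given range are left out, i.e. have rate zero.
    """
    pts = sorted(pairs)
    table = {}
    for (l0, r0), (l1, r1) in zip(pts, pts[1:]):
        for length in range(l0, l1):
            frac = (length - l0) / (l1 - l0)
            table[length] = r0 + frac * (r1 - r0)
    last_len, last_rate = pts[-1]
    table[last_len] = last_rate
    return table


def rate_for_length(table, length):
    # Zero for any length not covered by the input pairs (e.g. L > max given).
    return table.get(length, 0.0)


def total_rate(table):
    # Total event rate = sum of bar heights.
    return sum(table.values())


def sample_length(table, rng):
    # Given that an event happens, pick its length in proportion to its bar height.
    lengths = list(table)
    return rng.choices(lengths, weights=[table[n] for n in lengths])[0]


table = rate_table([(1, 4.0), (3, 2.0), (5, 1.0)])
print(table)                      # {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.5, 5: 1.0}
print(total_rate(table))          # 11.5
print(rate_for_length(table, 6))  # 0.0
print(sample_length(table, random.Random(1)))
```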

Annotation
[Figure: annotated gene models (UTRs, CDS exons, START and STOP codons, donor and acceptor splice sites) with nearby conserved elements and neutral sequence. Legend: NGE = Non-Gene Conserved Element; NXE = Non-Exon Conserved Element; Neutral; Simple repeat; CpG island.]

Constraint
- Every base has an "accept probability" PAccept: the probability that a mutation is accepted
- Same for all mutations (substitution, deletion, insertion, ...)
- Special cases for coding sequence
  - 20x20 amino acid substitution probability table
  - $\text{Accept prob} = P_{\mathrm{Accept}}(\text{1st base in codon}) \times P_{\text{a.a. accept}}$
  - Frame preserved
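The coding-sequence rule can be sketched as follows for a single-exon CDS on the forward strand. `example_aa_accept` is a hypothetical stand-in for the 20x20 table, and rejecting substitutions that create or destroy a stop codon is an assumption of this sketch, not something the slide states.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    i = BASES.index(codon[0]) * 16 + BASES.index(codon[1]) * 4 + BASES.index(codon[2])
    return AMINO[i]


def example_aa_accept(old_aa, new_aa):
    # Hypothetical stand-in for the 20x20 amino acid table.
    return 1.0 if old_aa == new_aa else 0.3


def coding_accept_prob(seq, cds_start, pos, new_base, paccept, aa_accept):
    """Accept probability for a substitution at `pos` inside a single-exon CDS."""
    offset = (pos - cds_start) % 3          # position within the codon (frame preserved)
    codon_start = pos - offset
    old_codon = seq[codon_start:codon_start + 3]
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = translate(old_codon), translate(new_codon)
    if "*" in (old_aa, new_aa):
        return 0.0                          # assumption: stop-codon changes are rejected
    # PAccept of the codon's first base times the amino acid accept probability.
    return paccept[codon_start] * aa_accept(old_aa, new_aa)


seq = "ATGGCTAAA"
paccept = [0.5] * len(seq)
print(coding_accept_prob(seq, 0, 5, "C", paccept, example_aa_accept))  # GCT->GCC (Ala->Ala): 0.5
print(coding_accept_prob(seq, 0, 3, "A", paccept, example_aa_accept))  # GCT->ACT (Ala->Thr): 0.15
```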

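Rejection
- Events proposed at fixed (neutral) rates
- Locus selected at random from a uniform distribution
- Accept probability computed from the PAccepts of the affected bases
  - Multiple bases: product
  - Equivalent to accept (mutate base 1 AND mutate base 2 AND ...)
[Figure: four bases (A, G, C, T) with PAccept values 0.3, 0.8, 0.5 and 0.4; an event spanning the bases with PAccept 0.8 and 0.5 is accepted with $P_{\mathrm{Accept}} = 0.8 \times 0.5 = 0.4$.]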
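A sketch of the rejection step, assuming PAccept values are stored per base in a list and an event of a given length touches a contiguous run of bases. The proposal itself stays neutral; constraint enters only through the accept test. Function names are illustrative.

```python
import math
import random


def accept_probability(paccept, start, length):
    # Multiple bases: accept only if every affected base accepts, so probabilities multiply.
    return math.prod(paccept[start:start + length])


def propose_event(paccept, length, rng):
    """Propose one neutral event of the given length and apply rejection."""
    start = rng.randrange(len(paccept) - length + 1)   # uniform locus
    accepted = rng.random() < accept_probability(paccept, start, length)
    return start, accepted


def count_accepted(paccept, length, n_proposals, seed=0):
    rng = random.Random(seed)
    return sum(propose_event(paccept, length, rng)[1] for _ in range(n_proposals))


paccept = [0.3, 0.8, 0.5, 0.4]
print(accept_probability(paccept, 1, 2))   # 0.4 (0.8 x 0.5)
print(count_accepted([1.0] * 10, 3, 100))  # 100: unconstrained sequence accepts every proposal
print(count_accepted([0.0] * 10, 3, 100))  # 0: fully constrained sequence rejects every proposal
```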

Gene model
- Coding sequence (CDS)
  - Amino acid substitution, frame preserved
- UTRs
- Splice sites
  - 2 donor, 2 acceptor sites with PAccept = 0
- Non-exon elements (NXEs)
  - PAccept < 1, no other special properties

Non-gene conserved elements
- Non-gene elements (NGEs)
  - PAccept < 1, no other special properties

Mobile elements
- Initial library of sequences
- Updated regularly: MEs evolve
  - Faster rate than host
  - Using intra-chromosome Evolver
- Birth/death process
- Terminal repeats special-cased
- Per-ME parameters for insert rate, etc.
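The slide gives no details of the birth/death process, so the following is only a generic discrete-time version: `birth_prob` and `death_prob` are hypothetical global parameters (the slide mentions per-ME parameters), and a newly born active element simply gets a fresh id.

```python
import random


def birth_death_step(active, birth_prob, death_prob, rng, next_id):
    """One update of a simple discrete-time birth/death process.

    `active` is a list of element ids. Each element independently dies
    (becomes inactive) with death_prob; each survivor then spawns one new
    active element with birth_prob. Returns the new list and the next unused id.
    """
    survivors = [e for e in active if rng.random() >= death_prob]
    births = []
    for _ in survivors:
        if rng.random() < birth_prob:
            births.append(next_id)
            next_id += 1
    return survivors + births, next_id


rng = random.Random(42)
active, next_id = [0, 1, 2], 3
for cycle in range(10):
    active, next_id = birth_death_step(active, 0.1, 0.05, rng, next_id)
print(len(active), next_id)
```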

Retroposed pseudo-genes
- Inserted like mobile elements
- Birth/death process for active RPGs
- Regular updates:
  - Genes selected at random from genome
  - Spliced sequence computed
  - Added to mobile element/RPG sequence library
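Computing the spliced sequence amounts to concatenating a gene's exons. A minimal sketch, assuming forward-strand genes stored as hypothetical dicts with half-open `exons` intervals (a minus-strand gene would also need reverse-complementing); intron bases are lowercase in the example only for readability.

```python
import random


def spliced_sequence(genome, exons):
    """Concatenate exon sequences ([start, end) intervals) in genomic order."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def update_library(genome, genes, library, n_new, rng):
    """Pick genes at random, splice them, and append the results to the library."""
    for gene in rng.sample(genes, n_new):
        library.append(spliced_sequence(genome, gene["exons"]))
    return library


genome = "ATGCCCgtaagtagGGGTAA"                  # exon [0,6), intron [6,14), exon [14,20)
genes = [{"exons": [(0, 6), (14, 20)]}]
print(update_library(genome, genes, [], 1, random.Random(0)))  # ['ATGCCCGGGTAA']
```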

Gene duplication
- Triggered by any inter- or intra-chromosome copy of a complete gene

                New Slower   New Same   New Faster   New Disabled
Old Slower           5           8           8            -
Old Same            20          20          20          200
Old Faster          50          15          50            -
Old Disabled         -          25          10            -
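Reading the table as relative weights for the joint fate of the old and new copies, with "-" meaning the outcome does not occur, a duplication outcome could be drawn as below. Both readings are assumptions; the slide does not give units.

```python
import random

FATES = ["Slower", "Same", "Faster", "Disabled"]

# Values from the slide; rows = old copy, columns = new copy; None marks "-".
TABLE = {
    "Slower":   [5, 8, 8, None],
    "Same":     [20, 20, 20, 200],
    "Faster":   [50, 15, 50, None],
    "Disabled": [None, 25, 10, None],
}


def outcome_weights(table):
    # Assumption: entries are relative weights and "-" means the outcome never occurs.
    return {
        (old, new): w
        for old, row in table.items()
        for new, w in zip(FATES, row)
        if w is not None
    }


def sample_outcome(weights, rng):
    outcomes = list(weights)
    return rng.choices(outcomes, weights=[weights[o] for o in outcomes])[0]


weights = outcome_weights(TABLE)
total = sum(weights.values())
print(total)                                  # 431
print(weights[("Same", "Disabled")] / total)  # old copy unchanged, new copy disabled
print(sample_outcome(weights, random.Random(7)))
```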

Constraint change events
- Change annotation, not sequence
- CEs created, deleted and moved
- CpG islands created, deleted and moved
- CE speed change (PAccepts changed)
- Gene duplication
  - Side-effect of copy
- Gene loss
  - Special case handled between cycles

Gene structure changes
[Figure: gene model with UTR and CDS exons, annotated with the operations that move its boundaries.]
- Move translation terminal
  - Move START: MoveStartCodonIntoCDS, MoveStartCodonIntoUTR
  - Move STOP: MoveStopCodonIntoCDS, MoveStopCodonIntoUTR
- Move transcription terminal: MoveUTRTerm
- Move splice site (acceptor or donor)
  - Move UTR splice: MoveAcceptorIntoUTR, MoveUTRAcceptorIntoIntron, MoveDonorIntoUTR, MoveUTRDonorIntoIntron
  - Move CDS splice: MoveAcceptorIntoCDS, MoveCDSAcceptorIntoIntron, MoveDonorIntoCDS, MoveCDSDonorIntoIntron

Alignments
- Homology to all ancestors is tracked
- Relationships not tracked:
  - Ancestral paralogy, e.g. segmental duplications already present in the ancestor
  - Mobile elements
  - Retroposed pseudo-genes
- Output: ancestor-leaf and leaf-leaf alignments

Alignments
- Align residues if:
  - Homologous, and
  - No intervening duplication before the MRCA (most recent common ancestor)
- Avoids the problem of ancestral paralogy
- Probably the most biologically informative
- Does align segmental duplications
- Does align tandem duplications
  - Silly for very short tandems; need to filter
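One way to implement this rule, sketched under the assumption that each branch records, for every residue, its position in the parent genome (None for residues inserted on that branch, such as mobile elements). Two leaf residues are aligned when they trace back to the same MRCA residue: a duplication after the MRCA maps both copies to one ancestral residue, so both are aligned, while a duplication before the MRCA yields two distinct ancestral residues and no alignment. The data layout is hypothetical.

```python
def trace_to_mrca(pos, parent_maps):
    """Follow a residue back through successive parent genomes.

    parent_maps[k][i] is the position, in the next genome toward the MRCA,
    of residue i, or None if that residue was inserted on this branch.
    """
    for parent_of in parent_maps:
        if pos is None:
            return None
        pos = parent_of[pos]
    return pos


def aligned_pairs(len_x, maps_x, len_y, maps_y):
    """Align residue i of X with residue j of Y iff both descend from the same MRCA residue."""
    by_ancestor = {}
    for i in range(len_x):
        a = trace_to_mrca(i, maps_x)
        if a is not None:
            by_ancestor.setdefault(a, ([], []))[0].append(i)
    for j in range(len_y):
        a = trace_to_mrca(j, maps_y)
        if a is not None:
            by_ancestor.setdefault(a, ([], []))[1].append(j)
    return sorted((i, j) for xs, ys in by_ancestor.values() for i in xs for j in ys)


# MRCA "ACGT"; X = "ACGCGT" (tandem copy of "CG"); Y = "CGTA" (first base lost, "A" inserted).
print(aligned_pairs(6, [[0, 1, 2, 1, 2, 3]], 4, [[1, 2, 3, None]]))
# [(1, 0), (2, 1), (3, 0), (4, 1), (5, 2)]: both tandem copies align
```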

Ancestral genome
- Model organism: Human (hg18)

Ancestral annotations
- UCSC browser tracks
  - CDS, UTR, CpG islands
- Splice sites inserted at terminals of all introns
- Simple repeats
  - Tandem Repeat Finder
- Non-exon and non-gene elements
  - Generated according to a stochastic model

Generating NGEs and NXEs
- Length histogram as for event rates
- Cover 7% of genome with random CEs
[Figure: histogram of CE frequency (y-axis) against length (x-axis, lengths 1 to 8).]
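A sketch of CE placement, assuming CEs may not overlap and that start positions are uniform along a single sequence; `length_weights` stands in for the length histogram and the values used here are made up.

```python
import random


def generate_ces(genome_len, length_weights, target_fraction, rng, max_attempts=1_000_000):
    """Place random, non-overlapping CEs until they cover target_fraction of the genome.

    length_weights maps CE length -> relative frequency (the length histogram).
    Returns a list of (start, length) pairs.
    """
    lengths = list(length_weights)
    weights = [length_weights[n] for n in lengths]
    covered = [False] * genome_len
    target = int(target_fraction * genome_len)
    n_covered, ces = 0, []
    for _ in range(max_attempts):
        if n_covered >= target:
            break
        length = rng.choices(lengths, weights=weights)[0]
        start = rng.randrange(genome_len - length + 1)   # uniform start position
        if any(covered[start:start + length]):
            continue                                     # assumption: reject overlapping placements
        covered[start:start + length] = [True] * length
        n_covered += length
        ces.append((start, length))
    return ces


ces = generate_ces(10_000, {5: 1.0, 10: 2.0, 20: 1.0}, 0.07, random.Random(3))
print(len(ces), sum(n for _, n in ces))   # total covered length is at least 700
```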

Generating NGEs and NXEs
- Assign ~50% to genes
[Figure: genes drawn as CDS and UTR exons, with a conserved element close to a gene labeled NXE and one farther away labeled NGE.]
- d = approx. 1/4 of the inter-gene distance (selected from a normal distribution)
- NXE if distance < d
- NGE if distance > d
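The classification rule can be sketched with d drawn from a normal distribution centred on one quarter of the inter-gene distance; the spread `sd_fraction` is a hypothetical parameter. The sketch also shows why a quarter fits "assign ~50% to genes": a CE placed uniformly in a gap lies within d of one of the two flanking genes over a total length of 2d, which is half the gap when d is a quarter of it.

```python
import random


def classify_ce(distance_to_gene, intergene_gap, rng, sd_fraction=0.1):
    """Label a CE as NXE (near a gene) or NGE (far from genes).

    d is drawn from a normal distribution centred on 1/4 of the inter-gene
    distance; sd_fraction (its relative spread) is a hypothetical parameter.
    """
    mean_d = intergene_gap / 4
    d = rng.gauss(mean_d, sd_fraction * mean_d)
    return "NXE" if distance_to_gene < d else "NGE"


def nxe_fraction(intergene_gap, n, seed=0):
    # Place CEs uniformly in the gap between two genes; distance is to the nearer gene.
    rng = random.Random(seed)
    labels = []
    for _ in range(n):
        x = rng.uniform(0, intergene_gap)
        labels.append(classify_ce(min(x, intergene_gap - x), intergene_gap, rng))
    return labels.count("NXE") / n


print(classify_ce(100, 1000, random.Random(0), sd_fraction=0.0))  # NXE (100 < 250)
print(classify_ce(400, 1000, random.Random(0), sd_fraction=0.0))  # NGE (400 > 250)
print(nxe_fraction(1000, 10_000))                                  # roughly 0.5
```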

Validation
- Simulate “human-mouse” and “human-dog”
[Figure: tree with the ancestor (hg18) at the root and branches labeled 0.17, 0.24 and 0.40 leading to “human”, “dog” and “mouse”.]

Simulated human-mouse

Real human-mouse

Simulated human-dog

Real human-dog

[Figure: hg18 and the simulated “Human”, “Dog” and “Mouse” genomes.]

Evolver modules
- Intra-chromosome
  - Substitute
  - Move
  - Copy
  - Invert
  - Delete
  - Insert
- Inter-chromosome
  - Move
  - Copy
  - Split
  - Fuse

Cycles
[Figure: one cycle consists of an intra-chromosome step on each chromosome in turn (Chr 1, Chr 2, ..., Chr N) followed by one inter-chromosome step; cycles repeat.]
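The cycle structure written as a loop. `intra_step` and `inter_step` are placeholders for the intra- and inter-chromosome modules, not Evolver's actual interfaces; the logging functions exist only to show the order of calls.

```python
def run_cycles(chromosomes, n_cycles, intra_step, inter_step):
    """Each cycle: intra-chromosome step on every chromosome, then one inter-chromosome step."""
    for _ in range(n_cycles):
        chromosomes = [intra_step(chrom) for chrom in chromosomes]
        chromosomes = inter_step(chromosomes)
    return chromosomes


log = []


def intra(chrom):
    log.append(f"intra {chrom}")
    return chrom


def inter(chroms):
    log.append("inter")
    return chroms


run_cycles(["chr1", "chr2", "chr3"], 2, intra, inter)
print(log)
```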

Time
- One cycle of 0.01 subs/site = 1 CPU day
- ENCODE tree (30 mammals) = 500 CPU days

Memory and file sizes
- RAM: 40 bytes/base
  - 100 Mb chromosome: RAM = 4 GB
  - Human chr. 1 (240 Mb): RAM = 12 GB
- Alignment files
  - Custom, highly compressed binary format
  - Standard formats too big (many short hits)
  - Grow with distance
  - “Human-mouse/dog” distance ~0.5 subs/site: alignment files ~5 GB
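The time and memory figures reduce to simple arithmetic. This sketch assumes the 1 CPU day per 0.01 subs/site cycle applies to a whole genome and that CPU time scales with total branch length; the usage line further assumes the three validation branches are simulated separately. The 40 bytes/base figure reproduces the 4 GB estimate for a 100 Mb chromosome but gives 9.6 GB for a 240 Mb one, so it is best read as approximate.

```python
BYTES_PER_BASE = 40        # RAM per base (slide figure)
SUBS_PER_CYCLE = 0.01      # substitutions/site simulated per cycle (slide figure)
CPU_DAYS_PER_CYCLE = 1.0   # CPU time per cycle (slide figure)


def ram_gb(chrom_len_bases):
    # Linear estimate: bytes/base x chromosome length, in gigabytes (1e9 bytes).
    return chrom_len_bases * BYTES_PER_BASE / 1e9


def cpu_days(total_branch_length):
    # Number of cycles needed to cover the total branch length (subs/site).
    cycles = round(total_branch_length / SUBS_PER_CYCLE)
    return cycles * CPU_DAYS_PER_CYCLE


print(ram_gb(100_000_000))            # 4.0, matching the 100 Mb figure
print(ram_gb(240_000_000))            # 9.6, below the 12 GB quoted for chr. 1
print(cpu_days(0.17 + 0.24 + 0.40))   # 81.0 for the three validation branches
```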

Collaborators
- George Asimenos
- Serafim Batzoglou

Thank you.
