Slide1
Bob Edgar and Arend Sidow
Stanford University
Evolver
Slide2
Motivation
- Genomics algorithms
  - Whole-genome alignment
  - Ancestral reconstruction
- Accuracy unknown
- Simulation required
- No realistic whole-genome simulator
Slide3
Evolver
- Sequence evolution simulator
- Whole mammalian genome
- Mutations
  - All length scales
  - Single base substitutions...
  - ...to chromosome fission and fusion
- Constraint
  - Gene model and non-coding elements
Slide4
Mutations
- Substitute
- Delete
- Copy
  - Tandem or non-tandem
  - Expand/contract simple repeat array
- Move
- Invert
- Insert
  - Random sequence
  - Mobile element
  - Retroposed pseudo-gene
Slide5
Length-dependent rates
[Figure: bar chart of rate (y-axis) against length (x-axis, 1 to 8)]
- Missing values computed by linear interpolation
- Any number of (Length, Rate) pairs given as input
- Total rate = sum of bar heights
- Zero rate if L > max given
Slide6
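The "Length-dependent rates" slide describes a rate table built from a few (Length, Rate) pairs. The following is a minimal sketch of that idea, not Evolver's actual code: the function names `rate_at` and `total_rate` are hypothetical, lengths are assumed to be integers, and a zero rate below the smallest given length is an assumption (the slide only states zero rate above the maximum).

```python
from bisect import bisect_left

def rate_at(length, pairs):
    """Rate for one length, linearly interpolated between the given pairs."""
    pairs = sorted(pairs)
    lengths = [l for l, _ in pairs]
    # Zero rate above the largest given length (from the slide);
    # zero below the smallest given length is an assumption of this sketch.
    if length > lengths[-1] or length < lengths[0]:
        return 0.0
    i = bisect_left(lengths, length)
    if lengths[i] == length:
        return float(pairs[i][1])
    (l0, r0), (l1, r1) = pairs[i - 1], pairs[i]
    return r0 + (r1 - r0) * (length - l0) / (l1 - l0)

def total_rate(pairs):
    """Sum of the bar heights over every integer length in the given range."""
    pairs = sorted(pairs)
    first, last = pairs[0][0], pairs[-1][0]
    return sum(rate_at(l, pairs) for l in range(first, last + 1))

# Illustrative input only; these numbers are not from the presentation.
example_pairs = [(1, 4.0), (3, 2.0), (8, 0.5)]
```

With `example_pairs`, `rate_at(2, example_pairs)` falls halfway between the rates given for lengths 1 and 3, and any length above 8 gets rate zero.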
Annotation
[Figure: annotated genome segment containing two genes. Labels shown: UTR, CDS, START, STOP, acceptor, donor, Gene, NXE (Non-Exon Conserved Element), NGE (Non-Gene Conserved Element), Neutral, Simple repeat, CpG island]
Slide7
Constraint
- Every base has an “accept probability” PAccept
  - Probability that a mutation is accepted
  - Same for all mutations (subst., delete, insert...)
- Special cases for coding sequence
  - 20x20 amino acid substitution probability table
  - Accept prob = PAccept (1st base in codon) x P(a.a. accept)
  - Frame preserved
Slide8
Rejection
- Events proposed at fixed rates (neutral)
- Locus selected at random, uniform distribution
- Accept probability computed from PAccepts
  - Multiple bases = product
  - Equivalent to accept (mutate 1 AND mutate 2 ...)
[Figure: four bases (A, G, C, T) with per-base PAccept values 0.3, 0.8, 0.5, 0.4; for an event spanning two bases, PAccept = 0.8 x 0.5 = 0.4]
Slide9
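The "Rejection" slide proposes an event at a uniformly chosen locus and accepts it with the product of the per-base PAccept values it touches. A minimal sketch of that step follows; the names `accept_probability` and `propose_event` are hypothetical, the per-base values are held in a plain list, and the coding-sequence special cases from the "Constraint" slide are not modeled.

```python
import random

def accept_probability(p_accept, start, length):
    """Product of per-base accept probabilities over the affected span."""
    p = 1.0
    for value in p_accept[start:start + length]:
        p *= value
    return p

def propose_event(p_accept, length, rng):
    """Pick a locus uniformly at random, then accept or reject the event."""
    start = rng.randrange(len(p_accept) - length + 1)
    p = accept_probability(p_accept, start, length)
    accepted = rng.random() < p
    return start, accepted

# Per-base values from the slide's figure.
slide_values = [0.3, 0.8, 0.5, 0.4]
```

For the slide's example, `accept_probability(slide_values, 1, 2)` multiplies 0.8 by 0.5.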
Gene model
- Coding sequence (CDS)
  - Amino acid substitution, frame preserved
- UTRs
- Splice sites
  - 2 donor, 2 acceptor sites with PAccept=0
- Non-exon elements (NXEs)
  - PAccept<1, no other special properties
Slide10
Non-gene conserved elements
- Non-gene elements (NGEs)
  - PAccept<1, no other special properties
Slide11
Mobile elements
- Initial library of sequences
- Updated regularly: MEs evolve
  - Faster rate than host
  - Using intra-chromosome Evolver
- Birth/death process
- Terminal repeats special-cased
- Per-ME parameters for insert rate etc.
Slide12
Retro-posed pseudo-genes
- Inserted like mobile elements
- Birth/death process for active RPGs
- Regular updates:
  - Genes selected at random from genome
  - Spliced sequence computed
  - Added to mobile element/RPG sequence library
Slide13
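The regular update on the "Retro-posed pseudo-genes" slide (select a gene at random, compute its spliced sequence, add it to the library) can be sketched as below. This is an illustration under stated assumptions, not Evolver's implementation: exons are assumed to be 0-based half-open `(start, end)` intervals on the forward strand, and `spliced_sequence` and `add_random_rpg` are hypothetical names.

```python
import random

def spliced_sequence(chromosome, exons):
    """Concatenate exon intervals in coordinate order, dropping introns."""
    return "".join(chromosome[start:end] for start, end in sorted(exons))

def add_random_rpg(chromosome, genes, library, rng):
    """Pick one gene at random and append its spliced sequence to the library.

    genes maps a gene name to its list of exon intervals (assumed layout).
    """
    gene_name = rng.choice(sorted(genes))
    library.append(spliced_sequence(chromosome, genes[gene_name]))
    return gene_name

# Toy data for illustration only.
toy_chromosome = "AAACCCGGGTTT"
toy_genes = {"g1": [(0, 3), (6, 9)]}
```

Reverse-strand genes would additionally need reverse-complementing, which this sketch leaves out.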
Gene duplication
- Triggered by any inter- or intra-chromosome copy of complete gene

               New Slower   New Same   New Faster   New Disabled
Old Slower          5           8          8             -
Old Same           20          20         20           200
Old Faster         50          15         50             -
Old Disabled        -          25         10             -
Slide14
Constraint change events
- Change annotation, not sequence
- CEs created, deleted and moved
- CpG islands created, deleted and moved
- CE speed change (PAccepts changed)
- Gene duplication
  - Side-effect of copy
- Gene loss
  - Special case handled between cycles
Slide15
Gene structure changes
[Figure: gene diagram (UTR, UTR, CDS, CDS, CDS, UTR) annotated with the change events and categories listed below]
Events:
- MoveStartCodonIntoCDS
- MoveStartCodonIntoUTR
- MoveDonorIntoCDS
- MoveCDSDonorIntoIntron
- MoveCDSAcceptorIntoIntron
- MoveAcceptorIntoCDS
- MoveUTRTerm (appears twice in the figure)
- MoveStopCodonIntoUTR
- MoveStopCodonIntoCDS
- MoveUTRAcceptorIntoIntron
- MoveAcceptorIntoUTR
- MoveDonorIntoUTR
- MoveUTRDonorIntoIntron
Categories:
- Move translation terminal
- Move transcription terminal
- Move splice site
- Move START
- Move STOP
- Move UTR splice
- Move CDS splice
- Move Acceptor
- Move Donor
Slide16
Alignments
- Homology to all ancestors is tracked
- Relationships not tracked:
  - Ancestral paralogy
    - E.g. segmental duplications already present
  - Mobile elements
  - Retroposed pseudo-genes
- Output: ancestor-leaf and leaf-leaf
Slide17
Alignments
- Align residues if:
  - Homologous
  - and no intervening duplication before MRCA
- Avoids problem of ancestral paralogy
- Probably the most biologically informative
- Does align segmental duplications
- Does align tandem duplications
  - Silly for very short tandems, need to filter
Slide18
Ancestral genome
- Model organism
- Human (hg18)
Slide19
Ancestral annotations
- UCSC browser tracks
  - CDS, UTR, CpG islands
  - Splice sites inserted at terminals of all introns
- Simple repeats
  - Tandem Repeats Finder
- Non-exon and non-gene elements
  - Generated according to stochastic model
Slide20
Generating NGEs and NXEs
- Length histogram as for event rates
- Cover 7% of genome with random CEs
[Figure: histogram of frequency (y-axis) against length (x-axis, 1 to 8)]
Slide21
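To illustrate the "Cover 7% of genome with random CEs" step, the sketch below draws conserved-element lengths from a length histogram until the target coverage is reached. It is a simplified illustration: the function name `sample_ce_lengths` and the dictionary layout of the histogram are assumptions, and placing the elements on the genome (including avoiding overlaps) is not modeled.

```python
import random

def sample_ce_lengths(genome_length, length_freq, rng, coverage=0.07):
    """Draw CE lengths from a histogram until they cover the target fraction.

    length_freq maps a length to its relative frequency (assumed layout).
    """
    lengths = sorted(length_freq)
    weights = [length_freq[l] for l in lengths]
    target = coverage * genome_length
    chosen, total = [], 0
    while total < target:
        length = rng.choices(lengths, weights=weights, k=1)[0]
        chosen.append(length)
        total += length
    return chosen

# Illustrative histogram only; these values are not from the presentation.
toy_histogram = {1: 5, 2: 3, 4: 1}
```

The loop stops at the first draw that reaches the target, so the summed length overshoots 7% by at most one element.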
Generating NGEs and NXEs
- Assign ~50% to genes
[Figure: genes drawn as CDS and UTR blocks, with nearby conserved elements labeled NXE or NGE]
- d = approx. 1/4 of inter-gene distance (selected from normal distribution)
- NXE if distance < d
- NGE if distance > d
Slide22
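The distance rule from the second "Generating NGEs and NXEs" slide can be sketched as follows. The slide gives only the mean of d (about one quarter of the inter-gene distance) and says it is drawn from a normal distribution, so the spread parameter `rel_sd` is a hypothetical choice, as is the name `classify_element`. Treating the tie `distance == d` as NGE is also an assumption, and the "assign ~50% to genes" step is not modeled.

```python
import random

def classify_element(distance_to_gene, inter_gene_distance, rng, rel_sd=0.1):
    """Label a conserved element NXE or NGE by its distance from a gene."""
    mean = inter_gene_distance / 4
    # Threshold d drawn from a normal distribution; rel_sd is an assumed
    # spread, and the draw is clamped so d is never negative.
    d = max(0.0, rng.gauss(mean, rel_sd * mean))
    return "NXE" if distance_to_gene < d else "NGE"
```

Setting `rel_sd=0` makes the threshold exactly one quarter of the inter-gene distance, which is convenient for checking the rule by hand.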
Validation
- Simulate “human-mouse” and “human-dog”
[Figure: tree rooted at Ancestor (hg18) with leaves “human”, “dog” and “mouse”; branch lengths shown: 0.17, 0.24, 0.40]
Slide23
Simulated human-mouse
Slide24
Real human-mouse
Slide25
Simulated human-dog
Slide26
Real human-dog
Slide27
[Figure: hg18 with “Human”, “Dog” and “Mouse”]
Slide28
Evolver modules
- Intra-chromosome
  - Substitute
  - Move
  - Copy
  - Invert
  - Delete
  - Insert
- Inter-chromosome
  - Move
  - Copy
  - Split
  - Fuse
Slide29
Cycles
[Figure: repeating sequence of steps: Intra (Chr 1), Intra (Chr 2), ... Intra (Chr N), Inter. One such group is labeled "One cycle"]
Slide30
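The "Cycles" figure shows an intra-chromosome step for each chromosome followed by one inter-chromosome step. A minimal sketch of that control flow is given below; `run_cycle`, `simulate`, and the two callback parameters are hypothetical names, and the callbacks stand in for the intra- and inter-chromosome modules, whose real interfaces are not described in the slides.

```python
def run_cycle(chromosomes, intra_step, inter_step):
    """One cycle: evolve each chromosome on its own, then apply inter-chromosome events."""
    evolved = [intra_step(chromosome) for chromosome in chromosomes]
    return inter_step(evolved)

def simulate(chromosomes, n_cycles, intra_step, inter_step):
    """Repeat the cycle a fixed number of times and return the final chromosomes."""
    for _ in range(n_cycles):
        chromosomes = run_cycle(chromosomes, intra_step, inter_step)
    return chromosomes
```

Because the intra-chromosome steps in this sketch do not depend on one another, they could in principle run in parallel within a cycle; whether Evolver does so is not stated in the slides.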
Time
- 0.01 subs/site cycle = 1 CPU day
- ENCODE tree (30 mammals) = 500 CPU days
Slide31
Memory and file sizes
- RAM: 40 bytes/base
  - 100 Mb chromosome RAM = 4 GB
  - Human chr.1 (240 Mb) RAM = 12 GB
- Alignment files
  - Custom highly compressed binary format
  - Standard formats too big (many short hits)
  - Grow with distance
  - “Human-mouse/dog” distance ~0.5 subs/site
  - Alignment files ~5 GB
Slide32
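The figures on the "Time" and "Memory and file sizes" slides lend themselves to a back-of-the-envelope calculation. The sketch below only multiplies out the quoted constants; the function names are hypothetical, and it assumes cost scales linearly with branch length and with chromosome size.

```python
SUBS_PER_SITE_PER_CYCLE = 0.01  # from the "Time" slide
CPU_DAYS_PER_CYCLE = 1.0        # from the "Time" slide
BYTES_PER_BASE = 40             # from the "Memory and file sizes" slide

def cpu_days(total_branch_length):
    """Estimated CPU days for a total branch length in subs/site."""
    cycles = total_branch_length / SUBS_PER_SITE_PER_CYCLE
    return cycles * CPU_DAYS_PER_CYCLE

def ram_bytes(n_bases):
    """Estimated RAM in bytes for a chromosome of n_bases bases."""
    return n_bases * BYTES_PER_BASE
```

Under these assumptions, `ram_bytes(100_000_000)` reproduces the 4 billion bytes quoted for a 100 Mb chromosome, and 500 CPU days corresponds to a total branch length of about 5 subs/site. The straight product for 240 Mb is 9.6 billion bytes, lower than the figure quoted for human chr.1; the sketch does not account for that difference.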
Collaborators
- George Asimenos
- Serafim Batzoglou
Slide33
Thank you.
[Image caption: Rose-picking in the Rose Valley near the town of Kazanlak in Bulgaria, 1870s, engraving by the Austro-Hungarian traveler Felix Philipp Kanitz. Published in his book "Donau-Bulgarien und der Balkan", Leipzig, 1879, p. 238.]