Slide1
http://cs273a.stanford.edu [Bejerano Winter 2018/19]
TTh 1:30-2:50pm, mostly always M106*
Prof: Gill Bejerano
CAs: Boyoung (Bo) Yoo & Yatish Turakhia
* Track class on Piazza
CS273A
Gill Lecture 15: Comparative Genomics II
The Human Genome Source Code
Slide2
Announcements
Slide3
Genome Evolution
[Slide background: a long stretch of raw genomic DNA sequence]
Slide4
Life’s Amazing Diversity – On Your Laptop Now
Mammals (250+)
Birds (130+)
Reptiles (35+)
Amphibians (5+)
Fish (200+)
Genome papers barely scratch the surface of the mysteries genomes hold.
Slide5
Our closest living relative species
Slide6
What human-chimp changes do we find?
Small
Medium
Large
Slide7
What functional instructions can change?
Human/chimp genome: ~3 x 10^9 bp
Rough composition:
Genes 2%
Non-coding RNAs 2%
Regulatory DNA 10-15%
Repeats 40%
Other 40%
Slide8
Humans and Chimpanzees Possess Many Vastly Different Phenotypes
A: Chimp B: Human
[Varki, A. and Altheide, T., Genome Res., 2005]
Slide9
Genetic basis of human phenotypes?
Phenotype
Genotype
Number of rearrangements
Most mutations are near/neutral.
Slide10
The Genotype - Phenotype divide
Can we find evolutionary patterns that are distinct enough to be phenotypically revealing?
Species A
Species B
Problem #1: Too many nucleotide changes between any pair of related species (or individuals). The vast majority of these are near/neutral.
Slide11
Genotype -> Phenotype screens
Define a “dramatic” (non-neutral) genomic scenario:
hCONDEL [McLean, Pollen, Reno et al., 2011]: sequence that is conserved in chimp but deleted in human.
Problem #2: What is the phenotype?
Slide12
Testing is a humbling experience
“Wild rides”: often not what we expected, often not what we can understand.
Did we have the right timepoint?
Did we find the right processes?
Slide13
What about a tree of related species?
What if we could find evolutionary patterns that were distinct enough to be phenotypically revealing?
ancestor
Species A, Species B, ..., Species H
Genomes: Inherited with Modifications.
Traits: Come and Go.
Slide14
Fixation, Positive & Negative Selection
Neutral Drift
Positive Selection
Negative Selection
Time
Slide15
What happens when an ancestral trait “goes”?
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype
Genome
Slide16
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype
Genome
A lot of DNA and many traits vary between any two species.
Slide17
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype
Genome
A lot of DNA and many traits vary between any two species.
What about independent trait loss?
Vitamin C synthesis, tail, body hair, dentition features, etc.
Slide18
ancestral trait information
Trait information is no longer under selection
Erodes away over evolutionary time
ancestor
Phenotype
Genome
Slide19
The P -> G screen [Hiller et al., 2012a]
matches trait presence/absence pattern
Slide20
The P -> G screen
Capture the independent genomic switch from purifying selection to neutral evolution in all and only the trait-loss species.
Robust to:
Different trait disabling times.
Different trait disabling mutations.
Slide21
Forward Genetics: Search for mutations that segregate with a trait of interest
Forward Genomics: Search for regions that are lost only in species lacking the trait
phenotype -> genotype
Branding ;-)
But does it work?
Slide22
Vitamin C Synthesis
synthesize vitamin C: rats & mice
cannot synthesize vitamin C: human
Slide23
The Vitamin C synthesis “phenotree”
Vitamin C synthesis was lost 3-4 times independently in mammalian evolution.
Fwd Genomics asks: Do one or more genomic loci look like THAT?
Slide24
We quantify divergence by comparing sequences to the reconstructed ancestral sequence
reconstruct ancestral sequence
ancestor   ACCCTATCGATT-CA
species 1  ACCCTATCGATTGCA   14 identical bases, 93%
species 2  TCCGTATCG-TT-CA   11 identical bases, 79% (more diverged)
outgroup   ACTCT-TCGATT-AA
Percentages are the percent of identical bases relative to the ancestor.
Mutation in species 1 or 2?
Insertion in species 1 or deletion in species 2?
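A minimal sketch of the percent-identity calculation on this slide, in Python. The function name `percent_identity` and the gap rule are assumptions, not taken from the lecture: alignment columns where both the ancestor and the species have a gap are skipped, and a gap opposite a base counts as a mismatch. Under that convention the slide's example sequences give the 93% and 79% shown.

```python
def percent_identity(ancestor: str, species: str) -> float:
    """Percent of aligned columns where the species matches the ancestor.

    Assumed convention (not stated on the slide): columns where both
    sequences have a gap are ignored; a gap opposite a base is a mismatch.
    """
    if len(ancestor) != len(species):
        raise ValueError("sequences must come from the same alignment")
    identical = 0
    compared = 0
    for a, s in zip(ancestor, species):
        if a == "-" and s == "-":
            continue  # no base on either side, nothing to compare
        compared += 1
        if a == s:
            identical += 1
    if compared == 0:
        return float("nan")
    return 100.0 * identical / compared


# Example alignment rows from the slide.
ANCESTOR = "ACCCTATCGATT-CA"
SPECIES_1 = "ACCCTATCGATTGCA"
SPECIES_2 = "TCCGTATCG-TT-CA"

if __name__ == "__main__":
    print(round(percent_identity(ANCESTOR, SPECIES_1)))
    print(round(percent_identity(ANCESTOR, SPECIES_2)))
```

Comparing each species to the reconstructed ancestor, rather than to the other species, is what lets the divergence be assigned to one lineage.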
Slide25
Sequencing errors mimic divergence
ancestor   ACCCTATCGATT-CAATGG
species 1  ACCCTATCGATTGCAAGGG   89% identical bases
species 2  TCCGTAACG--T-CTATCG   61% identical bases
sequence quality scores
high sequencing error rate
treat species 2 as missing data
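The "treat as missing data" rule could be sketched as below. This is only an illustration: the slide gives no thresholds, so `min_phred`, `max_low_fraction`, the function name `divergence_or_missing`, and the example quality values are all hypothetical.

```python
from typing import Optional, Sequence


def divergence_or_missing(
    percent_id: float,
    quals: Sequence[int],
    min_phred: int = 20,            # hypothetical cutoff for a "low quality" base
    max_low_fraction: float = 0.1,  # hypothetical tolerated share of low-quality bases
) -> Optional[float]:
    """Return percent_id, or None (missing data) if the sequence is unreliable."""
    if not quals:
        return None  # no quality information: do not trust the sequence
    low = sum(1 for q in quals if q < min_phred)
    if low / len(quals) > max_low_fraction:
        return None  # apparent divergence may be sequencing error
    return percent_id


if __name__ == "__main__":
    # Made-up per-base quality scores for two 19-base regions.
    good_quals = [40] * 19
    poor_quals = [40] * 10 + [5] * 9
    print(divergence_or_missing(89.0, good_quals))
    print(divergence_or_missing(61.0, poor_quals))
```

Returning `None` instead of a low percent identity keeps a poorly sequenced species from being mistaken for a diverged one.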
Slide26
Assembly gaps mimic divergence
conserved region
species 1  ?????????  (Sanger reads, assembly gap)
species 2
species 3
species 4
species 5
treat species 1 as missing data
Slide27
Reconstruct the evolutionary history of all conserved regions, coding and non-coding
544,549 conserved regions
Reconstruct ancestral sequence (reconstruct ancestral locus)
Measure extant species divergence
Avoid: Low quality sequence, Assembly gaps
matrix: 33 species x 544,549 regions
Seek perfect phenotree match
[Figure labels: percent identity values 85%, 70%, 93%]
Slide28
We quantify the match to the vitamin C pattern by counting the number of species that violate the pattern
[Figure: two panels, each with a percent identity axis from 0 to 100, showing examples with 1 violation and 2 violations]
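One way to make "number of species that violate the pattern" concrete is sketched below. The definition is an assumption for illustration, not necessarily the one used in the lecture or in Hiller et al.: a violation is a species on the wrong side of the best single percent-identity cutoff, with trait-loss species expected below the cutoff and trait-preserving species at or above it. Species names and values are made up, and `None` marks missing data.

```python
from typing import Dict, Optional


def count_violations(
    identity: Dict[str, Optional[float]],
    lost_trait: Dict[str, bool],
) -> int:
    """Minimum number of species on the wrong side of any identity cutoff."""
    # Drop species treated as missing data (low quality or assembly gap).
    scored = [(pid, lost_trait[sp]) for sp, pid in identity.items() if pid is not None]
    # Candidate cutoffs: every observed value, plus one above all values.
    cutoffs = sorted({pid for pid, _ in scored}) + [float("inf")]
    best = len(scored)
    for cutoff in cutoffs:
        # Species below the cutoff are predicted to have lost the trait.
        wrong = sum(1 for pid, lost in scored if (pid < cutoff) != lost)
        best = min(best, wrong)
    return best


# Hypothetical species and percent identities for one conserved region.
LOST = {"kept_1": False, "kept_2": False, "kept_3": False,
        "lost_1": True, "lost_2": True, "lost_3": True}
PERFECT = {"kept_1": 95.0, "kept_2": 93.0, "kept_3": 90.0,
           "lost_1": 70.0, "lost_2": 65.0, "lost_3": None}
ONE_OFF = {"kept_1": 95.0, "kept_2": 93.0, "kept_3": 90.0,
           "lost_1": 92.0, "lost_2": 65.0, "lost_3": 72.0}

if __name__ == "__main__":
    print(count_violations(PERFECT, LOST))
    print(count_violations(ONE_OFF, LOST))
```

A region with zero violations is a perfect phenotree match: every scored trait-loss species is more diverged than every scored trait-preserving species.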
Slide29
Regions matching the vitamin C trait are clustered
these conserved regions are all exons of a single gene
544,549 conserved regions
[Figure: axis "no. of violating species" from 0 (perfect match) to 10 (no match); standalone label "8"]
Slide30
This gene is more diverged in all non-vitamin C synthesizing species
Slide31
What is the function of this gene?
Gulo - gulonolactone (L-) oxidase
encodes the enzyme responsible for vitamin C biosynthesis
Vitamin C pattern
33 genomes X 544,549 regions
Note: No likely shared disabling mutation.
We learned about both evolution and function.
Slide32
The Power of Forward Genomics
Vitamin C pattern
Gulo
- gulonolactone (L-) oxidase
33 genomes X 544,549 regions
Forward genomics works.
Can it work for continuous traits?
With only two independent losses?
And many unknown values?
Slide33
Bile
Bile is a fluid produced by the liver that aids the digestion of lipids in the small intestine.
Slide34
Bile Phospholipids
Different mammals have remarkably different levels of biliary phospholipids:
Slide35
ABCB4 is a phospholipid transporter
Slide36
Find “Cure” Models for Human Disease
Human ABCB4 mutations lower patient biliary phospholipid levels to guinea pig levels but are detrimental.
Our discovery: Guinea pig and horse have inactivated the Abcb4 gene in their natural state. How can they do it?
create KO gene -> try to fix/treat
Natural KO -> find nature’s cure!
Slide37
“Reverse Genomics” of Enhancers
Slide38
“Reverse Genomics” of Enhancers
(Marcovitz et al., MBE, 2016)
Slide39
“Reverse Genomics” of Enhancers
(Marcovitz et al., MBE, 2016)
Slide40
We uncover many enhancer-trait correlations