Transition Bias and Substitution models
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
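Transition bias refers to the degree by which the s/v ratio deviates from the expected 1/2 (each nucleotide can undergo one transition but two transversions, so s/v = 1/2 if all substitutions are equally likely). The observed s/v ratio is almost always much larger than 1/2.
[Figure: the four nucleotides, A and G in one row, C and T in the other]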
Transitions and Transversions
Transition: the substitution of a purine for a purine or a pyrimidine for a pyrimidine. Symbolized by s.
Transversion: the substitution of a purine for a pyrimidine or vice versa. Symbolized by v.
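The two definitions can be turned into a small classifier. The sketch below is illustrative only (the function names `classify` and `count_sv` are not from the slides); it also enumerates the 12 possible changes among A, C, G and T to show where the expected s/v ratio of 1/2 comes from.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(x, y):
    """Return 's' for a transition, 'v' for a transversion, None if x == y."""
    if x == y:
        return None
    same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
    return "s" if same_class else "v"

def count_sv(seq1, seq2):
    """Count transitional and transversional differences in two aligned sequences."""
    counts = {"s": 0, "v": 0}
    for x, y in zip(seq1, seq2):
        kind = classify(x, y)
        if kind:
            counts[kind] += 1
    return counts["s"], counts["v"]

# Enumerate every possible change among the four nucleotides.
all_changes = [classify(x, y) for x in "ACGT" for y in "ACGT" if x != y]
expected_s = all_changes.count("s")  # 4 of the 12 changes are transitions
expected_v = all_changes.count("v")  # 8 of the 12 changes are transversions
print(expected_s, expected_v, expected_s / expected_v)
```

An observed ratio well above 0.5 from `count_sv` on real aligned sequences is what the slides call transition bias.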
What is transition bias?
[Figure labels: Purine (A, G); Pyrimidine (C, T)]
Transition Bias is Ubiquitous. Why?
For both invertebrate and vertebrate genes:
What causes transition bias?
Mutation bias
Selection bias
Selection bias in fixation probability
  Protein-coding genes
  RNA genes
Mutation bias
Mitochondrial Genetic Code
Synonymous and nonsynonymous
Degeneracy:
Non-degenerate
Two-fold degenerate
Four-fold degenerate
Transitions are synonymous and transversions are nonsynonymous at two-fold degenerate sites.
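The statement about two-fold degenerate sites can be checked against a genetic code table. The sketch below assumes the vertebrate mitochondrial code (NCBI translation table 2, written in the usual TCAG codon order); the slide does not say which mitochondrial code it shows, and the helper names are illustrative. Stop codons are treated as one more "amino acid" class.

```python
from itertools import product

BASES = "TCAG"
# Assumption: NCBI translation table 2 (vertebrate mitochondrial), TCAG order.
VERT_MT_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MT_AAS)}
TRANSITION_PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}

def third_position_fold(codon):
    """Number of third-position nucleotides that keep the encoded amino acid."""
    aa = CODE[codon]
    return sum(CODE[codon[:2] + b] == aa for b in BASES)

def two_fold_rule_holds(codon):
    """True if the third-position transition is synonymous and both transversions are not."""
    aa = CODE[codon]
    for b in BASES:
        if b == codon[2]:
            continue
        synonymous = CODE[codon[:2] + b] == aa
        is_transition = TRANSITION_PARTNER[codon[2]] == b
        if synonymous != is_transition:
            return False
    return True

folds = {codon: third_position_fold(codon) for codon in CODE}
two_fold = [codon for codon, fold in folds.items() if fold == 2]

print(sorted(set(folds.values())))                    # fold classes at third positions
print(all(two_fold_rule_holds(c) for c in two_fold))  # does the rule hold everywhere?
```

Under this table every third codon position is either two-fold or four-fold degenerate, which is why the rule is so clean for vertebrate mitochondrial genes.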
RNA secondary structure
Example 1 (C to U in the stem; hairpin loop: CCAAU):
Seq1: CACGA
      |||||
      GUGCU
Seq2: CAUGA
      |||||
      GUGCU
Example 2 (A to G in the stem; hairpin loop: CCAAU):
Seq1: CACGA
      |||||
      GUGCU
Seq2: CGCGA
      |||||
      GUGCU
A G/U pair, although not as strong as an A/U or C/G pair, generally does not disrupt RNA secondary structure (and occurs frequently in RNA secondary structures).
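The effect of G/U pairing on stem stability can be made concrete by enumeration. The sketch below is a simplification that only asks whether two bases can pair (Watson-Crick or G/U) and ignores pair strength; the function names are illustrative. It mutates one base of each Watson-Crick pair and counts how many transitions and transversions leave a valid pair.

```python
# Pairs allowed in a stem: Watson-Crick pairs plus the G/U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PURINES = {"A", "G"}

def is_transition(x, y):
    """True if x -> y stays within purines or within pyrimidines."""
    return x != y and ((x in PURINES) == (y in PURINES))

def stem_is_paired(top, bottom):
    """True if every position of the two stem strands forms an allowed pair."""
    return all((x, y) in PAIRS for x, y in zip(top, bottom))

total = {"s": 0, "v": 0}
kept = {"s": 0, "v": 0}
for x, y in [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]:
    for new in "ACGU":
        if new == x:
            continue
        kind = "s" if is_transition(x, new) else "v"
        total[kind] += 1
        if (new, y) in PAIRS:
            kept[kind] += 1

print(total, kept)
print(stem_is_paired("CACGA", "GUGCU"))  # Seq1 stem from the slide
print(stem_is_paired("CAUGA", "GUGCU"))  # C -> U transition: G/U pair
print(stem_is_paired("CAAGA", "GUGCU"))  # C -> A transversion (hypothetical)
```

Half of the possible transitions keep the pair intact, whereas no transversion does, because a transversion always produces a purine-purine or pyrimidine-pyrimidine combination.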
Causes of transition bias
"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be."
Lord Kelvin: Popular Lectures and Addresses, vol. 1, "Electrical Units of Measurement", 1883-05-03
At Four-fold Degenerate Sites
At four-fold degenerate sites, all nucleotide substitutions are synonymous and subject to roughly the same selection pressure (similar fixation probabilities).
Glycine codons: GGA, GGC, GGG, GGT (the third position is a four-fold degenerate site)
AA    Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold   4   2   2       2   2   4   4   4       2
S1    GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2    GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v    s                           s   v
AA(S2)            Glu                     Gly Trp
At Nondegenerate Sites
Glycine codons: GGA, GGC, GGG, GGT (the first and second positions are nondegenerate sites)
At nondegenerate sites, all nucleotide substitutions are nonsynonymous and subject to roughly the same selection pressure (similar fixation probabilities).
AA    Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
S1    GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2    GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v                s                       v
AA(S2)            Glu                     Gly Trp
At Two-fold Degenerate Sites
At two-fold degenerate sites, all transitional substitutions are synonymous, and all transversional substitutions are nonsynonymous.
GAA Glu
GAG Glu
GAC Asp
GAT Asp
(the third position is a two-fold degenerate site)
A transition is about 40 times as likely to become fixed as a transversion.
AA    Gly Asn Lys Gly Asp Lys Ala Ala Pro Ala Cys ...
Fold   4   2   2       2   2   4   4   4       2
S1    GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU ...
S2    GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG ...
s/v        s           s   s                   v
AA(S2)            Glu                     Gly Trp
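The s and v annotations on the S1/S2 comparison can be reproduced programmatically. The sketch below assumes the standard genetic code (NCBI translation table 1, UCAG order), which is consistent with the amino acids printed above the alignment; the function names are illustrative. For each codon pair that differs at exactly one position, it reports the degeneracy of that site in S1 and whether the change is a transition or a transversion.

```python
from itertools import product

BASES = "UCAG"
# Assumption: the standard genetic code (NCBI translation table 1), UCAG order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}
PURINES = {"A", "G"}

def site_fold(codon, pos):
    """Count the nucleotides at `pos` that leave the encoded amino acid unchanged."""
    aa = CODE[codon]
    return sum(CODE[codon[:pos] + b + codon[pos + 1:]] == aa for b in BASES)

def classify_differences(s1, s2):
    """Return (fold, 's' or 'v') for every codon pair differing at exactly one site."""
    results = []
    for c1, c2 in zip(s1.split(), s2.split()):
        diffs = [i for i in range(3) if c1[i] != c2[i]]
        if len(diffs) != 1:
            continue  # identical codons, or multiple changes (not handled here)
        pos = diffs[0]
        kind = "s" if (c1[pos] in PURINES) == (c2[pos] in PURINES) else "v"
        results.append((site_fold(c1, pos), kind))
    return results

S1 = "GGA AAU AAA GGA GAC AAA GCC GCC CCU GCG UGU"
S2 = "GGG AAC AAA GAA GAU AAG GCC GCU CCA GGG UGG"

by_fold = {}
for fold, kind in classify_differences(S1, S2):
    by_fold.setdefault(fold, []).append(kind)
print(by_fold)  # keys: 4 = four-fold, 2 = two-fold, 1 = nondegenerate
```

Grouping by degeneracy class separates the comparisons where selection is roughly equal for s and v (four-fold and nondegenerate sites) from the two-fold sites, where selection itself favors transitions.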
Methylation and deamination
[Figure: a methyltransferase transfers a methyl group (H3C-) from a donor to an acceptor]
Methylation and DNA Repair in E. coli
DNA alphabet: ACGT
RNA alphabet: ACGU
DNA duplication and Watson-Crick pairing rule: A-T, C-G
3’--CTAG----CTAGGTAT----C-----C--CTAG-----------5’
    ||||    ||||||||    ?     ?  ||||
5’--GATC----GATCCATA----U-----T--GATC-----... 3’
[Figure labels: methyl groups (H3C) on the GATC sites; mutS, mutH, mutL]
Spacing of GATC: consequences of being too far apart.
Methylation-Modification System
Brevibacterium albidum
Methylated recognition site:
TGGC*CA
AC*CGGT
Cleavage site:
----TGG|CCA---
----ACC|GGT---
[Figure labels: dsDNA phage, bacterial genome, bacterial membrane, transcription and translation, restriction enzyme, methylase]
CpG-Specific DNA Methylation
Mammalian DNA methyltransferase 1 (DNMT1)
Domain   Description                          Residues
NlsD     NLS-containing domain                1-343
RFDD     replication foci-directing domain    350-609
ZnD      Zn-binding domain                    613-748
PBD      polybromo domain                     746-1110
CatD     the catalytic domain                 1124-1620
[Figure labels: CpG, mCpG]
Fatemi, M., A. Hermann, S. Pradhan and A. Jeltsch, 2001 J Mol Biol 309: 1189-99.
CpG-Specific DNA Methylation
5’ATGCGA-------CCGA--------ACGGC--TAA 3’
  ||||||       ||||        |||||
3’TACGCT-------GGCT--------TGCCG--ATT 5’
Fully methylated   Hemi-methylated   Unmethylated
[Figure: methyl groups (H3C) on both strands of the first CpG and on one strand of the second CpG]
Note: 5’CG3’ = CpG
Methylation and Gene Regulation
Proteins with a methyl-CpG binding domain (MBD):
MBD1, MBD2, and MBD3
MeCP2
Deacetylase: an enzyme that removes an acetyl group
Histone deacetylases: deacetylate lysyl residues in histones (the half-life of an acetyl group is ~10 min). Acetylation removes a positive charge on the lysine ε-amino group and promotes nucleosome melting (and gene expression). Deacetylation tends to decrease or turn off gene expression.
[Figure labels: mCpG, MBD, histone deacetylase, condensed DNA with repressed transcription]
Wade, P. A., and A. P. Wolffe, 2001 Nat Struct Biol 8: 575-7.
Lysine demethylation
Methylation and Mutation
[Figure: chemical structures showing methylation of cytosine followed by spontaneous deamination]
Cytosine is converted to thymine by methylation followed by spontaneous deamination.
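The mutational consequence of CpG methylation can be sketched on a sequence string. The example below is a toy model, not a description of the chemistry: it locates CpG dinucleotides and replaces the C with T, which is the net result of methylation followed by deamination. The function names are illustrative, and the test sequence is the top strand from the CpG methylation slide with the dashes removed.

```python
def cpg_positions(seq):
    """Indices of the C in every 5'-CG-3' dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def deaminate_methylated_c(seq, i):
    """Replace the C of the CpG at index i with T (methylation, then deamination)."""
    if seq[i:i + 2] != "CG":
        raise ValueError("index does not point at the C of a CpG")
    return seq[:i] + "T" + seq[i + 1:]

seq = "ATGCGACCGAACGGC"
sites = cpg_positions(seq)
mutant = deaminate_methylated_c(seq, sites[0])
print(sites)   # positions of the three CpG sites
print(mutant)  # the first CpG has become TpG
```

Because C to T is a pyrimidine-to-pyrimidine change, every such event adds to the transition count, which makes CpG deamination one source of mutation bias toward transitions.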
Vertebrate mitochondrion
Spontaneous deamination
Transversion can erase transitions
Transitions can erase transitions, and transversions can erase transversions.
However, a transversion can erase many transitions occurring before it, and subsequent transitions cannot erase the transversion:
AACGCTTGACG
AACGCTTAACG
AACGCTTGACG
AACGCTTCACG
AACGCTTTACG
Although a transition could also erase 2n transversions occurring before it, this is rare because transversions are generally much rarer than transitions. Transitions tend to be missed in counting much more frequently than transversions.
AACGCTTGACG
AACGCTTTACG
AACGCTTAACG
AACGCTTGACG
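The erasure argument can be checked by following a single site through its history. The sketch below assumes the sequences listed on this slide are successive states of one lineage, so that only the eighth site changes (G, A, G, C, T in the first series and G, T, A, G in the second); the function names are illustrative.

```python
PURINES = {"A", "G"}

def kind(x, y):
    """'s' for a transition, 'v' for a transversion, None for no change."""
    if x == y:
        return None
    return "s" if (x in PURINES) == (y in PURINES) else "v"

def summarize(history):
    """Compare the changes that occurred along a site history with what is observed.

    history: successive states of one site, e.g. "GAGCT".
    """
    steps = [kind(a, b) for a, b in zip(history, history[1:])]
    return {
        "actual_s": steps.count("s"),
        "actual_v": steps.count("v"),
        "observed": kind(history[0], history[-1]),
    }

print(summarize("GAGCT"))  # three transitions and one transversion occurred
print(summarize("GTAG"))   # two transversions were undone by one transition
```

The endpoints differ by a transversion exactly when an odd number of transversions occurred, regardless of how many transitions happened, which is why a single transversion hides all transitions at that site and why only an even number (2n) of transversions can be erased.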
Summary
Selection: Transitions are tolerated more than transversions by natural selection because
  they are more likely to be synonymous in protein-coding sequences than transversions;
  they are less likely to disrupt RNA secondary structure than transversions.
Mutation: Transitional mutations occur more frequently than transversional mutations because
  misincorporation during DNA replication occurs more frequently between two purines or between two pyrimidines than between a purine and a pyrimidine;
  a purine is more likely to mutate chemically to another purine than to a pyrimidine (e.g., through spontaneous deamination), and the same holds for pyrimidines.
Bias in counting: Transitions tend to be missed in counting much more frequently than transversions (which necessitates the substitution models).
Nucleotide Substitutions
[Figure: the sequence ACACTCGGATTAGGCT diverging into two observed daughter sequences. The substitutions along the two lineages are labeled by type: single, multiple, coincidental, parallel, convergent, and back.]
Actual number of changes during the evolution of the two daughter sequences: 12.
Observed number of differences between the two daughter sequences: 3.
Correcting for multiple substitutions aims to estimate the true number of changes, i.e., 12.
From WHL
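As a rough illustration of what such a correction looks like, the sketch below applies the standard JC69 distance formula, d = -(3/4) ln(1 - 4p/3), to the numbers on this slide (3 observed differences over 16 sites). The formula is not given on the slides; it is the textbook result for the simplest model named later in the deck, and the function name is illustrative.

```python
import math

def jc69_distance(p):
    """JC69-corrected substitutions per site from the observed proportion of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

n_sites = 16  # length of ACACTCGGATTAGGCT
n_diff = 3    # observed differences between the two daughter sequences
p = n_diff / n_sites
d = jc69_distance(p)

print(round(p * n_sites, 2))  # uncorrected count of differences
print(round(d * n_sites, 2))  # corrected estimate of the number of substitutions
```

The corrected estimate is larger than the raw count of 3 but still far below the 12 changes in this constructed example; a single short pair of sequences carries too little information, and an equal-rate model ignores the transition bias discussed earlier.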
Substitution models and phylogenetics
A substitution model models the evolutionary process so as to correct for multiple hits.
A phylogenetic reconstruction method implicitly or explicitly assumes a substitution model.
A phylogenetic method assuming a wrong substitution model will typically produce wrong trees.
An alignment made with an inappropriate substitution score matrix will typically be inaccurate (e.g., strong transition bias among the sequences but a substitution score matrix without a strong penalty against transversions).
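The point about score matrices can be illustrated with a matrix that penalizes transversions more heavily than transitions. The sketch below is hypothetical: the score values (2, -1, -3) are arbitrary illustrative choices, not taken from any alignment program, and the scoring function handles only ungapped, equal-length sequences.

```python
PURINES = {"A", "G"}

def make_score_matrix(match=2, transition=-1, transversion=-3):
    """Build a nucleotide score matrix that separates transitions from transversions.

    The default scores are arbitrary illustrative values.
    """
    scores = {}
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                scores[x, y] = match
            elif (x in PURINES) == (y in PURINES):
                scores[x, y] = transition
            else:
                scores[x, y] = transversion
    return scores

def score_ungapped(seq1, seq2, scores):
    """Sum the per-site scores of two aligned, equal-length sequences."""
    return sum(scores[x, y] for x, y in zip(seq1, seq2))

biased = make_score_matrix()
flat = make_score_matrix(transition=-2, transversion=-2)  # ignores transition bias
print(score_ungapped("ACGT", "GCGA", biased))
print(score_ungapped("ACGT", "GCGA", flat))
```

With the flat matrix, a transition and a transversion cost the same, so an aligner has no reason to prefer the placement that implies the more plausible transitional differences.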
Unrestricted rate matrix (rows: original nucleotide; columns: replacement nucleotide):
      A     G     C     T
A     .     a1    a2    a3
G     a7    .     a4    a5
C     a8    a9    .     a6
T     a10   a11   a12   .
Time-reversible rate matrix:
      A       G       C       T
A     .       a1πG    a2πC    a3πT
G     a1πA    .       a4πC    a5πT
C     a2πA    a4πG    .       a6πT
T     a3πA    a5πG    a6πC    .
The diagonal of a transition probability matrix is subject to the constraint that each row sums up to 1.
JC69: πi = 0.25; ai = c
F81/TN84: πA, πC, πG, πT; ai = c
K80: πi = 0.25; a1 = a6 = a7 = a12 = α; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
HKY85: πA, πC, πG, πT; a1 = a6 = a7 = a12 = α; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
TN93: πA, πC, πG, πT; a1 = a7 = α1; a6 = a12 = α2; a2 = a3 = a4 = a5 = a8 = a9 = a10 = a11 = β
GTR: πA, πC, πG, πT; six rate parameters (the time-reversible matrix above)
Unrestricted: no equilibrium πi
The TN93 model as an example
Frequency parameters (the equilibrium nucleotide frequencies)
Rate ratio parameters (the relative rates of the different substitution types)
In addition to the illustrated assumptions, the model also assumes that the frequency and rate ratio parameters do not change over time, i.e., the substitution process is stationary.
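A numerical sketch may help connect the two parameter categories to transition probabilities. The code below assumes the usual TN93 parameterization (rate from x to y equals a rate parameter times the frequency of y, with separate parameters for purine transitions, pyrimidine transitions, and transversions); the parameter names, the example values, and the simple series-based matrix exponential are illustrative choices, not the author's implementation.

```python
ORDER = "AGCT"

def tn93_rate_matrix(pi, alpha1, alpha2, beta):
    """Rate matrix Q: alpha1 for A<->G, alpha2 for C<->T, beta for transversions."""
    def rate(x, y):
        if {x, y} == {"A", "G"}:
            return alpha1 * pi[y]
        if {x, y} == {"C", "T"}:
            return alpha2 * pi[y]
        return beta * pi[y]
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(ORDER):
        for j, y in enumerate(ORDER):
            if i != j:
                q[i][j] = rate(x, y)
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to 0
    return q

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transition_probabilities(q, t, squarings=10, terms=12):
    """P(t) = exp(Qt), via a truncated Taylor series followed by repeated squaring."""
    scale = t / 2 ** squarings
    m = [[q[i][j] * scale for j in range(4)] for i in range(4)]
    identity = [[float(i == j) for j in range(4)] for i in range(4)]
    p, term = identity, identity
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, m)]
        p = [[p[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    for _ in range(squarings):
        p = matmul(p, p)
    return p

# Hypothetical parameter values chosen only for illustration.
pi = {"A": 0.3, "G": 0.2, "C": 0.2, "T": 0.3}
P = transition_probabilities(tn93_rate_matrix(pi, 4.0, 6.0, 1.0), t=0.5)

row_sums = [sum(row) for row in P]
pi_after = [sum(pi[x] * P[i][j] for i, x in enumerate(ORDER)) for j in range(4)]
print(row_sums)  # each row of P(t) sums to 1
print(pi_after)  # frequencies are unchanged by P(t): the process is stationary
```

Setting `alpha1 == alpha2` reduces this sketch to HKY85, and additionally setting all frequencies to 0.25 gives K80, mirroring the nesting of the models listed on the previous slide.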
Substitution Models
There are three types of substitution models in molecular evolution:
  Nucleotide-based
  Amino acid-based
  Codon-based
Substitution models are characterized by two categories of parameters: the frequency parameters and the rate ratio parameters. Different models differ in their assumptions concerning these two categories of parameters.
Substitution models, substitution score matrix and sequence alignment.