Lecture 10 – Models of DNA Sequence Evolution

Uploaded by jordyn on 2023-10-28

Presentation Transcript

1. Lecture 10 – Models of DNA Sequence Evolution
Correct for multiple substitutions in calculating pairwise genetic distances.
Derive transformation probabilities for likelihood-based methods.
$$\Pr(R_r \mid t) = \pi_m \, P_{mk}(v_{3,1}) \, P_{kA}(v_{1,w}) \, P_{kG}(v_{1,x}) \, P_{ml}(v_{3,2}) \, P_{lC}(v_{2,y}) \, P_{lC}(v_{2,z})$$
This product is for one assignment of the unobserved states $m$ (root), $k$ and $l$ (internal nodes); the likelihood of site $r$ sums it over all $4^3 = 64$ such assignments.
It's the $P_{ij}$'s that we need a substitution model to calculate. The models typically used are Markov models.
A Poisson process is a stochastic process that can be used to model events in time. The time between events is exponentially distributed with rate $\lambda$.
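A minimal sketch of the site likelihood above, assuming the Jukes-Cantor probabilities derived on the following slides with rate $\alpha = 1$, treating each branch label $v$ as an elapsed time, and reading the tree as root $m$ with children $k$ (leaves $w$, $x$) and $l$ (leaves $y$, $z$). The branch values and the helper names `jc_p` and `site_likelihood` are hypothetical. The code sums the slide's product over all 64 assignments of the internal states.

```python
import math
from itertools import product

BASES = "ACGT"

def jc_p(i, j, alpha, t):
    """Jukes-Cantor P_ij(t): the p_ii / p_ij formulas from the JC slides."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(leaves, v, alpha=1.0):
    """Pr(R_r | t) for root m -> (k -> w, x), (l -> y, z), summed over m, k, l.

    leaves: observed bases keyed 'w', 'x', 'y', 'z'.
    v: branch durations keyed like the slide's subscripts ("3,1", "1,w", ...).
    """
    total = 0.0
    for m, k, l in product(BASES, repeat=3):
        term = 0.25  # pi_m: equal base frequencies under JC
        term *= jc_p(m, k, alpha, v["3,1"]) * jc_p(m, l, alpha, v["3,2"])
        term *= jc_p(k, leaves["w"], alpha, v["1,w"]) * jc_p(k, leaves["x"], alpha, v["1,x"])
        term *= jc_p(l, leaves["y"], alpha, v["2,y"]) * jc_p(l, leaves["z"], alpha, v["2,z"])
        total += term
    return total

# Hypothetical branch durations, for illustration only.
v = {"3,1": 0.1, "3,2": 0.2, "1,w": 0.05, "1,x": 0.1, "2,y": 0.15, "2,z": 0.05}
site = {"w": "A", "x": "G", "y": "C", "z": "C"}  # the leaf pattern from the slide
print(site_likelihood(site, v))
```

Summing `site_likelihood` over all 256 possible leaf patterns gives 1, a quick sanity check. Felsenstein's pruning algorithm computes the same quantity without enumerating every internal assignment.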

2. Jukes-Cantor Model
The probability that a site starting in state $i$ is (still or again) in state $i$ after time $t$ is:
$$p_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$$
The probability that it is in a particular different state $j$ is:
$$p_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$$
$\alpha$ is the rate at which a nucleotide changes to any one of the other three per unit time (analogous to the Poisson rate $\lambda$).
Given that the state at the site is (say) A at $t_0$, we start by estimating the probability of state A at that site at $t_1$:
$p_A(0) = 1$
$p_A(1) = 1 - 3\alpha$
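The two JC formulas can be evaluated directly. A minimal sketch, assuming a hypothetical rate $\alpha = 0.01$ per unit time; `jc_probs` is an illustrative name. Each row of $P(t)$ holds one "same" entry and three "different" entries, so $p_{ii} + 3p_{ij}$ should always equal 1, and both probabilities approach $\tfrac14$ for long times.

```python
import math

def jc_probs(alpha, t):
    """Jukes-Cantor (p_ii, p_ij) after time t; jc_probs is a hypothetical helper name."""
    decay = math.exp(-4.0 * alpha * t)
    p_same = 0.25 + 0.75 * decay   # ends in its starting state
    p_diff = 0.25 - 0.25 * decay   # ends in one particular other state
    return p_same, p_diff

alpha = 0.01  # hypothetical rate per unit time
for t in (0, 1, 10, 100, 1000):
    p_same, p_diff = jc_probs(alpha, t)
    # one "same" entry plus three "different" entries make up a row of P(t)
    print(t, round(p_same, 4), round(p_diff, 4), round(p_same + 3 * p_diff, 10))
```

At $t = 1$ the exact $p_{ii}$ is about 0.9706, close to the one-step value $1 - 3\alpha = 0.97$; the two agree more closely as $\alpha$ shrinks.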

3. Jukes-Cantor Model
Now, what's the probability of this site having an A at $t_2$? There are two ways for the site to have state A at $t_2$:
1 – It still hasn't changed since time $t_0$.
2 – It has changed to something else and back again.
Therefore,
$$p_A(2) = (1 - 3\alpha)\,p_A(1) + \alpha\,[1 - p_A(1)]$$
where
$(1 - 3\alpha)\,p_A(1)$ = the probability of no change at the site between $t_1$ and $t_2$, $(1 - 3\alpha)$, times the probability of the site having state A at time $t_1$, $p_A(1)$; and
$\alpha\,[1 - p_A(1)]$ = the probability of a change to A, $\alpha$, times the probability that the site is not A at time $t_1$, $1 - p_A(1)$.
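The recurrence can be iterated directly. A minimal sketch, assuming discrete time steps and a hypothetical per-step rate $\alpha = 0.05$ (the recurrence only makes sense while $3\alpha \le 1$); `p_A_discrete` is an illustrative name. The two terms in the loop are the two routes to A described above; two steps give $p_A(2) = (1-3\alpha)^2 + 3\alpha^2$, and long runs settle at $\tfrac14$.

```python
def p_A_discrete(alpha, steps):
    """Iterate p_A(t+1) = (1 - 3*alpha) * p_A(t) + alpha * (1 - p_A(t)) from p_A(0) = 1."""
    p = 1.0
    history = [p]
    for _ in range(steps):
        stay = (1 - 3 * alpha) * p   # was A and did not change this step
        arrive = alpha * (1 - p)     # was not A and changed to A this step
        p = stay + arrive
        history.append(p)
    return history

alpha = 0.05  # hypothetical per-step rate
print(p_A_discrete(alpha, 2))
print(p_A_discrete(alpha, 100)[-1])  # close to 1/4
```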

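4. Jukes-Cantor Model
We have a recurrence equation:
$$p_A(t+1) = (1 - 3\alpha)\,p_A(t) + \alpha\,[1 - p_A(t)] = p_A(t) - 3\alpha\,p_A(t) + \alpha - \alpha\,p_A(t)$$
We can calculate the change in $p_A(t)$ across time, $\Delta t$:
$$\frac{\Delta p_A(t)}{\Delta t} = p_A(t+1) - p_A(t) = -3\alpha\,p_A(t) + \alpha - \alpha\,p_A(t) = -4\alpha\,p_A(t) + \alpha$$
so, treating time as continuous,
$$\frac{dp_A(t)}{dt} = -4\alpha\,p_A(t) + \alpha$$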
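Treating time as continuous, the recurrence becomes $dp_A/dt = -4\alpha\,p_A + \alpha$. One standard way to solve it (not necessarily the route taken in the lecture) is to shift $p_A$ by its equilibrium value $\tfrac14$, which turns the equation into plain exponential decay:

```latex
\begin{aligned}
\frac{dp_A}{dt} &= \alpha - 4\alpha\,p_A = -4\alpha\left(p_A - \tfrac14\right) \\
\frac{d}{dt}\left(p_A - \tfrac14\right) &= -4\alpha\left(p_A - \tfrac14\right) \\
p_A(t) - \tfrac14 &= \left(p_A(0) - \tfrac14\right) e^{-4\alpha t} \\
p_A(t) &= \tfrac14 + \left(p_A(0) - \tfrac14\right) e^{-4\alpha t}
\end{aligned}
```

As $t \to \infty$, $p_A(t) \to \tfrac14$ regardless of the starting state, the equal-frequency equilibrium of the JC model.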

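5. Jukes-Cantor Model
$$p_i(t) = \tfrac14 + \left(p_i(0) - \tfrac14\right) e^{-4\alpha t}$$
We have a probability that a site has a particular nucleotide after time $t$, given in terms of its initial state. Generalize to $p_{ij}(t)$, the probability that a site starting in state $i$ has state $j$ after time $t$:
If $i = j$, the target state is present at time 0 with probability 1, so
$$p_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$$
If $i \neq j$, the target state is present at time 0 with probability 0, so
$$p_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$$
Because $\alpha$ is an instantaneous rate, the product $\alpha t$ (rate × time) appears explicitly: we've modeled branch length in our expectations.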
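The lecture's first goal, correcting pairwise distances for multiple substitutions, follows from these formulas. A minimal sketch, assuming branch length is measured as expected substitutions per site, $d = 3\alpha t$ (a site leaves its current state at total rate $3\alpha$): the expected proportion of differing sites is $p = 3p_{ij}(t) = \tfrac34(1 - e^{-4d/3})$, which inverts to the JC correction $d = -\tfrac34 \ln(1 - \tfrac43 p)$. The function names and the two sequences are illustrative only.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of aligned sites that differ (no correction)."""
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)

def jc_expected_p(d):
    """Expected proportion of differing sites at JC distance d = 3 * alpha * t."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc_distance(p):
    """JC-corrected distance: the inverse of jc_expected_p."""
    if p >= 0.75:
        raise ValueError("JC correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

s1 = "ACGTACGTACGTACGTACGT"   # hypothetical aligned sequences
s2 = "ACGTTCGTACGAACGTACCT"
p = p_distance(s1, s2)
print(p, jc_distance(p))   # 0.15 and about 0.167: the correction adds unseen changes
```

The corrected distance is always at least the raw proportion, and it grows without bound as $p$ approaches $\tfrac34$, the point where two sequences look no more similar than random ones.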

6. The JC model makes several assumptions.
1) All substitutions are equally likely; we have a single substitution type (nst = 1).
2) Base frequencies are assumed to be equal; each of the four nucleotides occurs at 25% of sites (ba = eq).
3) Each site has the same probability of experiencing a substitution as any other; we have an equal-rates model (ra = eq).
4) The process is constant through time.
5) Sites are independent of each other.
6) Substitution is a Markov process.
Q-matrix:
$$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$
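A minimal sketch of the JC Q-matrix, assuming a hypothetical $\alpha = 0.01$; `jc_q` and `one_step_p` are illustrative names. Each row sums to zero, and the first-order approximation $P(\Delta t) \approx I + Q\,\Delta t$ over one time unit reproduces the one-step value $p_A(1) = 1 - 3\alpha$ from the earlier slide.

```python
def jc_q(alpha):
    """Jukes-Cantor rate matrix: alpha off the diagonal, -3*alpha on it."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def one_step_p(q, dt):
    """First-order approximation P(dt) ~ I + Q*dt, reasonable only for small rate*dt."""
    return [[(1.0 if i == j else 0.0) + q[i][j] * dt for j in range(4)] for i in range(4)]

alpha = 0.01  # hypothetical rate
Q = jc_q(alpha)
P1 = one_step_p(Q, 1.0)
print(P1[0])  # first row: 1 - 3*alpha on the diagonal, alpha elsewhere
```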

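7. Substitution types and base frequencies.
For the general case (rows and columns ordered A, C, G, T):
$$Q = \begin{pmatrix} -\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\ \mu g\pi_A & -\mu(g\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\ \mu h\pi_A & \mu j\pi_C & -\mu(h\pi_A + j\pi_C + f\pi_T) & \mu f\pi_T \\ \mu i\pi_A & \mu k\pi_C & \mu l\pi_G & -\mu(i\pi_A + k\pi_C + l\pi_G) \end{pmatrix}$$
where $\mu$ is the average instantaneous substitution rate, $a, b, c, \ldots, l$ are relative rate parameters (one of them is set to 1), and the $\pi$'s are the frequencies of the base that is being substituted to.
Because the rate of each change (e.g. $a$ for A→C) is independent of the rate of its reverse ($g$ for C→A), $\pi_i q_{ij}$ need not equal $\pi_j q_{ji}$, and the full model is non-reversible. It becomes time-reversible when $a = g$, $b = h$, $c = i$, $d = j$, $e = k$, and $f = l$.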
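The 12-parameter matrix can be built mechanically: each off-diagonal entry is $\mu$ times a relative rate times the target base's frequency, and the diagonal makes each row sum to zero. A minimal sketch with hypothetical rates and frequencies; the `LETTER` table (mapping each ordered pair to the slide's letter) and the function names are illustrative. It checks detailed balance, $\pi_i q_{ij} = \pi_j q_{ji}$, which holds here exactly when the paired letters are equal.

```python
BASES = "ACGT"

# Slide 7's letter for each ordered (from, to) pair; hypothetical helper table.
LETTER = {("A", "C"): "a", ("A", "G"): "b", ("A", "T"): "c",
          ("C", "A"): "g", ("C", "G"): "d", ("C", "T"): "e",
          ("G", "A"): "h", ("G", "C"): "j", ("G", "T"): "f",
          ("T", "A"): "i", ("T", "C"): "k", ("T", "G"): "l"}

def general_q(rel, pi, mu=1.0):
    """q_xy = mu * rel[letter for x->y] * pi_y; the diagonal makes each row sum to 0."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = mu * rel[LETTER[(x, y)]] * pi[j]
        q[i][i] = -sum(q[i])
    return q

def is_reversible(q, pi, tol=1e-12):
    """Detailed balance: pi_i * q_ij == pi_j * q_ji for every pair i, j."""
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) < tol
               for i in range(4) for j in range(4))

pi = [0.3, 0.2, 0.25, 0.25]                                # hypothetical A, C, G, T
paired = dict(a=1.0, b=4.0, c=0.5, d=0.8, e=3.0, f=1.0)    # hypothetical rates
sym = dict(paired, g=paired["a"], h=paired["b"], i=paired["c"],
           j=paired["d"], k=paired["e"], l=paired["f"])
asym = dict(sym, g=2.0)                                    # break a = g only

print(is_reversible(general_q(sym, pi), pi))    # True
print(is_reversible(general_q(asym, pi), pi))   # False
```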

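8. Substitution types and base frequencies.
General Time-Reversible Model
$$Q = \begin{pmatrix} -\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\ \mu a\pi_A & -\mu(a\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\ \mu b\pi_A & \mu d\pi_C & -\mu(b\pi_A + d\pi_C + f\pi_T) & \mu f\pi_T \\ \mu c\pi_A & \mu e\pi_C & \mu f\pi_G & -\mu(c\pi_A + e\pi_C + f\pi_G) \end{pmatrix}$$
There are six relative transformation rates (one of which is set to 1). There are four base frequencies that must sum to 1. Note that Q is not a symmetric matrix, but it can be decomposed into a symmetric matrix R and a diagonal matrix Π of base frequencies.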
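Time reversibility also means the base frequencies are the stationary distribution of the process: $\pi Q = 0$. A minimal sketch with hypothetical rates ($f$ set to 1) and frequencies; `gtr_q` and `pi_times_q` are illustrative names. Q itself is not symmetric, as the slide notes, yet $\pi Q$ vanishes.

```python
BASES = "ACGT"

def gtr_q(six, pi, mu=1.0):
    """GTR Q from six exchangeabilities (a, b, c, d, e, f) and frequencies pi (A, C, G, T)."""
    a, b, c, d, e, f = six
    r = {frozenset("AC"): a, frozenset("AG"): b, frozenset("AT"): c,
         frozenset("CG"): d, frozenset("CT"): e, frozenset("GT"): f}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                q[i][j] = mu * r[frozenset((x, y))] * pi[j]   # same rate in both directions
        q[i][i] = -sum(q[i])
    return q

def pi_times_q(pi, q):
    """Row vector pi times Q; all zeros means pi is the stationary distribution."""
    return [sum(pi[i] * q[i][j] for i in range(4)) for j in range(4)]

pi = [0.3, 0.2, 0.25, 0.25]                     # hypothetical frequencies
Q = gtr_q((1.0, 4.0, 0.5, 0.8, 3.0, 1.0), pi)   # hypothetical rates, f = 1
print(Q[0][1], Q[1][0])       # 0.2 0.3: Q is not symmetric
print(pi_times_q(pi, Q))      # numerically zero in every entry
```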

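9. Substitution types and base frequencies.
$$R = \begin{pmatrix} -\mu(a+b+c) & \mu a & \mu b & \mu c \\ \mu a & -\mu(a+d+e) & \mu d & \mu e \\ \mu b & \mu d & -\mu(b+d+f) & \mu f \\ \mu c & \mu e & \mu f & -\mu(c+e+f) \end{pmatrix} \qquad \Pi = \begin{pmatrix} \pi_A & 0 & 0 & 0 \\ 0 & \pi_C & 0 & 0 \\ 0 & 0 & \pi_G & 0 \\ 0 & 0 & 0 & \pi_T \end{pmatrix}$$
The off-diagonal entries of Q are those of $R\Pi$; the diagonal of Q is then set so that each row sums to zero.
[Figure: Visual GTR]
Five free relative-rate parameters ($a$, $b$, $c$, $d$, and $e$; $f$ is set to 1).
Three free base frequencies (the four must sum to 1).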
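The decomposition can be checked numerically. A minimal sketch, assuming R is stored with zeros on its diagonal (a choice made here because Q's diagonal is recomputed so each row sums to zero, so R's diagonal never reaches Q); the rates, frequencies, and the names `matmul` and `gtr_from_r_pi` are hypothetical.

```python
def matmul(x, y):
    """Plain 4x4 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def gtr_from_r_pi(six, pi, mu=1.0):
    """Return (R, Pi, Q) with Q's off-diagonal entries taken from R @ Pi."""
    a, b, c, d, e, f = six
    # Symmetric R, order A, C, G, T; diagonal left at 0 (assumption of this sketch).
    R = [[0.0, a, b, c],
         [a, 0.0, d, e],
         [b, d, 0.0, f],
         [c, e, f, 0.0]]
    R = [[mu * x for x in row] for row in R]
    Pi = [[pi[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
    Q = matmul(R, Pi)   # entry (i, j) = mu * r_ij * pi_j
    for i in range(4):
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)   # rows sum to zero
    return R, Pi, Q

pi = [0.3, 0.2, 0.25, 0.25]   # hypothetical frequencies
R, Pi, Q = gtr_from_r_pi((1.0, 4.0, 0.5, 0.8, 3.0, 1.0), pi)
print(Q[0])   # A row: mu*a*pi_C, mu*b*pi_G, mu*c*pi_T off the diagonal
```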

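10. Common Simplifications
Transition-type substitutions typically occur at a higher rate than transversion-type substitutions. The K2P model was the first to address this. So, we set $b = e = \kappa$ (for transitions) and $a = c = d = f = 1$ (for transversions). With all $\pi_i = \tfrac14$, for K2P:
$$Q = \begin{pmatrix} -\mu(\kappa+2)/4 & \mu/4 & \mu\kappa/4 & \mu/4 \\ \mu/4 & -\mu(\kappa+2)/4 & \mu/4 & \mu\kappa/4 \\ \mu\kappa/4 & \mu/4 & -\mu(\kappa+2)/4 & \mu/4 \\ \mu/4 & \mu\kappa/4 & \mu/4 & -\mu(\kappa+2)/4 \end{pmatrix}$$
where $\alpha = \mu\kappa/4$ (the transition rate) and $\beta = \mu/4$ (the transversion rate). Thus, $\kappa = \alpha / \beta$.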
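A minimal sketch of the K2P matrix above, with hypothetical $\mu = 1$ and $\kappa = 4$ and bases ordered A, C, G, T (so A↔G and C↔T are the transitions); `k2p_q` is an illustrative name. Reading $\alpha$ and $\beta$ back off the matrix recovers $\kappa = \alpha/\beta$.

```python
def k2p_q(mu, kappa):
    """K2P rate matrix (order A, C, G, T): transitions A<->G and C<->T get kappa."""
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                rel = kappa if (i, j) in transitions else 1.0
                q[i][j] = mu * rel * 0.25   # all pi = 1/4
        q[i][i] = -sum(q[i])
    return q

mu, kappa = 1.0, 4.0              # hypothetical values
Q = k2p_q(mu, kappa)
alpha, beta = Q[0][2], Q[0][1]    # transition (A->G) and transversion (A->C) rates
print(alpha, beta, alpha / beta)  # 1.0 0.25 4.0
```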

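11. Hasegawa-Kishino-Yano (HKY) Model
For HKY:
$$Q = \begin{pmatrix} -\mu(\kappa\pi_G + \pi_Y) & \mu\pi_C & \mu\kappa\pi_G & \mu\pi_T \\ \mu\pi_A & -\mu(\kappa\pi_T + \pi_R) & \mu\pi_G & \mu\kappa\pi_T \\ \mu\kappa\pi_A & \mu\pi_C & -\mu(\kappa\pi_A + \pi_Y) & \mu\pi_T \\ \mu\pi_A & \mu\kappa\pi_C & \mu\pi_G & -\mu(\kappa\pi_C + \pi_R) \end{pmatrix}$$
where $\alpha = \mu\kappa$, $\beta = \mu$, $\pi_R = \pi_A + \pi_G$, and $\pi_Y = \pi_C + \pi_T$.
There are lots of other models that restrict the Q-matrix.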
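A minimal sketch of the HKY matrix, assuming hypothetical frequencies with $\mu = 1$ and $\kappa = 4$; `hky_q` is an illustrative name. It checks each diagonal entry against the slide's compact form in terms of $\pi_R$ and $\pi_Y$, and with all $\pi = \tfrac14$ the matrix collapses to the K2P matrix of the previous slide.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def hky_q(mu, kappa, pi):
    """HKY: q_xy = mu * kappa * pi_y for transitions, mu * pi_y for transversions."""
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i != j:
                rel = kappa if frozenset((x, y)) in TRANSITIONS else 1.0
                q[i][j] = mu * rel * pi[j]
        q[i][i] = -sum(q[i])
    return q

pi = [0.3, 0.2, 0.25, 0.25]   # hypothetical frequencies of A, C, G, T
mu, kappa = 1.0, 4.0          # hypothetical values
Q = hky_q(mu, kappa, pi)
piR, piY = pi[0] + pi[2], pi[1] + pi[3]
# Slide 11's diagonal for row A is -mu * (kappa * pi_G + pi_Y)
print(Q[0][0], -mu * (kappa * pi[2] + piY))
```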

12. Some common models
There are 203 special cases of the GTR, 406 if we allow for equal base frequencies.
[Figure: common models; labels "all tv equal" and "all ti equal" (tv = transversions, ti = transitions)]
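The count of 203 matches the number of ways to split the six GTR relative rates ($a$ to $f$) into groups that share a common value, the Bell number $B_6$; doubling for equal versus unequal base frequencies gives 406. A minimal sketch, assuming that reading of the slide; `bell` and `set_partitions` are illustrative names. It computes $B_6$ from the Bell recurrence and also enumerates the groupings, confirming that the K2P grouping {b, e}, {a, c, d, f} ("all ti equal", "all tv equal") is one of them.

```python
from math import comb

def bell(n):
    """Bell number via B(m+1) = sum_k C(m, k) * B(k)."""
    b = [1]
    for m in range(n):
        b.append(sum(comb(m, k) * b[k] for k in range(m + 1)))
    return b[n]

def set_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):   # put `first` into an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part       # or into a new block of its own

parts = list(set_partitions(list("abcdef")))
shapes = {frozenset(frozenset(block) for block in p) for p in parts}
k2p = frozenset({frozenset("be"), frozenset("acdf")})
print(bell(6), 2 * bell(6))      # 203 406
print(len(parts), k2p in shapes) # 203 True
```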

13. Calculating Transformation Probabilities.
So, the Q and R matrices we've been discussing define the instantaneous rates of substitution from one nucleotide to another. Convert the rates to probabilities by matrix exponentiation:
$$P(t) = e^{Qt}$$
[Equations: closed-form $P(t)$ for Jukes-Cantor and K2P]
Again, it's these $P_{ij}$ that are used in the likelihood function.
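When no closed form is at hand, $P(t) = e^{Qt}$ can be computed numerically. A minimal pure-Python sketch, assuming a scaling-and-squaring Taylor series for the matrix exponential (in practice a library routine such as SciPy's `scipy.linalg.expm` would normally be used); rates and $t$ are hypothetical. It compares the result with the JC formulas from earlier and with Kimura's standard K2P closed form, where $\alpha$ is the transition rate and $\beta$ the transversion rate: $p_{\text{same}} = \tfrac14 + \tfrac14 e^{-4\beta t} + \tfrac12 e^{-2(\alpha+\beta)t}$, $p_{\text{ts}} = \tfrac14 + \tfrac14 e^{-4\beta t} - \tfrac12 e^{-2(\alpha+\beta)t}$, $p_{\text{tv}} = \tfrac14 - \tfrac14 e^{-4\beta t}$.

```python
import math

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(a, terms=20):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)            # infinity norm
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2 ** s for v in row] for row in a]            # norm now <= 1/2
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]  # A^k / k!
        result = [[x + y for x, y in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)                         # undo the scaling
    return result

t = 0.5  # hypothetical elapsed time

# Jukes-Cantor with a hypothetical alpha = 1
alpha = 1.0
Qjc = [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]
P = expm([[v * t for v in row] for row in Qjc])
e = math.exp(-4 * alpha * t)
print(P[0][0], 0.25 + 0.75 * e)   # numeric vs closed-form p_ii
print(P[0][1], 0.25 - 0.25 * e)   # numeric vs closed-form p_ij

# K2P with hypothetical transition / transversion rates (order A, C, G, T)
a_ts, b_tv = 0.8, 0.2
K = [[0.0, b_tv, a_ts, b_tv],
     [b_tv, 0.0, b_tv, a_ts],
     [a_ts, b_tv, 0.0, b_tv],
     [b_tv, a_ts, b_tv, 0.0]]
for i in range(4):
    K[i][i] = -sum(K[i])
Pk = expm([[v * t for v in row] for row in K])
e4b = math.exp(-4 * b_tv * t)
e2ab = math.exp(-2 * (a_ts + b_tv) * t)
k2p_same = 0.25 + 0.25 * e4b + 0.5 * e2ab
k2p_ts = 0.25 + 0.25 * e4b - 0.5 * e2ab
k2p_tv = 0.25 - 0.25 * e4b
print(Pk[0][0], k2p_same, Pk[0][2], k2p_ts, Pk[0][1], k2p_tv)
```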