Markov chains
Assume a gene that has three alleles A, B, and C. These can mutate into each other.
Transition probabilities
Transition matrix
Probability matrix
Left probability matrix: the column sums add to 1.
Right probability matrix: the row sums add to 1.
Transition matrices are always square.
The trace contains the probabilities of no change.
Example: a left probability matrix over the alleles A, B, and C (columns: current allele; rows: next allele). 68% of A stays A, 12% mutates into B, and 20% into C; 7% mutates from B to A and 10% from C to A.
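The example matrix can be checked numerically. A minimal sketch; the A column and the B→A and C→A entries come from the slide, while the remaining B and C entries are assumed values for illustration:

```python
import numpy as np

# Left (column-stochastic) transition matrix over alleles A, B, C.
# Column j holds the probabilities of mutating away from allele j.
# The A column and the B->A, C->A entries are from the slide;
# the remaining entries of the B and C columns are assumed.
P = np.array([
    [0.68, 0.07, 0.10],  # -> A
    [0.12, 0.83, 0.10],  # -> B
    [0.20, 0.10, 0.80],  # -> C
])

# For a left probability matrix every column must sum to 1.
print(P.sum(axis=0))  # -> [1. 1. 1.]
```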
Calculating probabilities

The transition matrix P gives the probabilities of reaching another state in the next step. P² gives the probabilities of reaching another state in exactly two steps. The probability of reaching any state in exactly n steps is given by Pⁿ.
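Matrix powers give the n-step probabilities directly. A sketch, reusing the allele matrix from above (with its assumed entries):

```python
import numpy as np

# Two-step probabilities are P @ P; n-step probabilities are the n-th
# matrix power of P.
P = np.array([[0.68, 0.07, 0.10],
              [0.12, 0.83, 0.10],
              [0.20, 0.10, 0.80]])

P2 = P @ P                         # exactly two steps
Pn = np.linalg.matrix_power(P, 5)  # exactly five steps

# Powers of a probability matrix are again probability matrices.
print(P2.sum(axis=0), Pn.sum(axis=0))
```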
Assume, for instance, a virus with N strains, and that at each generation strain i mutates to strain j with probability a_i→j. The probability of staying in strain i is therefore 1 − Σ_j a_i→j. What is the probability that after k generations the virus is the same strain as at the beginning?
Allele frequencies in the first generation

Given initial allele frequencies, what are the frequencies in the next generation? They are obtained by multiplying the transition matrix with the vector of initial frequencies.
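A one-line computation; the initial frequencies here are assumed values for illustration:

```python
import numpy as np

# Allele transition matrix from above (some entries assumed).
P = np.array([[0.68, 0.07, 0.10],
              [0.12, 0.83, 0.10],
              [0.20, 0.10, 0.80]])

x0 = np.array([0.5, 0.3, 0.2])  # assumed initial allele frequencies
x1 = P @ x0                     # frequencies in the next generation
print(x1, x1.sum())             # the new frequencies still sum to 1
```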
A Markov chain is a process in which the state at step n depends only on the state realized at step n−1 and the transition probabilities. A Markov chain has no memory.

Andrey Markov (1856-1922)

In reality transition probabilities might change over time; the model assumes constant transition probabilities.
Do we get stable frequencies?

Does our mutation process reach stable allele frequencies, or do the frequencies change forever?

If PX = X, X is a steady-state, stationary, or equilibrium vector; the associated eigenvalue is 1. The equilibrium vector is independent of the initial conditions. The largest eigenvalue (principal eigenvalue) of every probability matrix equals 1, and there is an associated stationary probability vector that defines the equilibrium conditions (Perron-Frobenius theorem).
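The stationary vector can be found from an eigendecomposition. A sketch using the allele matrix from above (some of whose entries are assumed):

```python
import numpy as np

P = np.array([[0.68, 0.07, 0.10],
              [0.12, 0.83, 0.10],
              [0.20, 0.10, 0.80]])

vals, vecs = np.linalg.eig(P)
i = np.argmax(vals.real)   # index of the principal eigenvalue (= 1)
u = vecs[:, i].real
u = u / u.sum()            # rescale so the frequencies sum to 1
print(vals.real[i], u)     # eigenvalue 1 and the equilibrium frequencies
```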
Eigenvalues and eigenvectors of probability matrices

Column sums of left probability matrices are 1; row sums may differ from 1. The eigenvalues of a probability matrix and its transpose are identical. One of the eigenvalues of a probability matrix is 1.
If one of the diagonal entries of P is 1, the matrix is called absorbing. In this case the eigenvector of the largest eigenvalue contains only zeros and a single 1: absorbing chains become monodominant, ending with a single element. To get frequencies, the eigenvector has to be rescaled (normalized).
Normalizing the stationary state vector

Frequencies have to add to unity, so the entries of the eigenvector are divided by their sum. The final frequencies (for example, counts in a population of N = 1000) follow by multiplying the stationary frequencies by N.
Do all Markov chains converge?

Recurrent and aperiodic chains are called ergodic. Every irreducible ergodic transition matrix has a steady-state vector to which the process converges; in such a chain you can leave every state. A chain may instead contain a closed part, for example a state D that cannot be left, in which case the chain is absorbing; or it may be periodic.
Absorbing chains

[Figure: a chain with states A, B, C, and D in which it is impossible to leave state D; the chain consists of a closed part and an absorbing part.]

A chain is called absorbing if it contains states without exit. The other states are called transient. Any absorbing Markov chain eventually converges to its absorbing states.
The time to reach the absorbing state

Assume a drunkard walking randomly through five streets. In the first street is his home, in the last a bar. At either home or bar he stays; from each intermediate street he moves to the neighbouring street on either side with probability 0.5.
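The walk can be written as a transition matrix. A sketch in the column-stochastic convention used above:

```python
import numpy as np

# Column-stochastic transition matrix for the drunkard's walk:
# states 0 = home and 4 = bar are absorbing, 1-3 are the streets between.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0   # at home or at the bar he stays
for j in (1, 2, 3):
    P[j - 1, j] = 0.5     # step towards home
    P[j + 1, j] = 0.5     # step towards the bar

print(P.sum(axis=0))      # every column sums to 1
```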
The canonical form

We rearrange the transition matrix to have the s absorbing states in the upper left corner and the t transient states in the lower right corner. This gives four compartments:

P = | I  R |
    | 0  Q |

After n steps we have

P^n = | I  R(I + Q + … + Q^(n-1)) |
      | 0  Q^n                    |

The upper right matrix contains information about the frequencies of reaching an absorbing state from the transient states B, C, or D.
Multiplication of probabilities gives ever smaller values, so Q^n → 0 and the sum I + Q + Q² + … is a simple geometric series:

N = I + Q + Q² + … = (I − Q)^(−1)

The entries b_ij of the matrix B = RN contain the probabilities of ending in absorbing state i when started in transient state j. The entries n_ij of the fundamental matrix N of Q contain the expected number of times the process is in state i when started in state j.
Summing each column of N over all rows gives the expected number of steps the chain spends in the transient states before it falls into the absorbing state; t is the vector that gives, for each starting state i, the expected number of steps before the chain is absorbed.

The drunkard's walk

From t and B we obtain the expected number of steps to reach the absorbing state and the probability of reaching each absorbing state from any of the transient states.
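For the drunkard's walk these quantities can be computed directly. A sketch in the column convention, with streets 1-3 as the transient states:

```python
import numpy as np

# Canonical decomposition of the drunkard's walk:
# Q: transient -> transient, R: transient -> absorbing (home, bar).
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0, 0.0],   # -> home (from streets 1, 2, 3)
              [0.0, 0.0, 0.5]])  # -> bar

N = np.linalg.inv(np.eye(3) - Q)  # fundamental matrix N = I + Q + Q^2 + ...
t = N.sum(axis=0)                 # expected steps before absorption
B = R @ N                         # absorption probabilities

print(t)  # [3. 4. 3.]
print(B)  # home: [0.75 0.5 0.25], bar: [0.25 0.5 0.75]
```

Starting in the middle street the drunkard needs 4 steps on average and ends at home or at the bar with equal probability.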
Periodic chains do not have stable points.
Expected return (recurrence) times

[Figure: a five-state chain with states A-E and transition probabilities 0.33, 0.33, 0.25, 0.25, 0.05, 0.05, 0.15, 0.25, 0.50, and 0.35.]

If we start at state D, how long does it take on average to return to D? The rescaled eigenvector u of the probability matrix P gives the steady-state frequencies of being in state i. The expected return time t_ii of state i back to i is given by the inverse of the i-th element u_i of the eigenvector u. In the long run it takes about 9 steps to return to D.
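The full five-state matrix is not given on the slide, so a sketch of the rule t_ii = 1/u_i using the allele matrix from above (some entries assumed):

```python
import numpy as np

# Expected return time of state i is the inverse of its stationary
# frequency u_i.
P = np.array([[0.68, 0.07, 0.10],
              [0.12, 0.83, 0.10],
              [0.20, 0.10, 0.80]])

vals, vecs = np.linalg.eig(P)
u = vecs[:, np.argmax(vals.real)].real
u = u / u.sum()                 # stationary frequencies
t_return = 1.0 / u              # expected return times t_ii = 1 / u_i
print(u, t_return)
```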
First passage times in ergodic chains

If we start at state D, how long does it take on average to reach state A? [Figure: the same five-state chain as above.] We have to consider all possible ways from D to A: for example D → C → A has probability 0.25 × 0.05 = 0.0125, and ever longer paths such as D → E → D → C → A contribute ever smaller probabilities (0.012375, 0.00144375, 0.00103125, …). The inverse of the sum of these probabilities gives the expected number of steps to reach point k from point j.

The fundamental matrix of an ergodic chain

Applied to the original probability matrix P, the fundamental matrix N of P contains information on the expected number of times the process is in state i when started in state j. W is the matrix containing only the rescaled stationary point vector. The expected average number of steps t_jk to reach k from j comes from the entries of the fundamental matrix N divided by the respective entry of the (rescaled) stationary point vector.
Average first passage time
You have sunny, cloudy, and rainy days with respective transition probabilities. How long does it take for a sunny day to follow a rainy day? How long does it take for a sunny day to come back?
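The slide gives no numbers, so a sketch with an assumed column-stochastic weather matrix: the return time of a sunny day is the inverse of its stationary frequency, and the first passage time rainy → sunny follows from the fundamental matrix after making the sunny state absorbing:

```python
import numpy as np

# Hypothetical transition matrix (columns: today; rows: tomorrow),
# states 0 = sunny, 1 = cloudy, 2 = rainy. All values are assumed.
P = np.array([[0.6, 0.3, 0.2],
              [0.3, 0.4, 0.4],
              [0.1, 0.3, 0.4]])

# Return time of a sunny day: inverse of its stationary frequency.
vals, vecs = np.linalg.eig(P)
u = vecs[:, np.argmax(vals.real)].real
u = u / u.sum()
t_sunny = 1.0 / u[0]

# First passage rainy -> sunny: treat sunny as absorbing; Q is the
# transient part (cloudy, rainy), expected steps = column sums of
# N = (I - Q)^-1.
Q = P[1:, 1:]
N = np.linalg.inv(np.eye(2) - Q)
t_rain_to_sun = N.sum(axis=0)[1]
print(t_sunny, t_rain_to_sun)
```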
Probabilities of DNA substitution

We assume equal substitution probabilities. If the probability of each single substitution is p:

[Figure: the four nucleotides A, T, C, and G, each mutating into every other with probability p.]

The probability that A mutates to T, C, or G is P_¬A = p + p + p = 3p. The probability of no mutation is p_A = 1 − 3p. Mutations at different sites are independent events, so the probability that A mutates to T and C mutates to G is P_AC = p × p. The probabilities sum to one: p(A→T) + p(A→C) + p(A→G) + p(A→A) = 1.

This is the basis for the construction of evolutionary trees from DNA sequence data.
The probability matrix

[Figure: 4 × 4 transition matrix over the nucleotides A, T, C, and G.]

The Jukes-Cantor model (JC69) assumes that all substitution probabilities are equal. What is the probability that after 5 generations A did not change? Since the generations are independent, it is (1 − 3p)⁵.
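A quick numerical check; the value of p is an assumed example:

```python
# Probability that A is unchanged after 5 generations under Jukes-Cantor,
# where each of the three possible substitutions has probability p per
# generation. p = 0.01 is an assumed example value.
p = 0.01
p_stay = 1 - 3 * p       # probability of no change in one generation
p_stay_5 = p_stay ** 5   # no change in 5 independent generations
print(p_stay_5)
```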
Arrhenius model

The Jukes-Cantor model assumes equal substitution probabilities within the 4 nucleotides. For the substitution probability after time t, the probability that nothing changes is the zero term of the Poisson distribution, e^(−λt), where λ is the substitution rate. The probability of at least one substitution is therefore 1 − e^(−λt). The probability that a nucleotide does not change after time t follows from the zero term, and the probability of reaching any particular other nucleotide is one third of the probability of at least one substitution.
Probability for a single difference

What is the probability of x differences after time t? We use the principle of maximum likelihood and the Bernoulli distribution. This yields the mean time needed to get x different sites in a sequence of n nucleotides; it is also a measure of distance that depends only on the number of substitutions.
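The maximum-likelihood derivation leads to the standard Jukes-Cantor distance estimator; a sketch with an assumed example count:

```python
import math

# Jukes-Cantor (JC69) distance: with x observed differences among n
# sites, the estimated number of substitutions per site is
#   d = -(3/4) * ln(1 - (4/3) * x / n)
def jc69_distance(x, n):
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * x / n)

# Assumed example: 30 differing sites in a sequence of 1000 nucleotides.
print(jc69_distance(30, 1000))
```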
[Figure: phylogenetic tree of Gorilla, Pan paniscus, Pan troglodytes, Homo sapiens, and Homo neandertalensis; divergence (number of substitutions) plotted against time.]

Phylogenetic trees are the basis of any systematic classification.