Viterbi, Forward, and Backward Algorithms for Hidden Markov Models

Prof. Carolina Ruiz
Computer Science Department
Bioinformatics and Computational Biology Program
WPI
Resources used for these slides

Durbin, Eddy, Krogh, and Mitchison. "Biological Sequence Analysis". Cambridge University Press. 1998. Sections 3.1-3.3.

Prof. Moran's Algorithms in Computational Biology course (Technion Univ.):
Ydo Wexler & Dan Geiger's Markov Chain Tutorial.
Hidden Markov Models (HMMs) Tutorial.
HMM: Coke/Pepsi Example

Hidden States:
start: fake start state
A: the prices of Coke and Pepsi are the same
R: "Red sale": Coke is on sale (cheaper than Pepsi)
B: "Blue sale": Pepsi is on sale (cheaper than Coke)

Emissions:
C: Coke
P: Pepsi

Transition probabilities (from the state diagram; rows: current state, columns: next state):

           A     R     B
start     0.6   0.1   0.3
A         0.2   0.1   0.7
R         0.1   0.1   0.8
B         0.4   0.3   0.3

Emission probabilities (rows: state, columns: emitted drink):

           C     P
A         0.6   0.4
R         0.9   0.1
B         0.5   0.5
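The model above can be written down directly as lookup tables. Below is a small Python sketch of the Coke/Pepsi HMM (the dictionary names `trans` and `emit` are our own, not part of the slides), including a sanity check that every probability row sums to 1.

```python
# Transition probabilities p(next state | current state); "start" is the fake start state.
trans = {
    "start": {"A": 0.6, "R": 0.1, "B": 0.3},
    "A":     {"A": 0.2, "R": 0.1, "B": 0.7},
    "R":     {"A": 0.1, "R": 0.1, "B": 0.8},
    "B":     {"A": 0.4, "R": 0.3, "B": 0.3},
}

# Emission probabilities p(drink | state); C = Coke, P = Pepsi.
emit = {
    "A": {"C": 0.6, "P": 0.4},
    "R": {"C": 0.9, "P": 0.1},
    "B": {"C": 0.5, "P": 0.5},
}

# Sanity check: each state's outgoing transitions and emissions must sum to 1.
for table in (trans, emit):
    for state, row in table.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9, state
```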
1. Finding the most likely trajectory

Given an HMM and a sequence of observables x1,x2,…,xL, determine the most likely sequence of states S* = (s*1,s*2,…,s*L) that generated x1,x2,…,xL:

S* = argmax p(s1,s2,…,sL | x1,x2,…,xL)          over all state sequences s1,s2,…,sL
   = argmax p(s1,s2,…,sL ; x1,x2,…,xL) / p(x1,x2,…,xL)
   = argmax p(s1,s2,…,sL ; x1,x2,…,xL)

(the denominator p(x1,x2,…,xL) does not depend on the state sequence, so it can be dropped from the maximization)
= argmax p(s1,s2,…,sL ; x1,x2,…,xL)             over s1,s2,…,sL
= argmax p(s1,…,sL-1 ; x1,…,xL-1) p(sL|sL-1) p(xL|sL)

This inspires a recursive formulation of S*. Viterbi's idea: this can be calculated using dynamic programming.

Let  v(k,t) = max p(s1,…,st = k ; x1,…,xt)

that is, the probability of a most probable path up to time t that ends in state k. By the above derivation:

v(k,t) = max p(s1,…,st-1 ; x1,…,xt-1) p(st=k|st-1) p(xt|st=k)
       = max over j of  v(j,t-1) p(st=k|st-1=j) p(xt|st=k)
       = p(xt|st=k) * max over j of  v(j,t-1) p(st=k|st-1=j)
Viterbi's Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the most likely path S* = (s*1,s*2,s*3) that generated x1,x2,x3 = CPC.

Initialization:

v        t=0   x1 = C   x2 = P   x3 = C
start     1      0        0        0
A         0
R         0
B         0
Viterbi's Algorithm - Example (step t=1)

v(A,1) = p(xt|st=k) max over j of v(j,t-1) p(st=k|st-1=j)
       = p(C|A) max{v(start,0)p(A|start), 0, 0, 0}
       = p(C|A) v(start,0) p(A|start) = 0.6*1*0.6 = 0.36        Parent: start

v(R,1) = p(C|R) max{v(start,0)p(R|start), 0, 0, 0} = 0.9*1*0.1 = 0.09        Parent: start

v(B,1) = p(C|B) max{v(start,0)p(B|start), 0, 0, 0} = 0.5*1*0.3 = 0.15        Parent: start
Viterbi's Algorithm - Example (step t=2)

v(A,2) = p(P|A) max{v(start,1)p(A|start), v(A,1)p(A|A), v(R,1)p(A|R), v(B,1)p(A|B)}
       = 0.4 * max{0, 0.36*0.2, 0.09*0.1, 0.15*0.4} = 0.4*0.072 = 0.0288        Parent: A

v(R,2) = p(P|R) max{v(start,1)p(R|start), v(A,1)p(R|A), v(R,1)p(R|R), v(B,1)p(R|B)}
       = 0.1 * max{0, 0.36*0.1, 0.09*0.1, 0.15*0.3} = 0.1*0.045 = 0.0045        Parent: B

v(B,2) = p(P|B) max{v(start,1)p(B|start), v(A,1)p(B|A), v(R,1)p(B|R), v(B,1)p(B|B)}
       = 0.5 * max{0, 0.36*0.7, 0.09*0.8, 0.15*0.3} = 0.5*0.252 = 0.126        Parent: A
Viterbi's Algorithm - Example (step t=3)

v(A,3) = p(C|A) max{v(start,2)p(A|start), v(A,2)p(A|A), v(R,2)p(A|R), v(B,2)p(A|B)}
       = 0.6 * max{0, 0.0288*0.2, 0.0045*0.1, 0.126*0.4} = 0.6*0.0504 = 0.03024        Parent: B

v(R,3) = p(C|R) max{v(start,2)p(R|start), v(A,2)p(R|A), v(R,2)p(R|R), v(B,2)p(R|B)}
       = 0.9 * max{0, 0.0288*0.1, 0.0045*0.1, 0.126*0.3} = 0.9*0.0378 = 0.03402        Parent: B

v(B,3) = p(C|B) max{v(start,2)p(B|start), v(A,2)p(B|A), v(R,2)p(B|R), v(B,2)p(B|B)}
       = 0.5 * max{0, 0.0288*0.7, 0.0045*0.8, 0.126*0.3} = 0.5*0.0378 = 0.0189        Parent: B
Viterbi's Algorithm - Example (completed table)

v        t=0   x1 = C          x2 = P           x3 = C
start     1     0               0                0
A         0     0.36 (start)    0.0288 (A)       0.03024 (B)
R         0     0.09 (start)    0.0045 (B)       0.03402 (B)
B         0     0.15 (start)    0.126  (A)       0.0189  (B)

(the parenthesized entry in each cell records its Parent)

Hence, the most likely path that generated CPC is: start A B R.

This maximum-likelihood path is extracted from the table as follows:
- The last state of the path is the one with the highest value in the right-most column.
- The previous state in the path is the one recorded as Parent of the last.
- Keep following the Parents trail backwards until you arrive at start.
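The table-filling and Parent-trail traceback above can be sketched in a few lines of Python. This is our own minimal illustration, not the slides' code; `trans` and `emit` hard-code the Coke/Pepsi model from the earlier slide.

```python
# Coke/Pepsi HMM from the example slides.
trans = {"start": {"A": 0.6, "R": 0.1, "B": 0.3},
         "A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
emit = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def viterbi(obs, trans, emit):
    """Return the Viterbi table v and the most likely state path for obs.

    v[t-1][k] corresponds to the slides' v(k, t)."""
    states = list(emit)
    # t = 1: only transitions out of the fake start state contribute.
    v = [{k: emit[k][obs[0]] * trans["start"][k] for k in states}]
    parent = [{k: "start" for k in states}]
    for t in range(1, len(obs)):
        v.append({})
        parent.append({})
        for k in states:
            # The best predecessor j maximizes v(j, t-1) * p(k | j).
            best = max(states, key=lambda j: v[t - 1][j] * trans[j][k])
            v[t][k] = emit[k][obs[t]] * v[t - 1][best] * trans[best][k]
            parent[t][k] = best
    # Traceback: start at the best final state, follow the Parents trail.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(parent[t][path[-1]])
    return v, path[::-1]

v, path = viterbi("CPC", trans, emit)
```

Running it on CPC reproduces the table: the best final value is v(R,3) = 0.03402, and tracing parents back yields the path A, B, R (after the fake start state).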
2. Calculating the probability of a sequence of observations

Given an HMM and a sequence of observations x1,x2,…,xL, determine p(x1,x2,…,xL):

p(x1,x2,…,xL) = sum over s1,…,sL of  p(s1,s2,…,sL ; x1,x2,…,xL)
              = sum over s1,…,sL of  p(s1,…,sL-1 ; x1,…,xL-1) p(sL|sL-1) p(xL|sL)
Let  f(k,t) = p(st = k ; x1,…,xt)

that is, the probability of emitting x1,…,xt and ending in state st = k. In other words, the sum of the probabilities of all the paths that emit (x1,…,xt) and end in state st = k.

f(k,t) = p(st = k ; x1,…,xt)
       = sum over j of  p(st-1=j ; x1,…,xt-1) p(st=k|st-1=j) p(xt|st=k)
       = p(xt|st=k) * sum over j of  p(st-1=j ; x1,…,xt-1) p(st=k|st-1=j)
       = p(xt|st=k) * sum over j of  f(j,t-1) p(st=k|st-1=j)
Forward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits x1,x2,x3 = CPC. That is, find p(CPC).

Initialization:

f        t=0   x1 = C   x2 = P   x3 = C
start     1      0        0        0
A         0
R         0
B         0
Forward Algorithm - Example (step t=1)

f(A,1) = p(xt|st=k) * sum over j of f(j,t-1) p(st=k|st-1=j)
       = p(C|A) (f(start,0)p(A|start) + 0 + 0 + 0)
       = p(C|A) f(start,0) p(A|start) = 0.6*1*0.6 = 0.36

f(R,1) = p(C|R) (f(start,0)p(R|start) + 0 + 0 + 0) = 0.9*1*0.1 = 0.09

f(B,1) = p(C|B) (f(start,0)p(B|start) + 0 + 0 + 0) = 0.5*1*0.3 = 0.15
Forward Algorithm - Example (step t=2)

f(A,2) = p(P|A) (f(start,1)p(A|start) + f(A,1)p(A|A) + f(R,1)p(A|R) + f(B,1)p(A|B))
       = 0.4 * (0 + 0.36*0.2 + 0.09*0.1 + 0.15*0.4) = 0.4*0.141 = 0.0564

f(R,2) = p(P|R) (f(start,1)p(R|start) + f(A,1)p(R|A) + f(R,1)p(R|R) + f(B,1)p(R|B))
       = 0.1 * (0 + 0.36*0.1 + 0.09*0.1 + 0.15*0.3) = 0.1*0.09 = 0.009

f(B,2) = p(P|B) (f(start,1)p(B|start) + f(A,1)p(B|A) + f(R,1)p(B|R) + f(B,1)p(B|B))
       = 0.5 * (0 + 0.36*0.7 + 0.09*0.8 + 0.15*0.3) = 0.5*0.369 = 0.1845
Forward Algorithm - Example (step t=3)

f(A,3) = p(C|A) (f(start,2)p(A|start) + f(A,2)p(A|A) + f(R,2)p(A|R) + f(B,2)p(A|B))
       = 0.6 * (0 + 0.0564*0.2 + 0.009*0.1 + 0.1845*0.4) = 0.6*0.08598 = 0.05159

f(R,3) = p(C|R) (f(start,2)p(R|start) + f(A,2)p(R|A) + f(R,2)p(R|R) + f(B,2)p(R|B))
       = 0.9 * (0 + 0.0564*0.1 + 0.009*0.1 + 0.1845*0.3) = 0.9*0.06189 = 0.05570

f(B,3) = p(C|B) (f(start,2)p(B|start) + f(A,2)p(B|A) + f(R,2)p(B|R) + f(B,2)p(B|B))
       = 0.5 * (0 + 0.0564*0.7 + 0.009*0.8 + 0.1845*0.3) = 0.5*0.10203 = 0.05102
Forward Algorithm - Example (completed table)

f        t=0   x1 = C   x2 = P   x3 = C
start     1     0        0        0
A         0     0.36     0.0564   0.05159
R         0     0.09     0.009    0.05570
B         0     0.15     0.1845   0.05102

Hence, the probability of CPC being generated by this HMM is:

p(CPC) = sum over j of f(j,3) = 0.05159 + 0.05570 + 0.05102 = 0.15831
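The forward table can likewise be filled by code. Here is a minimal Python sketch of the forward recursion (our own illustration, with the Coke/Pepsi model hard-coded); note that the only change from Viterbi is replacing the max over predecessors with a sum.

```python
# Coke/Pepsi HMM from the example slides.
trans = {"start": {"A": 0.6, "R": 0.1, "B": 0.3},
         "A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
emit = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def forward(obs, trans, emit):
    """Return the forward table; f[t-1][k] corresponds to the slides' f(k, t)."""
    states = list(emit)
    # t = 1: only transitions out of the fake start state contribute.
    f = [{k: emit[k][obs[0]] * trans["start"][k] for k in states}]
    for t in range(1, len(obs)):
        # f(k,t) = p(x_t | k) * sum_j f(j, t-1) p(k | j)
        f.append({k: emit[k][obs[t]] * sum(f[t - 1][j] * trans[j][k] for j in states)
                  for k in states})
    return f

f = forward("CPC", trans, emit)
p_cpc = sum(f[-1].values())  # p(CPC): sum the last column of the table
```

Summing the last column gives p(CPC) ≈ 0.15831, matching the slide.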
3. Calculating the probability of st = k given a sequence of observations

Given an HMM and a sequence of observations x1,x2,…,xL, determine the probability that the state visited at time t was k: p(st=k | x1,x2,…,xL), where 1 <= t <= L.

p(st=k | x1,x2,…,xL) = p(x1,x2,…,xL ; st=k) / p(x1,x2,…,xL)

Note that p(x1,x2,…,xL) can be found using the forward algorithm. We'll focus now on determining p(x1,x2,…,xL ; st=k).
p(x1,…,xt,…,xL ; st=k)
  = p(x1,…,xt ; st=k) p(xt+1,…,xL | x1,…,xt ; st=k)
  = p(x1,…,xt ; st=k) p(xt+1,…,xL | st=k)
  =        f(k,t)    *    b(k,t)
    (forward algorithm) (backward algorithm)

(the second equality holds because, given st, the later observations are independent of the earlier ones)

b(k,t) = p(xt+1,…,xL | st=k)
       = sum over j of  p(st+1=j|st=k) p(xt+1|st+1=j) p(xt+2,…,xL | st+1=j)
       = sum over j of  p(st+1=j|st=k) p(xt+1|st+1=j) b(j,t+1)
Backward Algorithm - Example

Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the HMM emits xt+1,…,xL given that st = k: p(xt+1,…,xL | st=k).

Initialization (b(k,3) = 1 for every state k):

b        x1 = C   x2 = P   x3 = C
A                            1
R                            1
B                            1
Backward Algorithm - Example (step t=2)

b(A,2) = sum over j of p(s3=j|A) p(C|s3=j) b(j,3)
       = p(A|A)p(C|A)b(A,3) + p(R|A)p(C|R)b(R,3) + p(B|A)p(C|B)b(B,3)
       = 0.2*0.6*1 + 0.1*0.9*1 + 0.7*0.5*1 = 0.56

b(R,2) = sum over j of p(s3=j|R) p(C|s3=j) b(j,3)
       = p(A|R)p(C|A)b(A,3) + p(R|R)p(C|R)b(R,3) + p(B|R)p(C|B)b(B,3)
       = 0.1*0.6*1 + 0.1*0.9*1 + 0.8*0.5*1 = 0.55

b(B,2) = sum over j of p(s3=j|B) p(C|s3=j) b(j,3)
       = p(A|B)p(C|A)b(A,3) + p(R|B)p(C|R)b(R,3) + p(B|B)p(C|B)b(B,3)
       = 0.4*0.6*1 + 0.3*0.9*1 + 0.3*0.5*1 = 0.66
Backward Algorithm - Example (step t=1)

b(A,1) = sum over j of p(s2=j|A) p(P|s2=j) b(j,2)
       = p(A|A)p(P|A)b(A,2) + p(R|A)p(P|R)b(R,2) + p(B|A)p(P|B)b(B,2)
       = 0.2*0.4*0.56 + 0.1*0.1*0.55 + 0.7*0.5*0.66 = 0.2813

b(R,1) = sum over j of p(s2=j|R) p(P|s2=j) b(j,2)
       = p(A|R)p(P|A)b(A,2) + p(R|R)p(P|R)b(R,2) + p(B|R)p(P|B)b(B,2)
       = 0.1*0.4*0.56 + 0.1*0.1*0.55 + 0.8*0.5*0.66 = 0.2919

b(B,1) = sum over j of p(s2=j|B) p(P|s2=j) b(j,2)
       = p(A|B)p(P|A)b(A,2) + p(R|B)p(P|R)b(R,2) + p(B|B)p(P|B)b(B,2)
       = 0.4*0.4*0.56 + 0.3*0.1*0.55 + 0.3*0.5*0.66 = 0.2051
Backward Algorithm - Example (completed table)

b        x1 = C   x2 = P   x3 = C
A        0.2813   0.56     1
R        0.2919   0.55     1
B        0.2051   0.66     1

We can calculate the probability of CPC being generated by this HMM from the Backward table as follows:

p(CPC) = sum over j of b(j,1) p(j|start) p(C|j)
       = (0.2813*0.6*0.6) + (0.2919*0.1*0.9) + (0.2051*0.3*0.5) = 0.15831

though we can obtain the same probability from the Forward table (as we did on a previous slide).
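The backward table can be sketched in Python the same way as the forward one (again our own illustration with the model hard-coded, not the slides' code). The recursion runs right to left, starting from the column of 1s.

```python
# Coke/Pepsi HMM from the example slides.
trans = {"start": {"A": 0.6, "R": 0.1, "B": 0.3},
         "A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
emit = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}

def backward(obs, trans, emit):
    """Return the backward table; b[t-1][k] corresponds to the slides' b(k, t)."""
    states = list(emit)
    b = [{k: 1.0 for k in states}]          # initialization: b(k, L) = 1
    for t in range(len(obs) - 2, -1, -1):
        # b(k,t) = sum_j p(j|k) p(x_{t+1}|j) b(j,t+1); b[0] holds column t+1.
        b.insert(0, {k: sum(trans[k][j] * emit[j][obs[t + 1]] * b[0][j] for j in states)
                     for k in states})
    return b

b = backward("CPC", trans, emit)
# p(CPC) recovered from the backward table, as on the slide.
p_cpc = sum(b[0][j] * trans["start"][j] * emit[j]["C"] for j in trans["start"])
```

This reproduces the table (e.g. b(A,1) = 0.2813, b(B,2) = 0.66) and the same p(CPC) ≈ 0.15831 as the forward algorithm.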
3. (cont.) Using the Forward and Backward tables to calculate the probability of st = k given a sequence of observations

Example:
Given: Coke/Pepsi HMM, and sequence of observations: CPC
Find the probability that the state visited at time 2 was B, that is, p(s2=B | CPC).
In other words, given that the person drank CPC, what's the probability that Pepsi was on sale during the 2nd week?

Based on the calculations we did on the previous slides:

p(s2=B | CPC) = p(CPC ; s2=B) / p(CPC)
  = [ p(x1=C, x2=P ; s2=B) p(x3=C | x1=C, x2=P ; s2=B) ] / p(x1=C, x2=P, x3=C)
  = [ p(x1=C, x2=P ; s2=B) p(x3=C | s2=B) ] / p(CPC)
  = [ f(B,2) b(B,2) ] / p(CPC)
  = [ 0.1845 * 0.66 ] / 0.15831 = 0.7691

Here, p(CPC) was calculated by summing up the last column of the Forward table. So there is a high probability that Pepsi was on sale during week 2, given that the person drank Pepsi that week!
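Putting the two tables together, the posterior computed above can be reproduced in a self-contained Python sketch (our own code, with the Coke/Pepsi model hard-coded).

```python
# Coke/Pepsi HMM from the example slides.
trans = {"start": {"A": 0.6, "R": 0.1, "B": 0.3},
         "A": {"A": 0.2, "R": 0.1, "B": 0.7},
         "R": {"A": 0.1, "R": 0.1, "B": 0.8},
         "B": {"A": 0.4, "R": 0.3, "B": 0.3}}
emit = {"A": {"C": 0.6, "P": 0.4},
        "R": {"C": 0.9, "P": 0.1},
        "B": {"C": 0.5, "P": 0.5}}
states = list(emit)

def forward(obs):
    f = [{k: emit[k][obs[0]] * trans["start"][k] for k in states}]
    for t in range(1, len(obs)):
        f.append({k: emit[k][obs[t]] * sum(f[t - 1][j] * trans[j][k] for j in states)
                  for k in states})
    return f

def backward(obs):
    b = [{k: 1.0 for k in states}]
    for t in range(len(obs) - 2, -1, -1):
        b.insert(0, {k: sum(trans[k][j] * emit[j][obs[t + 1]] * b[0][j] for j in states)
                     for k in states})
    return b

obs = "CPC"
f, b = forward(obs), backward(obs)
p_obs = sum(f[-1].values())
# p(s_t = k | obs) = f(k,t) * b(k,t) / p(obs); slide time t = 2 is list index 1.
posterior = f[1]["B"] * b[1]["B"] / p_obs
```

This gives p(s2=B | CPC) ≈ 0.769, matching the slide up to rounding (the slide divides by p(CPC) rounded to 0.15831). A useful sanity check: the posteriors over all states at time 2 sum to 1.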