Slide 1

CHAPTER 15: Hidden Markov Models
Alpaydın slides, with several modifications and additions by Christoph Eick.

Slide 2
Introduction
Modeling dependencies in the input: observations are no longer iid; e.g., the order of observations in a dataset matters.
Temporal sequences: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); stock market (stock values over time).
Spatial sequences: base pairs in DNA sequences.

Lecture Notes for E. Alpaydın, 2010, Introduction to Machine Learning 2e © The MIT Press (V1.0)

Slide 3
Discrete Markov Process
N states: S_1, S_2, ..., S_N; the state at "time" t is q_t = S_i.
First-order Markov property: P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, ...) = P(q_{t+1} = S_j | q_t = S_i)
Transition probabilities: a_ij ≡ P(q_{t+1} = S_j | q_t = S_i), with a_ij ≥ 0 and Σ_{j=1..N} a_ij = 1
Initial probabilities: π_i ≡ P(q_1 = S_i), with Σ_{i=1..N} π_i = 1
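A chain defined by (A, π) can be sampled directly, which makes the definitions above concrete; the matrix and vector below are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical 3-state chain: A[i, j] = P(q_{t+1}=S_j | q_t=S_i), pi[i] = P(q_1=S_i).
# Every row of A, and pi itself, must be a probability distribution.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])
pi = np.array([0.5, 0.3, 0.2])

def sample_chain(A, pi, T, rng):
    """Draw a length-T state sequence from a first-order Markov chain."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(len(pi), p=pi)                    # initial state ~ pi
    for t in range(1, T):
        states[t] = rng.choice(A.shape[0], p=A[states[t - 1]])  # next state ~ row of A
    return states

seq = sample_chain(A, pi, T=10, rng=np.random.default_rng(0))
print(seq)
```

Because the chain is first-order, each step looks only at the previous state, i.e., one row of A.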
Slide 4

Stochastic Automaton / Markov Chain

Slide 5
Example: Balls and Urns
Three urns, each full of balls of one color: S_1: blue, S_2: red, S_3: green.

Slide 6
Balls and Urns: Learning
Given K example sequences of length T, extract the probabilities from the observed sequences:
s1-s2-s1-s3
s2-s1-s1-s2
s2-s3-s2-s1
π_1 = 1/3, π_2 = 2/3 (the fraction of sequences starting in each state);
a_11 = 1/4, a_12 = 1/2, a_13 = 1/4, a_21 = 3/4, ... (the fraction of the transitions leaving S_i that go to S_j).
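The counting scheme above can be checked mechanically; this sketch (states 0-indexed, so s1 → 0) recovers the maximum-likelihood estimates of π and A from the three example sequences:

```python
import numpy as np

# The three observed state sequences from the slide (s1 -> 0, s2 -> 1, s3 -> 2).
seqs = [[0, 1, 0, 2],
        [1, 0, 0, 1],
        [1, 2, 1, 0]]
N = 3

pi_counts = np.zeros(N)
trans_counts = np.zeros((N, N))
for s in seqs:
    pi_counts[s[0]] += 1                 # count initial states
    for a, b in zip(s, s[1:]):
        trans_counts[a, b] += 1          # count transitions i -> j

pi_hat = pi_counts / pi_counts.sum()
row_sums = trans_counts.sum(axis=1, keepdims=True)
A_hat = np.divide(trans_counts, row_sums,
                  out=np.zeros_like(trans_counts), where=row_sums > 0)

print(pi_hat)      # [1/3, 2/3, 0]
print(A_hat[0])    # transitions out of s1: [1/4, 1/2, 1/4]
print(A_hat[1])    # transitions out of s2: [3/4, 0, 1/4]
```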
Slide 7

Hidden Markov Models
States are not observable. Discrete observations {v_1, v_2, ..., v_M} are recorded; each is a probabilistic function of the hidden state.
Emission probabilities: b_j(m) ≡ P(O_t = v_m | q_t = S_j)
Example: in each urn there are balls of different colors, but with different probabilities.
For each observation sequence, there are multiple possible state sequences.
http://en.wikipedia.org/wiki/Hidden_Markov_model

Slide 8
HMM Unfolded in Time
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter10.html

Slide 9
Now a More Complicated Problem
We observe a sequence of ball colors (see the urn figure). What urn sequence created it?
Markov chains: 1-1-2-2 (somewhat trivial, as the states are observable!)
Hidden Markov models: (1 or 2)-(1 or 2)-(2 or 3)-(2 or 3), and the potential state sequences have different probabilities; e.g., drawing a blue ball from urn 1 is more likely than from urn 2!

Slide 10
Another Motivating Example

Slide 11
Elements of an HMM
N: number of states
M: number of observation symbols
A = [a_ij]: N×N state transition probability matrix
B = [b_j(m)]: N×M observation probability matrix
Π = [π_i]: N×1 initial state probability vector
λ = (A, B, Π): the parameter set of the HMM
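As a minimal sketch, the parameter set λ = (A, B, Π) for a 3-urn, 3-color HMM can be stored as plain matrices; all numbers below are illustrative, not taken from the slides:

```python
import numpy as np

# Hypothetical urn HMM: N = 3 states (urns), M = 3 symbols (blue, red, green).
A  = np.array([[0.5, 0.3, 0.2],    # a_ij = P(q_{t+1}=S_j | q_t=S_i)
               [0.2, 0.6, 0.2],
               [0.1, 0.3, 0.6]])
B  = np.array([[0.7, 0.2, 0.1],    # b_j(m) = P(O_t = v_m | q_t = S_j)
               [0.2, 0.6, 0.2],
               [0.1, 0.2, 0.7]])
Pi = np.array([0.6, 0.3, 0.1])     # pi_i = P(q_1 = S_i)

lam = (A, B, Pi)                   # the parameter set lambda = (A, B, Pi)

# sanity checks: every row is a probability distribution
assert np.allclose(A.sum(axis=1), 1)
assert np.allclose(B.sum(axis=1), 1)
assert np.isclose(Pi.sum(), 1)
```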
Slide 12

Three Basic Problems of HMMs
1. Evaluation: given λ and a sequence O, calculate P(O | λ).
2. Most likely state sequence: given λ and a sequence O, find the state sequence Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ).
3. Learning: given a set of sequences O = {O^1, ..., O^K}, find the λ* that is the most likely explanation for the sequences in O: λ* = argmax_λ Π_k P(O^k | λ).
(Rabiner, 1989)

Slide 13
Evaluation
Forward variable: α_t(i) ≡ P(O_1 ... O_t, q_t = S_i | λ), the probability of observing O_1, ..., O_t and additionally being in state i at step t.
Initialization: α_1(i) = π_i b_i(O_1)
Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(O_{t+1})
Using α, the probability of the observed sequence can be computed as P(O | λ) = Σ_i α_T(i).
Complexity: O(N^2 · T)
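The forward recursion above can be sketched in a few lines; the 2-state parameters below are illustrative, not from the slides:

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                       # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]   # recursion, O(N^2) per step
    return alpha, alpha[-1].sum()                    # P(O|lambda) = sum_i alpha_T(i)

# Illustrative parameters (not from the slides)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
alpha, likelihood = forward(A, B, pi, [0, 0, 1])
print(likelihood)   # 0.119325
```

The matrix form `(alpha[t-1] @ A) * B[:, O[t]]` is exactly Σ_i α_t(i) a_ij followed by the multiplication with b_j(O_{t+1}).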
Slide 14

Backward variable: β_t(i) ≡ P(O_{t+1} ... O_T | q_t = S_i, λ), the probability of observing O_{t+1}, ..., O_T given that we are in state i at step t.
Initialization: β_T(i) = 1
Recursion: β_t(i) = Σ_j a_ij b_j(O_{t+1}) β_{t+1}(j)
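A matching sketch of the backward recursion, with the same illustrative parameters as before; as a sanity check, combining β_1 with π and b(O_1) reproduces the same P(O | λ) that the forward pass yields:

```python
import numpy as np

def backward(A, B, O):
    """Backward algorithm: beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    N, T = A.shape[0], len(O)
    beta = np.ones((T, N))                               # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])     # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta

# Illustrative parameters (not from the slides)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
O  = [0, 0, 1]
beta = backward(A, B, O)

# Consistency check: sum_i pi_i b_i(O_1) beta_1(i) = P(O | lambda)
print((pi * B[:, O[0]] * beta[0]).sum())   # 0.119325, same as the forward pass
```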
Slide 15

Finding the Most Likely State Sequence
γ_t(i) := P(q_t = S_i | O, λ), the probability of being in state i at step t given the whole observation sequence O_1 ... O_T. It combines the forward and backward variables:
γ_t(i) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
For each time step, choose the state that has the highest probability: q_t* = argmax_i γ_t(i)
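This posterior decoding rule can be sketched by combining the forward and backward passes (again with illustrative parameters, not from the slides):

```python
import numpy as np

def posterior_states(A, B, pi, O):
    """gamma[t, i] = P(q_t = S_i | O, lambda); decode q_t* = argmax_i gamma_t(i)."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):                                # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):                       # backward pass
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)            # normalize by P(O | lambda)
    return gamma, gamma.argmax(axis=1)

# Illustrative parameters (not from the slides)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
gamma, q_star = posterior_states(A, B, pi, [0, 0, 1])
print(q_star)   # [0 0 1]
```

Note that this picks the individually most likely state at each step; the jointly most likely state *sequence* is found by Viterbi's algorithm on the next slide.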
Slide 16

Viterbi's Algorithm
Idea: combines path-probability computations with backtracking over competing paths.
δ_t(i) ≡ max_{q_1 q_2 ... q_{t-1}} p(q_1 q_2 ... q_{t-1}, q_t = S_i, O_1 ... O_t | λ)
Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
Recursion: δ_t(j) = max_i δ_{t-1}(i) a_ij b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
(Only briefly discussed in 2014!)
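The four steps above can be sketched directly (illustrative parameters, not from the slides):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Most probable state path q* and its probability p* for observation sequence O."""
    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))               # delta_t(j): best path prob ending in j at t
    psi = np.zeros((T, N), dtype=int)      # psi_t(j): best predecessor of j at t
    delta[0] = pi * B[:, O[0]]             # initialization
    for t in range(1, T):                  # recursion
        scores = delta[t - 1][:, None] * A         # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)             # termination + path backtracking
    q[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[-1].max()

# Illustrative parameters (not from the slides)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
path, p_star = viterbi(A, B, pi, [0, 0, 1])
print(path, p_star)   # [0 0 1] 0.06804
```

Keeping the ψ table is what makes backtracking O(T) after the forward sweep of max-products.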
Slide 17

Baum-Welch Algorithm
Input: observed symbol sequences O = {O^1, ..., O^K}
Output: model λ = (A, B, Π)
The hidden state sequence is not observed; only the symbol sequence O is.

Slide 18
Learning a Model from Sequences O
An EM-style algorithm is used, based on two hidden (latent) variables:
ξ_t(i, j): the probability of going from state i at step t to state j at step t+1 while observing O_{t+1}, given the model λ and an observed sequence O^k.
γ_t(i): the probability of being in state i at step t, given the model λ and an observed sequence O^k.

Slide 19
Baum-Welch Algorithm: M-Step
π_i = (1/K) Σ_k γ_1^k(i)
a_ij = Σ_k Σ_t ξ_t^k(i, j) / Σ_k Σ_t γ_t^k(i)   (expected number of transitions from i to j, divided by the expected number of times in state i)
b_j(m) = Σ_k Σ_{t: O_t^k = v_m} γ_t^k(j) / Σ_k Σ_t γ_t^k(j)
Remark: k iterates over the observed sequences O^1, ..., O^K; for each individual sequence O^k, γ^k and ξ^k are computed in the E-step; then the actual model is computed in the M-step by averaging over the estimates of π_i, a_ij, b_j (based on ξ^k and γ^k) for each of the K observed sequences.
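One full E-step plus M-step can be sketched as below. This is a plain, unscaled implementation with illustrative parameters; production implementations rescale α and β (or work in log space) to avoid underflow on long sequences:

```python
import numpy as np

def baum_welch_step(A, B, pi, seqs):
    """One EM iteration of Baum-Welch over a set of observation sequences."""
    N, M = B.shape
    pi_num = np.zeros(N)
    A_num, A_den = np.zeros((N, N)), np.zeros(N)
    B_num, B_den = np.zeros((N, M)), np.zeros(N)
    for O in seqs:
        T = len(O)
        # E-step: forward and backward passes for this sequence
        alpha = np.zeros((T, N))
        beta = np.ones((T, N))
        alpha[0] = pi * B[:, O[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        PO = alpha[-1].sum()                      # P(O | lambda)
        gamma = alpha * beta / PO                 # gamma_t(i) = P(q_t = i | O)
        for t in range(T - 1):                    # xi_t(i,j) = P(q_t=i, q_{t+1}=j | O)
            A_num += (alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]) / PO
        # accumulate sufficient statistics across sequences
        pi_num += gamma[0]
        A_den += gamma[:-1].sum(axis=0)
        for m in range(M):
            B_num[:, m] += gamma[np.array(O) == m].sum(axis=0)
        B_den += gamma.sum(axis=0)
    # M-step: normalize the expected counts
    return A_num / A_den[:, None], B_num / B_den[:, None], pi_num / len(seqs)

# Illustrative starting model (not from the slides)
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
A2, B2, pi2 = baum_welch_step(A, B, pi, [[0, 0, 1], [1, 0, 0]])
print(A2, B2, pi2)
```

Iterating this step monotonically increases the likelihood of the training sequences until convergence to a local maximum.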
Slide 20

Baum-Welch Algorithm: Summary
Input: O = {O^1, ..., O^K}; output: model λ = (A, B, Π).
For more discussion see: http://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf
See also: http://www.digplanet.com/wiki/Baum%E2%80%93Welch_algorithm

Slide 21
Generalization of HMM: Continuous Observations
The observations generated at each time step are vectors of k numbers; a multivariate Gaussian with k dimensions is associated with each state j, defining the probability that a k-dimensional vector v is generated when in state j:
p(O_t = v | q_t = S_j) = N(v; μ_j, Σ_j)
λ = (A, (μ_j, Σ_j) for j = 1, ..., N, Π)
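The state-conditional Gaussian density can be evaluated directly from μ_j and Σ_j; a minimal sketch (in practice one would use scipy.stats.multivariate_normal instead of hand-rolling the formula):

```python
import numpy as np

def gaussian_emission(v, mu, Sigma):
    """Density p(v | q_t = S_j) for a state with mean mu and covariance Sigma."""
    k = len(mu)
    diff = v - mu
    inv = np.linalg.inv(Sigma)                              # Sigma^{-1}
    norm = np.sqrt(((2 * np.pi) ** k) * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

# Illustrative 2-dimensional state: standard normal emission
mu, Sigma = np.zeros(2), np.eye(2)
print(gaussian_emission(np.zeros(2), mu, Sigma))   # peak density 1/(2*pi)
```

This density simply replaces b_j(m) wherever the discrete algorithms (forward, backward, Viterbi, Baum-Welch) multiply by an emission probability.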
Slide 22

Generalization: HMM with Inputs
Input-dependent observations.
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996).
Time-delay input.