
Slide 1: CHAPTER 15: Hidden Markov Models

Alpaydın slides, with several modifications and additions by Christoph Eick.

Slide 2: Introduction

Modeling dependencies in the input; observations are no longer iid, e.g. the order of observations in a dataset matters:
Temporal sequences: in speech, phonemes in a word (dictionary) and words in a sentence (syntax, semantics of the language); the stock market (stock values over time).
Spatial sequences: base pairs in DNA sequences.

Lecture Notes for E. Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Slide 3: Discrete Markov Process

N states: S_1, S_2, ..., S_N
State at "time" t: q_t = S_i
First-order Markov property: P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, ...) = P(q_{t+1} = S_j | q_t = S_i)
Transition probabilities: a_ij ≡ P(q_{t+1} = S_j | q_t = S_i), with a_ij ≥ 0 and Σ_{j=1}^N a_ij = 1
Initial probabilities: π_i ≡ P(q_1 = S_i), with Σ_{i=1}^N π_i = 1
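As a small illustration (not part of the slides), a first-order Markov chain defined by π and A can be sampled step by step; the three-state parameter values below are made up for the example:

```python
import random

def sample_chain(pi, A, T, seed=0):
    """Sample a state sequence q_1..q_T from a first-order Markov chain."""
    rng = random.Random(seed)
    # Draw the initial state from pi, then each next state from row A[q].
    q = rng.choices(range(len(pi)), weights=pi)[0]
    seq = [q]
    for _ in range(T - 1):
        q = rng.choices(range(len(A[q])), weights=A[q])[0]
        seq.append(q)
    return seq

pi = [0.5, 0.3, 0.2]          # initial probabilities (example values)
A = [[0.6, 0.3, 0.1],         # transition matrix; each row sums to 1
     [0.2, 0.5, 0.3],
     [0.1, 0.4, 0.5]]
print(sample_chain(pi, A, 10))
```

Each row of A is the conditional distribution over the next state given the current one, which is exactly the first-order Markov property above.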

Slide 4: Stochastic Automaton/Markov Chain

[Figure: state-transition diagram of a Markov chain.]
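For completeness (this standard result is what the diagram on this slide illustrates): the probability of a fully observed state sequence Q = q_1 ... q_T under a Markov chain factorizes along the transitions,

```latex
P(Q \mid A, \Pi) \;=\; P(q_1)\prod_{t=2}^{T} P(q_t \mid q_{t-1})
                \;=\; \pi_{q_1}\, a_{q_1 q_2} \cdots a_{q_{T-1} q_T}
```

so evaluating a sequence only requires looking up one initial probability and T-1 transition probabilities.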

Slide 5: Example: Balls and Urns

Three urns, each full of balls of one color: S_1: blue, S_2: red, S_3: green.

Slide 6: Balls and Urns: Learning

Given K example sequences of length T, extract the probabilities from the observed sequences:
s1-s2-s1-s3
s2-s1-s1-s2
s2-s3-s2-s1
Estimates: π_1 = 1/3, π_2 = 2/3, a_11 = 1/4, a_12 = 1/2, a_13 = 1/4, a_21 = 3/4, ...
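This counting can be checked in a few lines; a sketch using the three sequences from the slide, with states 0-indexed (s1 → 0, s2 → 1, s3 → 2):

```python
from collections import Counter

def estimate(seqs, N):
    """MLE of initial and transition probabilities from fully observed sequences."""
    starts = Counter(s[0] for s in seqs)
    trans = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
    pi = [starts[i] / len(seqs) for i in range(N)]
    # max(1, ...) only guards against a state that never occurs as a source.
    A = [[trans[(i, j)] / max(1, sum(trans[(i, k)] for k in range(N)))
          for j in range(N)] for i in range(N)]
    return pi, A

seqs = [[0, 1, 0, 2],   # s1-s2-s1-s3
        [1, 0, 0, 1],   # s2-s1-s1-s2
        [1, 2, 1, 0]]   # s2-s3-s2-s1
pi, A = estimate(seqs, 3)
```

Counting confirms, e.g., that 3 of the 4 transitions out of s2 go to s1, giving a_21 = 3/4.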

Slide 7: Hidden Markov Models

States are not observable.
Discrete observations {v_1, v_2, ..., v_M} are recorded; each is a probabilistic function of the state.
Emission probabilities: b_j(m) ≡ P(O_t = v_m | q_t = S_j)
Example: in each urn there are balls of different colors, but with different probabilities.
For each observation sequence, there are multiple possible state sequences.
http://en.wikipedia.org/wiki/Hidden_Markov_model

Slide 8: HMM Unfolded in Time

[Figure: the HMM unrolled as a lattice of states over time steps.]
http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter10.html

Slide 9: Now a More Complicated Problem

We observe a sequence of ball colors. What urn sequence created it?
Markov chain: 1-1-2-2 (somewhat trivial, as states are observable!).
Hidden Markov model: (1 or 2)-(1 or 2)-(2 or 3)-(2 or 3), and the potential state sequences have different probabilities; e.g., drawing a blue ball from urn 1 is more likely than from urn 2!

Slide 10: Another Motivating Example

Slide 11: Elements of an HMM

N: number of states
M: number of observation symbols
A = [a_ij]: N x N state transition probability matrix
B = [b_j(m)]: N x M observation probability matrix
Π = [π_i]: N x 1 initial state probability vector
λ = (A, B, Π): the parameter set of the HMM

Slide 12: Three Basic Problems of HMMs

Evaluation: given λ and a sequence O, calculate P(O | λ).
Most likely state sequence: given λ and a sequence O, find the state sequence Q* such that P(Q* | O, λ) = max_Q P(Q | O, λ).
Learning: given a set of sequences O = {O^1, ..., O^K}, find λ* that is the most likely explanation for the sequences in O: P(O | λ*) = max_λ Π_k P(O^k | λ).

(Rabiner, 1989)

Slide 13: Evaluation (Forward Variable)

Forward variable: α_t(i) ≡ P(O_1 ... O_t, q_t = S_i | λ), the probability of observing O_1-...-O_t and additionally being in state i at step t.
Initialization: α_1(i) = π_i b_i(O_1)
Recursion: α_{t+1}(j) = [Σ_{i=1}^N α_t(i) a_ij] b_j(O_{t+1})
Using the α_T(i), the probability of the observed sequence can be computed as P(O | λ) = Σ_{i=1}^N α_T(i).
Complexity: O(N^2 T)
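The forward recursion translates almost line by line into code; a minimal sketch with made-up two-state parameters and observation symbols 0/1:

```python
def forward(pi, A, B, O):
    """Forward pass: alpha[t][i] = P(O_0..O_t, q_t = S_i | lambda), 0-based t."""
    N = len(pi)
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]   # initialization
    for t in range(1, len(O)):                         # recursion
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    return alpha

pi = [0.6, 0.4]                       # example parameters (made up)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
O = [0, 0, 1]
alpha = forward(pi, A, B, O)
prob = sum(alpha[-1])                 # P(O | lambda) = sum_i alpha_T(i)
```

Each time step costs O(N^2) work for N destination states, giving the O(N^2 T) total cost stated on the slide, versus O(N^T) for naive enumeration of all state sequences.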

Slide 14: The Backward Variable

Backward variable: β_t(i) ≡ P(O_{t+1} ... O_T | q_t = S_i, λ), the probability of observing O_{t+1}-...-O_T given that the state at step t is S_i.
Initialization: β_T(i) = 1
Recursion: β_t(i) = Σ_{j=1}^N a_ij b_j(O_{t+1}) β_{t+1}(j)
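The backward pass mirrors the forward one, filling the table from the last time step toward the first; a sketch reusing the same made-up two-state example as before:

```python
def backward(A, B, O, N):
    """Backward pass: beta[t][i] = P(O_{t+1}..O_{T-1} | q_t = S_i, lambda), 0-based t."""
    T = len(O)
    beta = [[1.0] * N]                     # beta_T(i) = 1
    for t in range(T - 2, -1, -1):         # recursion, back to front
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

pi = [0.6, 0.4]                            # example parameters (made up)
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
O = [0, 0, 1]
beta = backward(A, B, O, 2)
# Consistency check: P(O | lambda) = sum_i pi_i * b_i(O_1) * beta_1(i)
prob = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(2))
```

The same sequence probability can be recovered from either table, which is a useful sanity check when implementing both passes.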

Slide 15: Finding the Most Likely State Sequence

Observe O_1 ... O_t (covered by α) and O_{t+1} ... O_T (covered by β).
γ_t(i) := probability of being in state i at step t, given the observations:
γ_t(i) = α_t(i) β_t(i) / Σ_{j=1}^N α_t(j) β_t(j)
Choose the state that has the highest probability for each time step: q_t* = argmax_i γ_t(i)
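Once α and β are available, γ_t(i) is just their product normalized at each step; a sketch, with α/β values taken from a small two-state, three-step example (illustrative numbers, not derived here):

```python
def posterior_states(alpha, beta):
    """gamma[t][i] = alpha_t(i)*beta_t(i) / sum_j alpha_t(j)*beta_t(j);
    returns (gamma, per-step argmax states)."""
    gamma = []
    for a_t, b_t in zip(alpha, beta):
        w = [a * b for a, b in zip(a_t, b_t)]
        s = sum(w)
        gamma.append([x / s for x in w])
    states = [max(range(len(g)), key=g.__getitem__) for g in gamma]
    return gamma, states

alpha = [[0.54, 0.08], [0.369, 0.042], [0.02751, 0.10872]]
beta = [[0.2265, 0.174], [0.31, 0.52], [1.0, 1.0]]
gamma, states = posterior_states(alpha, beta)
```

Note that picking argmax_i γ_t(i) independently per step can yield a state sequence that is impossible as a whole (e.g. one using a zero-probability transition); finding the single best joint sequence is what Viterbi's algorithm on the next slide addresses.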

Slide 16: Viterbi's Algorithm

δ_t(i) ≡ max_{q_1 q_2 ... q_{t-1}} p(q_1 q_2 ... q_{t-1}, q_t = S_i, O_1 ... O_t | λ)
Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
Recursion: δ_t(j) = [max_i δ_{t-1}(i) a_ij] b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1

Idea: combines path probability computations with backtracking over competing paths.
Only briefly discussed in 2014!
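The four stages above (initialization, recursion, termination, backtracking) can be sketched directly in code; parameters are again the made-up two-state example:

```python
def viterbi(pi, A, B, O):
    """Most likely state sequence for observations O under lambda = (A, B, pi)."""
    N, T = len(pi), len(O)
    delta = [[pi[i] * B[i][O[0]] for i in range(N)]]        # initialization
    psi = [[0] * N]
    for t in range(1, T):                                   # recursion
        d, row_d, row_psi = delta[-1], [], []
        for j in range(N):
            best = max(range(N), key=lambda i: d[i] * A[i][j])
            row_psi.append(best)
            row_d.append(d[best] * A[best][j] * B[j][O[t]])
        delta.append(row_d)
        psi.append(row_psi)
    q = [max(range(N), key=delta[-1].__getitem__)]          # termination
    for t in range(T - 1, 0, -1):                           # backtracking
        q.insert(0, psi[t][q[0]])                           # q_t = psi_{t+1}(q_{t+1})
    return q

pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi(pi, A, B, [0, 0, 1]))
```

The structure is the forward recursion with the sum replaced by a max, plus the ψ table recording which predecessor achieved each max so the winning path can be read back.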

Slide 17: Baum-Welch Algorithm

The Baum-Welch algorithm takes the observed symbol sequences O = {O^1, ..., O^K} and learns the model λ = (A, B, Π); the corresponding hidden state sequences are not observed.

Slide 18: Learning a Model from Sequences O

ξ_t(i, j): a hidden (latent) variable, measuring the probability of going from state i at step t to state j at step t+1 while observing O_{t+1}, given a model λ and an observed sequence O^k.
γ_t(i): a hidden (latent) variable, measuring the probability of being in state i at step t, given a model λ and an observed sequence O^k.
An EM-style algorithm is used!

Slide 19: Baum-Welch Algorithm: M-Step

Remark: k iterates over the observed sequences O^1, ..., O^K; for each individual sequence O^k, ξ^k and γ^k are computed in the E-step; then the actual model λ is computed in the M-step by averaging the estimates of π_i, a_ij, b_j(m) (based on the γ^k and ξ^k) over the K observed sequences:
π_i = (1/K) Σ_k γ^k_1(i)
a_ij = Σ_k Σ_{t=1}^{T-1} ξ^k_t(i, j) / Σ_k Σ_{t=1}^{T-1} γ^k_t(i)   (probability of going from i to j, over probability of being in i)
b_j(m) = Σ_k Σ_t γ^k_t(j) 1(O^k_t = v_m) / Σ_k Σ_t γ^k_t(j)

Slide 20: Baum-Welch Algorithm: Summary

The Baum-Welch algorithm learns the model λ = (A, B, Π) from the observed symbol sequences O = {O^1, ..., O^K}.
For more discussion see: http://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf
See also: http://www.digplanet.com/wiki/Baum%E2%80%93Welch_algorithm

Slide 21: Generalization of HMM: Continuous Observations

The observations generated at each time step are vectors of k numbers; a multivariate Gaussian with k dimensions is associated with each state j, defining the probability of a k-dimensional vector v being generated when in state j: b_j(v) = N(v; μ_j, Σ_j).
The parameter set becomes λ = (A, (μ_j, Σ_j) for j = 1, ..., N, Π); the hidden state sequence now generates an observed vector sequence O.

Slide 22: Generalization: HMM with Inputs

Input-dependent observations.
Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996).
Time-delay input.