Hidden Markov Models Teaching Demo The University of Arizona



Presentation Transcript

Hidden Markov Models Teaching Demo
The University of Arizona
Tatjana Scheffler
tatjana.scheffler@uni-potsdam.de

Warm-Up: Parts of Speech
Part of Speech Tagging = grouping words into morphosyntactic types like noun, verb, etc.:
She went for a walk

Warm-Up: Parts of Speech
Grouping words into morphosyntactic types like noun, verb, etc.:
PRON  VERB  ADP  DET  NOUN
She   went  for  a    walk

POS tags (Universal Dependencies)
Open class words: ADJ, ADV, INTJ, NOUN, PROPN, VERB
Closed class words: ADP, AUX, CCONJ, DET, NUM, PART, PRON, SCONJ
Other: PUNCT, SYM, X
https://universaldependencies.org/u/pos/

Let’s try!
Look at your word(s) and find an appropriate (or likely) part of speech tag. Write it on the paper.
Now find the other words from your sentence (sentences are numbered and color-coded in the upper left corner). Re-tag the sentence.
Did anyone in your group have to change POS tags? How did you know?

Hidden Markov Models
Generative language model. Compare, e.g., an n-gram model: P(w_n | w_1, …, w_{n-1}).
Now, a two-step process:
Generate a sequence of hidden states (POS tags) t_1, …, t_T from a bigram model P(t_i | t_{i-1}).
Independently, generate an observable word w_i from each state t_i, from an emission model P(w_i | t_i).
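To make the two-step generative story concrete, here is a minimal Python sketch that first samples a tag sequence from a bigram transition model and then samples one word per tag from an emission model. The toy tagset, vocabulary, and all probabilities below are made up for illustration; they do not come from the slides.

```python
import random

# Made-up bigram transition model P(t_i | t_{i-1}); "<s>" marks the sentence start.
TRANSITION = {
    "<s>":  {"PRON": 0.6, "DET": 0.4},
    "PRON": {"VERB": 1.0},
    "VERB": {"DET": 0.7, "PRON": 0.3},
    "DET":  {"NOUN": 1.0},
    "NOUN": {"VERB": 0.5, "DET": 0.5},
}
# Made-up emission model P(w_i | t_i).
EMISSION = {
    "PRON": {"she": 0.5, "they": 0.5},
    "VERB": {"went": 0.5, "saw": 0.5},
    "DET":  {"a": 0.6, "the": 0.4},
    "NOUN": {"walk": 0.5, "dog": 0.5},
}

def generate(length=5):
    tags, words, prev = [], [], "<s>"
    for _ in range(length):
        # Step 1: draw the next hidden state (POS tag) from the bigram model P(t_i | t_{i-1}).
        tag = random.choices(list(TRANSITION[prev]), weights=TRANSITION[prev].values())[0]
        # Step 2: independently draw an observable word from the emission model P(w_i | t_i).
        word = random.choices(list(EMISSION[tag]), weights=EMISSION[tag].values())[0]
        tags.append(tag)
        words.append(word)
        prev = tag
    return tags, words

print(generate())  # e.g. (['PRON', 'VERB', 'DET', 'NOUN', 'VERB'], ['she', 'went', 'a', 'walk', 'saw'])
```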

Question 1: Language Modelling
Given an HMM and a string w_1, …, w_T, what is its likelihood P(w_1, …, w_T)?
Compute it efficiently with the Forward algorithm.

Question 2: POS Tagging
Given an HMM and a string w_1, …, w_T, what is the most likely sequence of hidden tags t_1, …, t_T that generated it?
Compute it efficiently with the Viterbi algorithm.

Question 3: Training (not today)
Train the HMM parameters from a set of POS tags and training data:
Annotated training data: maximum likelihood training with smoothing
Unannotated training data: the Forward-Backward algorithm (an instance of EM)
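As a rough illustration of the annotated-data case, the sketch below estimates initial, transition, and emission probabilities by relative frequency from a tiny tagged corpus, with add-one smoothing on the emissions. The corpus, tagset, and the choice to smooth only the emissions are assumptions made for the example, not the exact recipe from the talk.

```python
from collections import Counter

TAGS = ["PRON", "VERB", "ADP", "DET", "NOUN"]
corpus = [  # one annotated sentence as (word, tag) pairs; a real corpus would have many
    [("she", "PRON"), ("went", "VERB"), ("for", "ADP"), ("a", "DET"), ("walk", "NOUN")],
]

def train(corpus, tags):
    vocab = sorted({w for sent in corpus for w, _ in sent})
    init, trans, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    for sent in corpus:
        init[sent[0][1]] += 1                       # which tag starts the sentence
        for w, t in sent:
            emit[(t, w)] += 1                       # word/tag co-occurrence counts
            tag_count[t] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            trans[(t1, t2)] += 1                    # tag bigram counts
    # Relative frequencies for initial and transition probabilities.
    a0 = {t: init[t] / len(corpus) for t in tags}
    a = {t1: {t2: trans[(t1, t2)] / max(1, sum(trans[(t1, x)] for x in tags)) for t2 in tags}
         for t1 in tags}
    # Add-one (Laplace) smoothing for the emission probabilities.
    b = {t: {w: (emit[(t, w)] + 1) / (tag_count[t] + len(vocab)) for w in vocab} for t in tags}
    return a0, a, b

a0, a, b = train(corpus, TAGS)
```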

Hidden Markov Model (formally)
A Hidden Markov Model is a 5-tuple consisting of:
a finite set of states Q = {q_1, …, q_N} (= POS tags)
a finite set of possible observations O (= words)
initial probabilities a_0i = P(X_1 = q_i)
transition probabilities a_ij = P(X_{t+1} = q_j | X_t = q_i)
emission probabilities b_i(o) = P(Y_t = o | X_t = q_i)
The HMM describes two coupled random processes:
X_t = q_i: at time t, the HMM is in state q_i
Y_t = o: at time t, the HMM emits observation o
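One possible way to hold the five components in plain Python, kept deliberately small. The state names, observation symbols, and numbers here are placeholders, not values from the talk; any dict-of-dicts with this layout would do.

```python
# Hypothetical layout for the 5-tuple (placeholder states, observations and probabilities).
hmm = {
    "states": ["q1", "q2"],                 # Q = {q_1, ..., q_N}
    "observations": ["o1", "o2"],           # O
    "initial": {"q1": 0.6, "q2": 0.4},      # a_0i = P(X_1 = q_i)
    "transition": {                         # a_ij = P(X_{t+1} = q_j | X_t = q_i)
        "q1": {"q1": 0.7, "q2": 0.3},
        "q2": {"q1": 0.4, "q2": 0.6},
    },
    "emission": {                           # b_i(o) = P(Y_t = o | X_t = q_i)
        "q1": {"o1": 0.9, "o2": 0.1},
        "q2": {"o1": 0.2, "o2": 0.8},
    },
}
```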

Example: Eisner’s ice cream diary
States: the weather in Baltimore on a given day (H = hot, C = cold)
Observations: how many ice creams Eisner ate that day
Parameters illustrated on the slide: initial probability a_0H, transition probability a_CH, emission probability b_C(3)
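The transcript does not preserve the actual probabilities from the ice cream example (they were shown in a figure), so the instantiation below uses placeholder numbers purely to make the structure concrete; only the states H/C and the observations 1/2/3 ice creams follow the slides.

```python
# Ice cream HMM with made-up probabilities (the slide's real numbers are not in the transcript).
ice_cream_hmm = {
    "states": ["H", "C"],                            # hot and cold days
    "observations": [1, 2, 3],                       # ice creams eaten
    "initial": {"H": 0.8, "C": 0.2},                 # a_0H, a_0C
    "transition": {"H": {"H": 0.6, "C": 0.4},        # a_HH, a_HC
                   "C": {"H": 0.5, "C": 0.5}},       # a_CH, a_CC
    "emission": {"H": {1: 0.2, 2: 0.4, 3: 0.4},      # b_H(1), b_H(2), b_H(3)
                 "C": {1: 0.5, 2: 0.4, 3: 0.1}},     # b_C(1), b_C(2), b_C(3)
}
```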

HMM models x, y jointly
The coupled random processes of the HMM give us the joint probability P(x, y), where x = x_1, …, x_T is the sequence of hidden states and y = y_1, …, y_T is the sequence of observations:
P(x, y) = a_{0,x_1} · b_{x_1}(y_1) · ∏_{t=2..T} a_{x_{t-1},x_t} · b_{x_t}(y_t)
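A short sketch of that joint probability as code, assuming the dict layout from the previous snippets; the function name and the layout are illustration choices, not part of the original material.

```python
def joint_prob(hmm, states, observations):
    """P(x, y): initial probability, then transition * emission for every step."""
    p = hmm["initial"][states[0]] * hmm["emission"][states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= hmm["transition"][prev][cur] * hmm["emission"][cur][obs]
    return p

# e.g. joint_prob(ice_cream_hmm, ["H", "C", "H"], [3, 1, 3]) computes P(H,3,C,1,H,3)
```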

Question 1: Likelihood P(y)
How likely is it that Eisner ate 3 ice creams on day 1, 1 ice cream on day 2, and 3 ice creams on day 3?
We want to compute P(3,1,3).
The definitions let us compute joint probabilities like P(H,3,C,1,H,3) easily.
But there could be different state sequences that lead to (3,1,3).
We must sum over all of them.

Naïve approach
Sum over all possible state sequences x of length 3 to compute P(3,1,3):
P(3,1,3) = Σ_x P(x, 3,1,3), with x ranging over HHH, HHC, HCH, HCC, CHH, CHC, CCH, CCC.
Technical term: marginalization.
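The brute-force marginalization can be written down directly, again assuming the dict layout sketched above; it enumerates every state sequence, which is exactly the exponential blow-up discussed on the next slide.

```python
from itertools import product

def naive_likelihood(hmm, observations):
    """P(y): sum the joint probability P(x, y) over all N**T hidden state sequences x."""
    total = 0.0
    for states in product(hmm["states"], repeat=len(observations)):
        p = hmm["initial"][states[0]] * hmm["emission"][states[0]][observations[0]]
        for prev, cur, obs in zip(states, states[1:], observations[1:]):
            p *= hmm["transition"][prev][cur] * hmm["emission"][cur][obs]
        total += p
    return total

# e.g. naive_likelihood(ice_cream_hmm, [3, 1, 3]) sums over the 2**3 = 8 weather sequences
```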

Naïve approach is too expensive
The naïve approach sums over an exponential number of terms. This is too slow for practical use.
Visualize the paths through the hidden states in a trellis (an unfolding of the HMM):
one column for each time point t, representing X_t
each column contains a copy of all the states in the HMM
edges go from states at time t to states at time t+1 (the transitions of the HMM)
Each path in the trellis is one state sequence, so P(y) is the sum over all paths that emit y.

Ice cream trellis (trellis diagrams only)

Sentence likelihood (step-by-step trellis walkthrough; diagrams only)

The Forward algorithm
In the naïve solution, we compute many intermediate results several times.
Central idea: define the forward probability α_t(j) that the HMM outputs y_1, …, y_t and ends in state q_j:
α_t(j) = P(y_1, …, y_t, X_t = q_j)
So: P(y) = Σ_j α_T(j)

The Forward algorithm
Base case, t = 1: α_1(j) = a_0j · b_j(y_1)
Inductive step, for t = 2 … T: α_t(j) = (Σ_i α_{t-1}(i) · a_ij) · b_j(y_t)
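A compact implementation of this recursion, assuming the same dict layout as in the earlier snippets; the variable names and the column-by-column dict representation of α are implementation choices, not prescribed by the slides.

```python
def forward(hmm, observations):
    """Forward algorithm: returns P(y) = sum_j alpha_T(j)."""
    # Base case, t = 1: alpha_1(j) = a_0j * b_j(y_1)
    alpha = {j: hmm["initial"][j] * hmm["emission"][j][observations[0]]
             for j in hmm["states"]}
    # Inductive step, t = 2..T: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(y_t)
    for obs in observations[1:]:
        alpha = {j: sum(alpha[i] * hmm["transition"][i][j] for i in hmm["states"])
                    * hmm["emission"][j][obs]
                 for j in hmm["states"]}
    return sum(alpha.values())

# e.g. forward(ice_cream_hmm, [3, 1, 3]) should agree with naive_likelihood(ice_cream_hmm, [3, 1, 3])
```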

P(3,1,3) with Forward (step-by-step computation of the forward probabilities for the observation sequence 3, 1, 3; shown as figures only)

Question 1: Likelihood P(y)
How likely is it that Eisner ate 3 ice creams on day 1, 1 ice cream on day 2, and 3 ice creams on day 3?
Use the Forward algorithm to sum over the different paths (= weather patterns) efficiently.

Question 2: Tagging
Given observations y_1, …, y_T, what is the most likely sequence of hidden states x_1, …, x_T?
We are only interested in the most likely sequence of states, argmax_x P(x, y), not really in its probability.

Naïve solution (shown as a figure only)

Parallelism
Likelihood (Question 1): compute P(y) with the Forward algorithm.
Tagging (Question 2): compute argmax_x P(x, y) with the Viterbi algorithm.

The Viterbi algorithm
Base case, t = 1: V_1(j) = a_0j · b_j(y_1)
Inductive step, t = 2 … T: V_t(j) = max_i V_{t-1}(i) · a_ij · b_j(y_t)
For each state and time step (j, t), remember the i for which the maximum was achieved as a backpointer bp_t(j).
Retrieve the optimal tag sequence by following the backpointers from T back to 1.
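And a corresponding Viterbi sketch with backpointers, under the same assumed dict layout; ties in the max are broken arbitrarily, which the slides do not specify.

```python
def viterbi(hmm, observations):
    """Viterbi algorithm: most likely hidden state sequence for the observations."""
    # Base case, t = 1: V_1(j) = a_0j * b_j(y_1)
    V = {j: hmm["initial"][j] * hmm["emission"][j][observations[0]] for j in hmm["states"]}
    backpointers = []
    # Inductive step, t = 2..T: V_t(j) = max_i V_{t-1}(i) * a_ij * b_j(y_t)
    for obs in observations[1:]:
        new_V, bp = {}, {}
        for j in hmm["states"]:
            best_i = max(hmm["states"], key=lambda i: V[i] * hmm["transition"][i][j])
            bp[j] = best_i                       # remember which predecessor won
            new_V[j] = V[best_i] * hmm["transition"][best_i][j] * hmm["emission"][j][obs]
        V = new_V
        backpointers.append(bp)
    # Follow the backpointers from time T back to time 1 to recover the best path.
    best_last = max(V, key=V.get)
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

# e.g. viterbi(ice_cream_hmm, [3, 1, 3]) returns the most likely weather sequence for the three days
```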

P(x,3,1,3) with Viterbi (step-by-step Viterbi computation for the observation sequence 3, 1, 3; shown as figures only)

Runtime
Forward and Viterbi have the same runtime, dominated by the inductive step:
We compute α_t(j) (respectively V_t(j)) N·T times.
Each computation iterates over N predecessor states i.
Total runtime is O(N²·T): linear in sentence length, quadratic in the number of states (tags).

Summary
Hidden Markov Models are a popular model for POS tagging and other tasks (e.g., dialog act tagging, see the research talk tomorrow!).
HMM = two coupled random processes:
a bigram model over the hidden states
a model generating observable outputs from the states
Efficient algorithms for two common problems:
Forward algorithm for likelihood computation
Viterbi algorithm for tagging (= best state sequence)

Eisner’s ice cream HMM (diagram only)