
Presentation Transcript

Slide1

POS Tagging and HMM

Text Mining Team

Adapted from Heng Ji

Slide2

Outline

POS Tagging and HMM

Slide3

What is Part-of-Speech (POS)?

Generally speaking, Word Classes (=POS) :

Verb, Noun, Adjective, Adverb, Article, …

We can also include inflection:

Verbs: tense, number, …

Nouns: number, proper/common, …

Adjectives: comparative, superlative, …

Slide4


Parts of Speech

8 (ish) traditional parts of speech

Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc

Called: parts-of-speech, lexical categories, word classes, morphological classes, lexical tags...

Lots of debate within linguistics about the number, nature, and universality of these

We’ll completely ignore this debate.

Slide5


7 Traditional POS Categories

N noun chair, bandwidth, pacing

V verb study, debate, munch

ADJ adj purple, tall, ridiculous

ADV adverb unfortunately, slowly,

P preposition of, by, to

PRO pronoun I, me, mine

DET determiner the, a, that, those

Slide6


POS Tagging

The process of assigning a part-of-speech or lexical class marker to each word in a collection.

WORD tag

the DET

koala N

put V

the DET

keys N

on P

the DET

table N

Slide7

Penn TreeBank POS Tag Set

Penn Treebank: hand-annotated corpus of the Wall Street Journal, 1M words

46 tags

Some particularities:

to/TO not disambiguated

Auxiliaries and verbs not distinguished

Slide8


Penn Treebank Tagset

Slide9

Why is POS tagging useful?

Speech synthesis:

How to pronounce "lead"?

INsult inSULT

OBject obJECT

OVERflow overFLOW

DIScount disCOUNT

CONtent conTENT

Stemming for information retrieval

So a search for "aardvarks" can also retrieve "aardvark"

Parsing, speech recognition, etc.

Possessive pronouns (my, your, her) followed by nouns

Personal pronouns (I, you, he) likely to be followed by verbs

Need to know if a word is an N or V before you can parse

Information extraction

Finding names, relations, etc.

Machine Translation

Slide10


Open and Closed Classes

Closed class: a small fixed membership

Prepositions: of, in, by, …

Auxiliaries: may, can, will, had, been, …

Pronouns: I, you, she, mine, his, them, …

Usually function words (short common words which play a role in grammar)

Open class: new ones can be created all the time

English has 4: Nouns, Verbs, Adjectives, Adverbs

Many languages have these 4, but not all!

Slide11


Open Class Words

Nouns

Proper nouns (Boulder, Granby, Eli Manning)

English capitalizes these.

Common nouns (the rest).

Count nouns and mass nouns

Count: have plurals, get counted: goat/goats, one goat, two goats

Mass: don’t get counted (snow, salt, communism) (*two snows)

Adverbs: tend to modify things

Unfortunately, John walked home extremely slowly yesterday

Directional/locative adverbs (here, home, downhill)

Degree adverbs (extremely, very, somewhat)

Manner adverbs (slowly, slinkily, delicately)

Verbs

In English, have morphological affixes (eat/eats/eaten)

Slide12


Closed Class Words

Examples

:

prepositions:

on, under, over,

particles:

up, down, on, off, …

determiners:

a, an, the, …

pronouns:

she, who, I, ..

conjunctions:

and, but, or, …

auxiliary verbs:

can, may, should, …

numerals:

one, two, three, third, …

Slide13


Prepositions from CELEX

Slide14


English Particles

Slide15


Conjunctions

Slide16

POS Tagging: Choosing a Tagset

There are many parts of speech and many potential distinctions we could draw

To do POS tagging, we need to choose a standard set of tags to work with

Could pick very coarse tagsets

N, V, Adj, Adv.

The more commonly used set is finer-grained: the "Penn TreeBank tagset", with 45 tags

PRP$, WRB, WP$, VBG

Even more fine-grained tagsets exist

Slide17


Using the Penn Tagset

The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.

Prepositions and subordinating conjunctions marked IN (“although/IN I/PRP..”)

Except the preposition/complementizer “to” is just marked “TO”.

Slide18


POS Tagging

Words often have more than one POS: back

The back door = JJ

On my back = NN

Win the voters back = RB

Promised to back the bill = VB

The POS tagging problem is to determine the POS tag for a particular instance of a word.

These examples from Dekang Lin

Slide19


How Hard is POS Tagging? Measuring Ambiguity

Slide20


Current Performance

How many tags are correct?

About 97% currently

But baseline is already 90%

Baseline algorithm:

Tag every word with its most frequent tag

Tag unknown words as nouns
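A minimal Python sketch of this baseline (the tiny training list is invented for illustration; the slides give no code):

from collections import Counter, defaultdict

# train: (word, tag) pairs from a hand-tagged corpus (toy data for illustration)
train = [("the", "DT"), ("back", "NN"), ("door", "NN"),
         ("promised", "VBD"), ("to", "TO"), ("back", "VB"),
         ("the", "DT"), ("bill", "NN")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(words):
    # most frequent tag for known words, NN for unknown words
    return [counts[w].most_common(1)[0][0] if w in counts else "NN" for w in words]

print(baseline_tag(["the", "back", "door"]))   # ['DT', 'NN', 'NN']
print(baseline_tag(["koala"]))                 # unknown word -> ['NN']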

How well do people do?

Slide21


Quick Test: Agreement?

the students went to class

plays well with others

fruit flies like a banana

DT: the, this, that

NN: noun

VB: verb

P: preposition

ADV: adverb

Slide22


Quick Test

the students went to class

DT NN VB P NN

plays well with others

VB ADV P NN

NN NN P DT

fruit flies like a banana

NN NN VB DT NN

NN VB P DT NN

NN NN P DT NN

NN VB VB DT NN

Slide23


How to do it? History

(Timeline spanning the 1960s through the 2000s)

Brown Corpus Created (EN-US)

1 Million Words

Brown Corpus Tagged

HMM Tagging (CLAWS)

93%-95%

Greene and Rubin

Rule Based - 70%

LOB Corpus Created (EN-UK)

1 Million Words

DeRose/Church

Efficient HMM

Sparse Data

95%+

British National Corpus

(tagged by CLAWS)

POS Tagging separated from other NLP

Transformation Based Tagging

(Eric Brill)

Rule Based – 95%+

Tree-Based Statistics (Helmut Schmid)

Rule Based – 96%+

Neural Network 96%+

Trigram Tagger (Kempe) 96%+

Combined Methods 98%+

Penn Treebank Corpus (WSJ, 4.5M)

LOB Corpus Tagged

Slide24


Two Methods for POS Tagging

Rule-based tagging

(ENGTWOL)

Stochastic

Probabilistic sequence models

HMM (Hidden Markov Model) tagging

MEMMs (Maximum Entropy Markov Models)

Slide25


Rule-Based Tagging

Start with a dictionary

Assign all possible tags to words from the dictionary

Write rules by hand to selectively remove tags, leaving the correct tag for each word.

Slide26


Rule-based taggers

Early POS taggers all hand-coded

Most of these (Harris, 1962; Greene and Rubin, 1971), and the best of the recent ones, ENGTWOL (Voutilainen, 1995), are based on a two-stage architecture

Stage 1: look up word in lexicon to give list of potential POSs

Stage 2: Apply rules which certify or disallow tag sequences

Rules originally handwritten; more recently Machine Learning methods can be used

Slide27


Start With a Dictionary

she: PRP

promised: VBN,VBD

to: TO

back: VB, JJ, RB, NN

the: DT

bill: NN, VB

Etc… for the ~100,000 words of English with more than 1 tag

Slide28

Assign Every Possible Tag

She: PRP

promised: VBD, VBN

to: TO

back: VB, JJ, RB, NN

the: DT

bill: NN, VB

She promised to back the bill

Slide29

Write Rules to Eliminate Tags

Rule: Eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"

She/PRP promised/{VBD, VBN} to/TO back/{VB, JJ, RB, NN} the/DT bill/{NN, VB}

The rule removes VBN for "promised", leaving VBD.
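A minimal Python sketch of the two-stage idea (dictionary lookup, then hand-written elimination rules), using the dictionary from the previous slides and only the single rule above; the helper names are illustrative, not from the slides:

lexicon = {
    "she": {"PRP"}, "promised": {"VBN", "VBD"}, "to": {"TO"},
    "back": {"VB", "JJ", "RB", "NN"}, "the": {"DT"}, "bill": {"NN", "VB"},
}

def assign_all_tags(words):
    # Stage 1: every tag the dictionary allows (default NN for unknown words)
    return [set(lexicon.get(w.lower(), {"NN"})) for w in words]

def apply_rules(words, candidates):
    # Stage 2, one rule: eliminate VBN if VBD is an option when VBN|VBD follows "<start> PRP"
    for i in range(1, len(words)):
        follows_start_prp = (i == 1 and candidates[0] == {"PRP"})
        if follows_start_prp and {"VBN", "VBD"} <= candidates[i]:
            candidates[i].discard("VBN")
    return candidates

words = "She promised to back the bill".split()
print(apply_rules(words, assign_all_tags(words)))
# "promised" keeps only VBD; "back" and "bill" stay ambiguous (more rules would be needed)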

Slide30

POS tagging

Training data (hand-tagged text):

The/DT involvement/NN of/IN ion/NN channels/NNS in/IN B/NN and/CC T/NN lymphocyte/NN activation/NN is/VBZ supported/VBN by/IN many/JJ reports/NNS of/IN changes/NNS in/IN ion/NN fluxes/NNS and/CC membrane/NN …

→ used to train a Machine Learning Algorithm →

Unseen text: "We demonstrate that …" is then tagged as We/PRP demonstrate/VBP that/IN …

Slide31

Goal of POS Tagging

We want the best set of tags for a sequence of words (a sentence):

W: a sequence of words

T: a sequence of tags

Example (our goal is to find the T that makes this probability highest):

P( (NN NN P DET ADJ NN) | (heat oil in a large pot) )

Slide32


But, the Sparse Data Problem …

Rich Models often require vast amounts of data

Count up instances of the string "heat oil in a large pot" in the training corpus, and pick the most common tag assignment to the string.

Too many possible combinations

Slide33


POS Tagging as Sequence Classification

We are given a sentence (an “observation” or “sequence of observations”)

Secretariat is expected to race tomorrow

What is the best sequence of tags that corresponds to this sequence of observations?

Probabilistic view:

Consider all possible sequences of tags

Out of this universe of sequences, choose the tag sequence which is most probable given the observation sequence of n words w1…wn.

Slide34

Getting to HMMs

We want, out of all sequences of n tags t1…tn, the single tag sequence such that P(t1…tn | w1…wn) is highest.

The hat (^) means "our estimate of the best one".

argmax_x f(x) means "the x such that f(x) is maximized".
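In symbols, using the hat and argmax notation just defined:

\hat{t}_{1 \ldots n} = \arg\max_{t_1 \ldots t_n} P(t_1 \ldots t_n \mid w_1 \ldots w_n)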

Slide35

Getting to HMMs

This equation is guaranteed to give us the best tag sequence. But how do we make it operational? How do we compute this value?

Intuition of Bayesian classification: use Bayes' rule to transform the equation into a set of other probabilities that are easier to compute.

Slide36

Reminder: Apply Bayes' Theorem (1763)

P(T | W) = P(W | T) P(T) / P(W)

posterior = (likelihood × prior) / marginal likelihood

Reverend Thomas Bayes

Presbyterian minister (1702-1761)

Our Goal: To maximize it!

Slide37


How to Count

P(W|T) and P(T) can be counted from a large hand-tagged corpus, and then smoothed to get rid of the zeroes.
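A minimal Python sketch of this counting step, assuming a toy hand-tagged corpus and add-one smoothing (the slides do not say which smoothing method is used):

from collections import Counter

corpus = [  # toy hand-tagged corpus, invented for illustration
    [("the", "DT"), ("koala", "NN"), ("sleeps", "VBZ")],
    [("heat", "VB"), ("the", "DT"), ("oil", "NN")],
]

emission = Counter()    # (tag, word) counts, for P(word | tag)
transition = Counter()  # (previous tag, tag) counts, for P(tag | previous tag)
tag_count = Counter()

for sentence in corpus:
    prev = "<s>"
    for word, tag in sentence:
        emission[(tag, word)] += 1
        transition[(prev, tag)] += 1
        tag_count[tag] += 1
        prev = tag

vocab = {w for sent in corpus for w, _ in sent}
tagset = set(tag_count)

def p_word_given_tag(word, tag):
    # add-one smoothing over the vocabulary removes zero counts
    return (emission[(tag, word)] + 1) / (tag_count[tag] + len(vocab))

def p_tag_given_prev(tag, prev):
    prev_total = sum(c for (p, _), c in transition.items() if p == prev)
    return (transition[(prev, tag)] + 1) / (prev_total + len(tagset))

print(p_word_given_tag("koala", "NN"), p_tag_given_prev("NN", "DT"))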

Slide38


Count P(W|T) and P(T)

Assume each word in the sequence depends only on its corresponding tag:
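Written out, this assumption makes the likelihood factor as

P(w_1 \ldots w_n \mid t_1 \ldots t_n) \approx \prod_{i=1}^{n} P(w_i \mid t_i)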

Slide39

Count P(T)

Make a Markov assumption and use N-grams over tags: P(T) is a product of the probabilities of the N-grams that make it up, with each tag conditioned on a short history of preceding tags.
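With a bigram (first-order Markov) assumption over tags, this becomes

P(t_1 \ldots t_n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})

where t_0 is a start-of-sentence marker.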

Slide40

Part-of-speech tagging with Hidden Markov Models

[Figure: an HMM for tagging, with the tags as hidden states and the words as observations; each word is linked to its tag by an output probability, and consecutive tags are linked by a transition probability.]
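Combining the two factorizations gives the usual HMM tagging objective, consistent with the "output probability" and "transition probability" labels above:

\hat{t}_1 \ldots \hat{t}_n = \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})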

Slide41

Analyzing "Fish sleep."

Slide42

A Simple POS HMM

[State diagram with states start, noun, verb, and end; the arcs between them carry the transition probabilities 0.8, 0.2, 0.8, 0.7, 0.1, 0.2, 0.1, and 0.1.]

Slide43

Word Emission Probabilities: P(word | state)

A two-word language: "fish" and "sleep"

Suppose in our training corpus,

"fish" appears 8 times as a noun and 5 times as a verb

"sleep" appears twice as a noun and 5 times as a verb

Emission probabilities:

Noun

P(fish | noun) : 0.8

P(sleep | noun) : 0.2

Verb

P(fish | verb) : 0.5

P(sleep | verb) : 0.5
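These numbers follow directly from the stated counts; as a quick check in Python:

counts = {"noun": {"fish": 8, "sleep": 2},
          "verb": {"fish": 5, "sleep": 5}}

emission = {tag: {w: c / sum(words.values()) for w, c in words.items()}
            for tag, words in counts.items()}

print(emission)
# {'noun': {'fish': 0.8, 'sleep': 0.2}, 'verb': {'fish': 0.5, 'sleep': 0.5}}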

Slide44


Viterbi Probabilities

Slide45

[The HMM state diagram above, repeated; the following slides fill in the Viterbi trellis token by token.]

Slide46


Token 1: fish

Slide47


Token 1: fish

Slide48


Token 2: sleep

(if ‘fish’ is verb)

Slide49


Token 2: sleep

(if ‘fish’ is verb)

Slide50


Token 2: sleep

(if ‘fish’ is a noun)

Slide51


Token 2: sleep

(if ‘fish’ is a noun)

Slide52


Token 2: sleep

take maximum,

set back pointers

Slide53


Token 2: sleep

take maximum,

set back pointers

Slide54


Token 3: end

Slide55


Token 3: end

take maximum,

set back pointers

Slide56


Decode:

fish = noun

sleep = verb
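A compact Python sketch of the Viterbi decoding just walked through. Note: the slide diagram lists the transition values (0.8, 0.2, 0.8, 0.7, 0.1, 0.2, 0.1, 0.1) but not which arc carries which value, so the assignment below is an assumption, chosen so that each state's outgoing probabilities sum to 1 and the decoder reproduces the result above (fish = noun, sleep = verb):

# Toy HMM from the "fish sleep" slides.
transitions = {                      # P(next_state | state); edge assignment assumed, see note above
    "start": {"noun": 0.8, "verb": 0.2},
    "noun":  {"noun": 0.1, "verb": 0.8, "end": 0.1},
    "verb":  {"noun": 0.2, "verb": 0.1, "end": 0.7},
}
emissions = {                        # P(word | state), from the emission-probability slide
    "noun": {"fish": 0.8, "sleep": 0.2},
    "verb": {"fish": 0.5, "sleep": 0.5},
}

def viterbi(words):
    states = ["noun", "verb"]
    # trellis[i][s] = (best probability of reaching state s after word i, back pointer)
    trellis = [{s: (transitions["start"].get(s, 0.0) * emissions[s].get(words[0], 0.0), "start")
                for s in states}]
    for word in words[1:]:
        column = {}
        for s in states:
            best_prev, best_p = max(
                ((prev, trellis[-1][prev][0] * transitions[prev].get(s, 0.0)) for prev in states),
                key=lambda x: x[1])
            column[s] = (best_p * emissions[s].get(word, 0.0), best_prev)
        trellis.append(column)
    # take the transition into the final "end" state
    best_last, best_p = max(
        ((s, trellis[-1][s][0] * transitions[s].get("end", 0.0)) for s in states),
        key=lambda x: x[1])
    # follow the back pointers to recover the tag sequence
    tags = [best_last]
    for column in reversed(trellis[1:]):
        tags.append(column[tags[-1]][1])
    tags.reverse()
    return tags, best_p

print(viterbi(["fish", "sleep"]))   # (['noun', 'verb'], 0.1792)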

Slide57

Markov Chain for a Simple Name Tagger

[State diagram with states START, PER, LOC, X, and END. The edges are labelled with transition probabilities (values between 0.1 and 0.6) and the states with emission probabilities, including George:0.3, W.:0.3, Bush:0.3, Iraq:0.1, George:0.2, Iraq:0.8, discussed:0.7, and $:1.0.]

Slide58

Exercise

Tag names in the following sentence:

George W. Bush discussed Iraq.

Slide59

POS taggers

Brill's tagger: http://www.cs.jhu.edu/~brill/

TnT tagger: http://www.coli.uni-saarland.de/~thorsten/tnt/

Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml

SVMTool: http://www.lsi.upc.es/~nlp/SVMTool/

GENIA tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/

More complete list at:

http://www-nlp.stanford.edu/links/statnlp.html#Taggers