Unsupervised Part-of-Speech Tagging

Uploaded On 2017-09-11

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections. June 21, ACL 2011. Slav Petrov (Google Research), Dipanjan Das (Carnegie Mellon University).


Presentation Transcript

Slide1

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

June 21, ACL 2011

Slav Petrov, Google Research

Dipanjan Das, Carnegie Mellon University

Slide2

Part-of-Speech Tagging

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Slide3

Supervised POS Tagging

Supervised setting: average accuracy is 96.2% with TnT (Brants, 2000)

Slide4

Resource-Poor Languages

Several major languages have little or no annotated data, e.g.:

Language            Native speakers
Oriya               32 million
Indonesian-Malay    37 million
Azerbaijani         20 million
Haitian             7.7 million
Punjabi             109 million
Vietnamese          69 million
Polish              40 million

See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size

However, lots of parallel and unannotated data!

Basic NLP tools like POS tagging are essential for the development of language technologies.

Slide5

(Nearly) Universal Part-of-Speech Tags

VERB  DET  NOUN  CONJ  PRON  NUM  ADJ  PRT  ADV  .  ADP  X

Slide6

(Nearly) Universal Part-of-Speech Tags

Example Penn Treebank tag maps:

NN    -> NOUN
NNP   -> NOUN
NNPS  -> NOUN
NNS   -> NOUN
PRP   -> PRON
PRP$  -> PRON
WP    -> PRON
WP$   -> PRON

Example Spanish Treebank tag maps:

np  -> NOUN
nc  -> NOUN
p0  -> PRON
pd  -> PRON
pe  -> PRON
pi  -> PRON
pn  -> PRON
pp  -> PRON
pr  -> PRON
pt  -> PRON
px  -> PRON

See Petrov, Das and McDonald (2011)

Slide7
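The tag maps listed on Slide 6 can be read as plain lookup tables. The following is a minimal sketch, assuming only the entries shown on that slide; the function name `to_universal` and the fallback to X for unlisted tags are illustrative assumptions, not part of the talk.

```python
# Excerpt of the tag maps shown on Slide 6. The full maps cover every tag of
# each treebank; only the entries from the slide are reproduced here.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
}

SPANISH_TO_UNIVERSAL = {
    "np": "NOUN", "nc": "NOUN",
    "p0": "PRON", "pd": "PRON", "pe": "PRON", "pi": "PRON", "pn": "PRON",
    "pp": "PRON", "pr": "PRON", "pt": "PRON", "px": "PRON",
}


def to_universal(tags, tag_map, fallback="X"):
    """Map treebank-specific tags to universal tags.

    Assumption: tags missing from the excerpt fall back to X.
    """
    return [tag_map.get(tag, fallback) for tag in tags]


penn_example = to_universal(["NNP", "PRP$", "VBZ"], PENN_TO_UNIVERSAL)
spanish_example = to_universal(["np", "pd"], SPANISH_TO_UNIVERSAL)
```

Because the excerpt has no verb entries, `VBZ` falls back to X here; a complete map would send it to VERB.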

(Nearly) Universal Part-of-Speech Tags

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Portland/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./.

পোর্টল্যান্ড/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.

Slide8

State of the Art in Unsupervised POS Tagging

Slide9

Unsupervised Part-of-Speech Tagging

Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Legend: observation sequence (the words), state sequence (the tags)

Merialdo (1994)

Slide10

Unsupervised Part-of-Speech Tagging

Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Each state is one of the 12 coarse tags

Legend: observation sequence (the words), state sequence (the tags)

Merialdo (1994)

Slide11

Unsupervised Part-of-Speech Tagging

Portland/? hat/?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Highlighted: transition multinomials

Merialdo (1994)

Slide12

Unsupervised Part-of-Speech Tagging

Portland/? hat/?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Highlighted: emission multinomials

Merialdo (1994)

Slide13
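The symbols for the observation and state sequences on Slides 9 to 12 did not survive extraction. As a hedged reconstruction, assuming the standard first-order HMM and writing x for the observation (word) sequence and y for the state (tag) sequence, the joint probability factorizes into the transition and emission multinomials named on Slides 11 and 12. The symbol names and the start state y_0 are assumptions.

```latex
P(\mathbf{x}, \mathbf{y})
  = \prod_{i=1}^{n}
    \underbrace{P(y_i \mid y_{i-1})}_{\text{transition}}
    \;
    \underbrace{P(x_i \mid y_i)}_{\text{emission}},
\qquad y_0 = \text{start state}
```

In the unsupervised setting y is never observed, so Expectation-Maximization maximizes the marginal likelihood of x, summing over all tag sequences y.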

Unsupervised Part-of-Speech Tagging

Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?

Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm

Tagging accuracy (%)

Model     Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM    68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7

Poor average result

Johnson (2007)

Slide14

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

Portland/? hat/?

Legend: observation sequence (the words), state sequence (the tags), emission multinomials

Berg-Kirkpatrick et al. (2010)

Slide15

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

Portland/? hat/?

Emission multinomials use features: suffix, hyphen, capital letters, numbers, ...

Berg-Kirkpatrick et al. (2010)

Slide16

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

Portland/? hat/?

Emission multinomials use features: suffix, hyphen, capital letters, numbers, ...

Estimated using gradient-based methods

Berg-Kirkpatrick et al. (2010)

Slide17
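Slides 14 to 16 describe emission distributions that are log-linear in word features and normalized per tag. The following is a minimal sketch of that idea, not the authors' implementation: the feature templates only approximate the kinds named on the slides (suffix, hyphen, capital letters, numbers), and the function names, vocabulary, and weights are hypothetical.

```python
import math


def features(word, tag):
    """Indicator features conjoined with the tag (illustrative templates)."""
    feats = [f"word={word.lower()}|{tag}", f"suffix={word[-2:].lower()}|{tag}"]
    if "-" in word:
        feats.append(f"hyphen|{tag}")
    if word[:1].isupper():
        feats.append(f"capital|{tag}")
    if any(ch.isdigit() for ch in word):
        feats.append(f"digit|{tag}")
    return feats


def emission_distribution(tag, vocab, weights):
    """P(word | tag) = exp(w . f(word, tag)) / sum over all words in vocab."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(w, tag)) for w in vocab}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical vocabulary and weights, for illustration only.
vocab = ["Portland", "hat", "eine", "1000"]
weights = {"capital|NOUN": 2.0, "digit|NUM": 3.0}
p_noun = emission_distribution("NOUN", vocab, weights)
p_num = emission_distribution("NUM", vocab, weights)
```

With these toy weights the capitalized word gets the largest share of the NOUN emission mass and "1000" the largest share for NUM. In training, the weights would be fitted by a gradient-based method, as Slide 16 notes.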

Unsupervised Part-of-Speech Tagging

Hidden Markov Model (HMM) with locally normalized log-linear models

Portland/? hat/?

Emission multinomials

Estimated using gradient-based methods

Tagging accuracy (%)

Model        Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM       68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM  69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0

Improvements across all languages

Slide18

Unsupervised POS Tagging with Dictionaries

Portland hat eine prächtig gedeihende Musikszene .

Candidate tags: NOUN VERB PRON DET ADJ NUM ADJ ADV ADJ NOUN .

Hidden Markov Model (HMM) with locally normalized log-linear models

State space constrained by possible gold tags

Slide19

Unsupervised POS Tagging with Dictionaries

Portland hat eine prächtig gedeihende Musikszene .

Candidate tags: NOUN VERB PRON DET ADJ NUM ADJ ADV ADJ NOUN .

Hidden Markov Model (HMM) with locally normalized log-linear models

State space constrained by possible gold tags

Tagging accuracy (%)

Model               Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM              68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM         69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
w/ gold dictionary  93.1    94.7   93.5    96.6   96.4     94.0        95.8     85.5     93.7

Slide20

For most languages, access to high-quality tag dictionaries is not realistic.

Morphologically rich languages only have base forms in dictionaries.

Ideas:
Use supervision in resource-rich languages
Use parallel data
Construct projected tag lexicons

Slide21

Bilingual Projection

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Automatic labels from supervised tagger, 97% accuracy

Slide22

Bilingual Projection

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Portland hat eine prächtig gedeihende Musikszene .

Automatic unsupervised alignments from translation data (available for more than 50 languages)

Slide23

Bilingual Projection

Baseline 1: direct projection

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Portland hat eine prächtig gedeihende Musikszene .

An unaligned word receives NOUN (the most frequent tag)

Yarowsky and Ngai (2001)

Slide24

Bilingual Projection

Baseline 1: direct projection

Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./.

+ more projected tagged sentences -> supervised training -> tagger (Brants, 2000)

Yarowsky and Ngai (2001)

Slide25
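Baseline 1 (Slides 23 and 24) copies each English tag across its word alignment and gives unaligned words the tag NOUN. The following is a minimal sketch of that step; the alignment indices are hypothetical, chosen so that "prächtig" is the unaligned word as on the slides, and the function name is an assumption.

```python
def project_tags(source_tags, alignment, target_length, default_tag="NOUN"):
    """Copy source tags across alignment links; unaligned positions keep the default."""
    target_tags = [default_tag] * target_length
    for source_index, target_index in alignment:
        target_tags[target_index] = source_tags[source_index]
    return target_tags


english = ["Portland", "has", "a", "thriving", "music", "scene", "."]
english_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
german = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]

# Hypothetical (English index, German index) links; index 3 ("prächtig") is unaligned.
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 5), (6, 6)]

projected = project_tags(english_tags, alignment, len(german))
tagged_german = list(zip(german, projected))
```

The result reproduces the tag sequence on Slide 24, including the NOUN assigned to the unaligned "prächtig". Sentences tagged this way are then used as training data for a supervised tagger.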

Bilingual Projection

Baseline 1: direct projection

Tagging accuracy (%)

Model              Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM             68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM        69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection  73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8

Yarowsky and Ngai (2001)

Slide26

Bilingual Projection

Baseline 1: direct projection

Tagging accuracy (%)

Model              Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM             68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM        69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection  73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8

Consistent improvements over unsupervised models

Yarowsky and Ngai (2001)

Slide27

Bilingual Projection

Baseline 2: lexicon projection

Slide28

Bilingual Projection

Baseline 2: lexicon projection

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

Portland hat eine prächtig gedeihende Musikszene .

Slide29

Bilingual Projection

Baseline 2: lexicon projection

Tag    English    German
NOUN   Portland   Portland
VERB   has        hat
DET    a          eine
ADJ    thriving   gedeihende
NOUN   music
NOUN   scene      Musikszene
.      .          .

prächtig: ignore unaligned word

Slide30

Bilingual Projection

Baseline 2: lexicon projection

Bag of alignments

Tag    English    German
NOUN   Portland   Portland
VERB   has        hat
DET    a          eine
ADJ    thriving   gedeihende
NOUN   music
NOUN   scene      Musikszene
.      .          .

Slide31

Bilingual Projection

Baseline 2: lexicon projection

Bag of alignments as on Slide30, with the entry for eine set apart:

Tag    English    German
DET    a          eine

Slide32

Bilingual Projection

Baseline 2: lexicon projection

Further alignments collected for eine:

Tag    English    German
DET    a          eine
NUM    one        eine
PRON   one        eine

Slide33

Bilingual Projection

Baseline 2: lexicon projection

Further alignments collected for gedeihende:

Tag    English    German
ADJ    thriving   gedeihende
VERB   thriving   gedeihende

Slide34

Bilingual Projection

Baseline 2: lexicon projection

Portland  gedeihende  hat  eine  Musikszene  .

After scanning all the parallel data, each word has a tag distribution: the probability of a tag given a word.

Slide35
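The formula for the probability of a tag given a word on Slide 34 did not survive extraction. The following is a minimal sketch that assumes a simple relative-frequency estimate over the bag of alignments from Slides 30 to 33; the estimator, the function name, and the toy counts are assumptions.

```python
from collections import Counter, defaultdict


def build_projected_lexicon(aligned_pairs):
    """Estimate P(tag | word) by relative frequency over (word, tag) alignment pairs."""
    counts = defaultdict(Counter)
    for target_word, source_tag in aligned_pairs:
        counts[target_word][source_tag] += 1
    lexicon = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[word] = {tag: n / total for tag, n in tag_counts.items()}
    return lexicon


# Hypothetical bag of alignments: German word paired with the projected English tag.
bag = [
    ("Portland", "NOUN"), ("hat", "VERB"),
    ("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
    ("gedeihende", "ADJ"), ("gedeihende", "VERB"),
    ("Musikszene", "NOUN"), (".", "."),
]
lexicon = build_projected_lexicon(bag)
```

Words that never appear in an alignment, such as "prächtig" in the running example, get no entry at all, which is the coverage problem raised on Slide 36.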

Bilingual Projection

Baseline 2: lexicon projection

Feature HMM constrained with projected dictionary

Tagging accuracy (%)

Model                 Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM           69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection     73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary  79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3

Improvements over simple projection for the majority of the languages

Slide36

Can coverage be improved?

Portland hat eine prächtig gedeihende Musikszene .

Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.

No information about unaligned words

Idea: projected lexicon expansion and refinement using label propagation

Slide37

Our Model: Graph-Based Projections

How can label propagation help?

For a language:
Build a graph over 2M trigram types as vertices
Compute a similarity matrix using co-occurrence statistics
Label distribution at each vertex: tag distribution over the trigram's middle word

Subramanya, Petrov and Pereira (2010)

Slide38
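Slide 37 describes a graph whose vertices are trigram types and whose edge weights come from co-occurrence statistics. The following is a minimal sketch of that construction; the specific features (outer left word, outer right word, and the trigram's first and last word), the cosine similarity, and the k-nearest-neighbour pruning are assumptions made for illustration, and the two sentences are invented.

```python
import math
from collections import Counter, defaultdict


def trigram_context_vectors(sentences):
    """Map each trigram type to counts of simple context features."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        padded = ["<s>"] + list(sentence) + ["</s>"]
        for i in range(1, len(padded) - 3):
            trigram = tuple(padded[i:i + 3])
            vectors[trigram]["left=" + padded[i - 1]] += 1
            vectors[trigram]["right=" + padded[i + 3]] += 1
            vectors[trigram]["frame=" + trigram[0] + "_" + trigram[2]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k=5):
    """Link each trigram type to its k most similar types (brute force, O(n^2))."""
    graph = {}
    for u, vec_u in vectors.items():
        sims = [(cosine(vec_u, vec_v), v) for v, vec_v in vectors.items() if v != u]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        graph[u] = {v: s for s, v in sims[:k] if s > 0}
    return graph


# Invented sentences, for illustration only.
sentences = [s.split() for s in ("das ist gut bei uns", "das ist fein bei uns")]
graph = knn_graph(trigram_context_vectors(sentences), k=2)
```

Here "ist gut bei" and "ist fein bei" share all three context features, so they become neighbours. The brute-force loop would not scale to 2M vertices, and a nearest-neighbour graph is not automatically symmetric, so a real system would need an approximate search and a symmetrization step.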

Example Graph in German

Vertices (trigram types):
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen pro
schlechtes Essen und
zum Essen niederlassen
zu realisieren ,
zu erreichen ,
zu stecken ,
zu essen ,

Slide39

Example Graph in German

[Figure: the same vertices, with the middle word of each Essen trigram highlighted and the labels NOUN and VERB shown.]

Slide40

How can label propagation help?

For a target language:
Build a graph over 2M trigram types as vertices
Compute a similarity matrix using co-occurrence statistics
Label distribution at each vertex: tag distribution over the trigram's middle word
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments

Slide41

Bilingual Graph

German vertices (trigram types):
ist gut bei
ist lebhafter bei
ist wichtig bei
ist fein bei
gutem Essen zugetan
fuers Essen drauf
1000 Essen pro
schlechtes Essen und
zum Essen niederlassen
zu realisieren ,
zu erreichen ,
zu stecken ,
zu essen ,

Auto-tagged English words linked to them:
food/NOUN  eat/VERB  eat/VERB  eating/VERB
good/ADJ  nicely/ADV  fine/ADJ  important/ADJ

Slide42

How can label propagation help?

For a target language:
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments
Run first stage of label propagation: source language -> target language

Slide43

First Stage of Label Propagation

[Figure: the bilingual graph from Slide41, with the German trigram vertices and the auto-tagged English words food/NOUN, eat/VERB, eat/VERB, eating/VERB, good/ADJ, nicely/ADV, fine/ADJ, important/ADJ.]

Slide44

First Stage of Label Propagation

[Figure: the same bilingual graph, next animation step.]

Slide45

How can label propagation help?

For a target language:
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments
Run first stage of label propagation: source language -> target language
Run second stage of label propagation within target language vertices
Graph objective function with squared penalties

Slide46

Second Stage of Label Propagation

[Figure: the bilingual graph from Slide41, with the German trigram vertices and the auto-tagged English words food/NOUN, eat/VERB, eat/VERB, eating/VERB, good/ADJ, nicely/ADV, fine/ADJ, important/ADJ.]

Slide47

Second Stage of Label Propagation

[Figure: the same bilingual graph, next animation step.]

Slide48

Second Stage of Label Propagation

[Figure: the same bilingual graph, next animation step.]

Slide49

Second Stage of Label Propagation

[Figure: the same bilingual graph, next animation step.]

Continues till convergence...

Slide50

Our Model: Graph-Based Projections

fein  lebhafter  realisieren  Portland  gedeihende  hat  eine  Musikszene  .

End result?

Slide51

Our Model: Graph-Based Projections

fein  lebhafter  realisieren  Portland  gedeihende  hat  eine  Musikszene  .

A larger set of tag distributions: a better and larger dictionary

Slide52

Our Model: Graph-Based Projections

Lexicon expansion: thousands of words

Slide53

Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data

Slide54

[Figure: a graph of labeled and unlabeled datapoints with edge weights 0.9, 0.01, 0.8, 0.9, 0.1 and 0.05.]

Labeled datapoints: supervised label distributions
Unlabeled datapoints: distributions to be found
Edge weights: symmetric weight matrix

Zhu, Ghahramani and Lafferty (2003)

Slide55

Label Propagation

[Figure: the same graph.]

Zhu, Ghahramani and Lafferty (2003)

Slide56

Label Propagation

Set of distributions over unlabeled vertices

Slide57

Label Propagation

Unlabeled vertices

Slide58

Label Propagation

Brings the distributions of similar vertices closer

Slide59

Label Propagation

Brings the distributions of uncertain neighborhoods close to the uniform distribution

Size of the label set

Slide60

Label Propagation

Iterative updates for optimization

Slide61
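The objective annotated on Slides 54 to 60 did not survive extraction. The following is a minimal sketch under stated assumptions: labeled vertices stay fixed, squared penalties pull neighbouring distributions together in proportion to the edge weight, and a second squared penalty with a hypothetical weight `nu` pulls each unlabeled vertex toward the uniform distribution over the label set. Under that assumed objective, each iterative update is a weighted average. The toy graph is invented.

```python
def propagate(weights, seeds, labels, nu=0.01, iterations=10):
    """Iterative label propagation with a pull toward the uniform distribution.

    weights: {vertex: {neighbour: weight}}, assumed symmetric
    seeds:   {vertex: {label: probability}} for labeled vertices (kept fixed)
    labels:  list of all labels; nu weights the uniform prior (an assumption)
    """
    uniform = 1.0 / len(labels)
    q = {
        v: dict(seeds[v]) if v in seeds else {y: uniform for y in labels}
        for v in weights
    }
    for _ in range(iterations):
        updated = {}
        for v, neighbours in weights.items():
            if v in seeds:
                updated[v] = q[v]
                continue
            kappa = nu + sum(neighbours.values())  # normalizer of the update
            updated[v] = {
                y: (sum(w * q[u].get(y, 0.0) for u, w in neighbours.items())
                    + nu * uniform) / kappa
                for y in labels
            }
        q = updated
    return q


# Invented toy graph: "c" is unlabeled and closer to "a" than to "b".
weights = {"a": {"c": 0.9}, "b": {"c": 0.1}, "c": {"a": 0.9, "b": 0.1}}
seeds = {"a": {"NOUN": 1.0}, "b": {"VERB": 1.0}}
q = propagate(weights, seeds, labels=["NOUN", "VERB"])
```

The unlabeled vertex ends up mostly NOUN because its heavier edge leads to the NOUN seed. A vertex whose neighbours disagree or carry little weight would stay near uniform, which matches the behaviour described on Slide 59.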

Final Results

Slide62

Our Model: Graph-Based Projections

Feature HMM constrained with graph-based dictionary

Tagging accuracy (%)

Model                    Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary     79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3
Graph-based projections  83.2    79.5   82.8    82.5   86.8     87.9        84.2     80.5     83.4

Slide63

Our Model: Graph-Based Projections

Feature HMM constrained with graph-based dictionary

Tagging accuracy (%)

Model                    Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary     79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3
Graph-based projections  83.2    79.5   82.8    82.5   86.8     87.9        84.2     80.5     83.4
w/ gold dictionary       93.1    94.7   93.5    96.6   96.4     94.0        95.8     85.5     93.7
Supervised               96.9    94.9   98.2    97.8   95.8     97.2        96.8     94.8     96.6

Slide64

Concluding Notes

Reasonably accurate POS taggers without direct supervision
Evaluated on major European languages
Towards a standard of universal POS tags
Traditional evaluation of unsupervised POS taggers is done using greedy metrics that use labeled data
Our presented models avoid these evaluation methods

Slide65

Future Directions

Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons
Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
Inclusion of multiple languages in the graph may further improve results
Label propagation in one huge multilingual graph

Slide66

Projected POS-tagged data available at: http://code.google.com/p/pos-projection/

Slide67

[Figure: the example sentence in several languages, each annotated with universal POS tags.]

English: Portland has a thriving music scene .
German: Portland hat eine prächtig gedeihende Musikszene .
Bengali: পোর্টল্যান্ড শহর এর সঙ্গীত পরিবেশ বেশ উন্নত |
Spanish: Portland tiene una escena musical vibrante .
French: Portland a une scène musicale florissante .
Chinese (fragment): 波特 … 生机勃勃

Questions?