Slide1
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
June 21, ACL 2011
Slav Petrov, Google Research
Dipanjan Das, Carnegie Mellon University
Slide2
Part-of-Speech Tagging
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Slide3
Supervised POS Tagging
Supervised setting: average accuracy is 96.2% with TnT (Brants, 2000)
Slide4
Resource-Poor Languages
Several major languages with little or no annotated data, e.g.:
Language           Native speakers
Oriya              32 million
Indonesian-Malay   37 million
Azerbaijani        20 million
Haitian            7.7 million
Punjabi            109 million
Vietnamese         69 million
Polish             40 million
See http://www.ethnologue.org/ethno_docs/distribution.asp?by=size
However, lots of parallel and unannotated data!
Basic NLP tools like POS tagging are essential for the development of language technologies
Slide5
(Nearly) Universal Part-of-Speech Tags
VERB  DET  NOUN  CONJ  PRON  NUM  ADJ  PRT  ADV  .  ADP  X
Slide6
(Nearly) Universal Part-of-Speech Tags
Example Penn Treebank tag maps:
NN -> NOUN, NNP -> NOUN, NNPS -> NOUN, NNS -> NOUN
PRP -> PRON, PRP$ -> PRON, WP -> PRON, WP$ -> PRON
Example Spanish Treebank tag maps:
np -> NOUN, nc -> NOUN
p0 -> PRON, pd -> PRON, pe -> PRON, pi -> PRON, pn -> PRON, pp -> PRON, pr -> PRON, pt -> PRON, px -> PRON
See Petrov, Das and McDonald (2011)
Slide7
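The tag maps above amount to a lookup table from treebank-specific tags to the universal tags. A minimal sketch follows; it contains only the entries listed on the slide, and the function name `to_universal` and the fallback to `X` for unlisted tags are assumptions made for illustration.

```python
# Only the mappings shown on the slide; the full maps cover many more tags.
PENN_TO_UNIVERSAL = {
    "NN": "NOUN", "NNP": "NOUN", "NNPS": "NOUN", "NNS": "NOUN",
    "PRP": "PRON", "PRP$": "PRON", "WP": "PRON", "WP$": "PRON",
}

SPANISH_TO_UNIVERSAL = {
    "np": "NOUN", "nc": "NOUN",
    "p0": "PRON", "pd": "PRON", "pe": "PRON", "pi": "PRON", "pn": "PRON",
    "pp": "PRON", "pr": "PRON", "pt": "PRON", "px": "PRON",
}


def to_universal(tags, tag_map, unknown="X"):
    """Map fine-grained tags to universal tags.

    Tags missing from tag_map fall back to `unknown` (an assumption of
    this sketch, not something the slide specifies).
    """
    return [tag_map.get(tag, unknown) for tag in tags]


print(to_universal(["NNP", "PRP$", "NNS"], PENN_TO_UNIVERSAL))
print(to_universal(["nc", "pd"], SPANISH_TO_UNIVERSAL))
```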
(Nearly) Universal Part-of-Speech Tags
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./.
পোর্টল্যান্ড/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.
Slide8
State of the Art in Unsupervised POS Tagging
Slide9
Unsupervised Part-of-Speech Tagging
Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Observation sequence: the words
State sequence: the unknown tags
Merialdo (1994)
Slide10
Unsupervised Part-of-Speech Tagging
Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Observation sequence: the words
State sequence: the unknown tags, each one of the 12 coarse tags
Merialdo (1994)
Slide11
Unsupervised Part-of-Speech Tagging
Portland/? hat/?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Transition multinomials (between consecutive states)
Merialdo (1994)
Slide12
Unsupervised Part-of-Speech Tagging
Portland/? hat/?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Emission multinomials (from each state to its observed word)
Merialdo (1994)
Slide13
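The HMM described on the preceding slides scores a tagged sentence as a product of transition and emission multinomials. The sketch below computes that joint probability for a two-word example. The probability tables are invented toy numbers, the tag set is cut down to two tags, and the separate `start` distribution is an assumption of this sketch; EM training itself is not shown.

```python
import math


def hmm_log_joint(words, tags, start, trans, emit):
    """Return log p(words, tags) under a first-order HMM.

    start[t]    : probability that the first state is t
    trans[p][t] : transition multinomial, p(t | previous state p)
    emit[t][w]  : emission multinomial, p(w | state t)
    """
    logp = math.log(start[tags[0]])
    for i, (word, tag) in enumerate(zip(words, tags)):
        if i > 0:
            logp += math.log(trans[tags[i - 1]][tag])
        logp += math.log(emit[tag][word])
    return logp


# Toy parameters, invented for illustration only.
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit = {
    "NOUN": {"Portland": 0.9, "hat": 0.1},
    "VERB": {"Portland": 0.1, "hat": 0.9},
}

joint = math.exp(hmm_log_joint(["Portland", "hat"], ["NOUN", "VERB"], start, trans, emit))
print(round(joint, 4))
```

In the unsupervised setting the tags are hidden, so EM would sum such joint probabilities over all tag sequences rather than score a single one.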
Unsupervised Part-of-Speech Tagging
Portland/? hat/? eine/? prächtig/? gedeihende/? Musikszene/? ./?
Hidden Markov Model (HMM) estimated with the Expectation-Maximization algorithm
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Poor average result
Johnson (2007)
Slide14
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland/? hat/?
Emission multinomials
Berg-Kirkpatrick et al. (2010)
Slide15
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland/? hat/?
Emission multinomials with features: suffix, hyphen, capital letters, numbers, ...
Berg-Kirkpatrick et al. (2010)
Slide16
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland/? hat/?
Emission multinomials with features: suffix, hyphen, capital letters, numbers, ...
Estimated using gradient-based methods
Berg-Kirkpatrick et al. (2010)
Slide17
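A locally normalized log-linear emission model replaces each emission multinomial with a softmax over the vocabulary, scored by weighted features of the (tag, word) pair. The sketch below uses the four feature types named on the slides. The feature string format, the two-character suffix length, the weights, and the tiny vocabulary are all assumptions for illustration, and the gradient-based estimation of the weights is not shown.

```python
import math


def features(tag, word):
    """Features of a (tag, word) pair, following the feature types on the slide."""
    feats = [f"{tag}|suffix={word[-2:]}"]  # suffix length 2 is an assumption
    if "-" in word:
        feats.append(f"{tag}|hyphen")
    if word[:1].isupper():
        feats.append(f"{tag}|capital")
    if any(ch.isdigit() for ch in word):
        feats.append(f"{tag}|number")
    return feats


def emission_probs(tag, vocab, weights):
    """p(word | tag) as a softmax over the vocabulary of summed feature weights."""
    scores = {w: sum(weights.get(f, 0.0) for f in features(tag, w)) for w in vocab}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Toy vocabulary and weights, invented for illustration only.
vocab = ["Portland", "hat", "1000"]
weights = {"NOUN|capital": 2.0, "NUM|number": 3.0}

probs = emission_probs("NOUN", vocab, weights)
for word, p in probs.items():
    print(word, round(p, 3))
```

Because features are shared across words, evidence about one capitalized word raises the NOUN emission probability of every capitalized word, which plain multinomials cannot do.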
Unsupervised Part-of-Speech Tagging
Hidden Markov Model (HMM) with locally normalized log-linear models
Portland/? hat/?
Emission multinomials
Estimated using gradient-based methods
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Improvements across all languages
Slide18
Unsupervised POS Tagging with Dictionaries
Portland hat eine prächtig gedeihende Musikszene .
Possible tags shown above the words: NOUN, VERB, PRON, DET, ADJ, NUM, ADJ, ADV, ADJ, NOUN, .
Hidden Markov Model (HMM) with locally normalized log-linear models
State space constrained by possible gold tags
Slide19
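Constraining the state space with a tag dictionary means that, for any word listed in the dictionary, only its listed tags are considered during inference. The sketch below shows this for Viterbi decoding over a plain multinomial HMM. The parameters, the two-tag tag set, and the dictionary entry are invented toy values, and the function name `viterbi` with its `tag_dict` argument is hypothetical.

```python
import math


def viterbi(words, tags, start, trans, emit, tag_dict=None):
    """Most likely tag sequence under an HMM.

    If tag_dict lists a word, only the listed tags are allowed for it;
    words not in the dictionary may take any tag.
    """
    def allowed(word):
        if tag_dict and word in tag_dict:
            return tag_dict[word]
        return tags

    # best[t] = (log score, path) of the best sequence ending in tag t
    best = {
        t: (math.log(start[t]) + math.log(emit[t][words[0]]), [t])
        for t in allowed(words[0])
    }
    for word in words[1:]:
        new_best = {}
        for t in allowed(word):
            score, path = max(
                ((s + math.log(trans[prev][t]), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            new_best[t] = (score + math.log(emit[t][word]), path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy parameters, invented for illustration only.
tags = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}
trans = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.8, "VERB": 0.2},
}
emit = {
    "NOUN": {"Portland": 0.9, "hat": 0.1},
    "VERB": {"Portland": 0.1, "hat": 0.9},
}

free = viterbi(["Portland", "hat"], tags, start, trans, emit)
constrained = viterbi(["Portland", "hat"], tags, start, trans, emit, tag_dict={"hat": ["NOUN"]})
print(free, constrained)
```

The same restriction applies during training: disallowed (word, tag) pairs receive no probability mass, which is why a good dictionary helps so much.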
Unsupervised POS Tagging with Dictionaries
Portland hat eine prächtig gedeihende Musikszene .
Possible tags shown above the words: NOUN, VERB, PRON, DET, ADJ, NUM, ADJ, ADV, ADJ, NOUN, .
Hidden Markov Model (HMM) with locally normalized log-linear models
State space constrained by possible gold tags
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
w/ gold dictionary       93.1    94.7   93.5    96.6   96.4     94.0        95.8     85.5     93.7
Slide20
For most languages, access to high-quality tag dictionaries is not realistic.
Morphologically rich languages only have base forms in dictionaries
Ideas:
Use supervision in resource-rich languages
Use parallel data
Construct projected tag lexicons
Slide21
Bilingual Projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Automatic labels from a supervised tagger, 97% accuracy
Slide22
Bilingual Projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene .
Automatic unsupervised alignments from translation data (available for more than 50 languages)
Slide23
Bilingual Projection
Baseline 1: direct projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene .
Unaligned word: NOUN (most frequent tag)
Yarowsky and Ngai (2001)
Slide24
Bilingual Projection
Baseline 1: direct projection
Portland/NOUN hat/VERB eine/DET prächtig/NOUN gedeihende/ADJ Musikszene/NOUN ./.
+ more projected tagged sentences
supervised training -> tagger (Brants, 2000)
Yarowsky and Ngai (2001)
Slide25
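Direct projection copies each source tag across its word alignment and gives unaligned target words the most frequent tag, NOUN. A minimal sketch for the example sentence pair follows. The alignment pairs are written by hand to match the figure (with "prächtig" left unaligned) and are an assumption, as is the function name `project_tags`.

```python
def project_tags(source_tags, target_length, alignments, default="NOUN"):
    """Project source tags onto target positions.

    alignments : (source_index, target_index) pairs
    default    : tag given to unaligned target words
    """
    target_tags = [default] * target_length
    for source_index, target_index in alignments:
        target_tags[target_index] = source_tags[source_index]
    return target_tags


english = ["Portland", "has", "a", "thriving", "music", "scene", "."]
english_tags = ["NOUN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "."]
german = ["Portland", "hat", "eine", "prächtig", "gedeihende", "Musikszene", "."]

# Hand-written alignments (assumed): "music" and "scene" both align to
# "Musikszene"; "prächtig" (index 3) has no alignment.
alignments = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5), (5, 5), (6, 6)]

german_tags = project_tags(english_tags, len(german), alignments)
print(list(zip(german, german_tags)))
```

The projected sentences then serve as training data for a supervised tagger, which is what makes the mislabeled "prächtig" a real source of noise.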
Bilingual Projection
Baseline 1: direct projection
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Yarowsky and Ngai (2001)
Slide26
Bilingual Projection
Baseline 1: direct projection
Consistent improvements over unsupervised models
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Yarowsky and Ngai (2001)
Slide27
Bilingual Projection
Baseline 2: lexicon projection
Slide28
Bilingual Projection
Baseline 2: lexicon projection
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland hat eine prächtig gedeihende Musikszene .
Slide29
Bilingual Projection
Baseline 2: lexicon projection
Aligned pairs (German word: tag of the aligned English word):
Portland: NOUN (Portland)
gedeihende: ADJ (thriving)
hat: VERB (has)
eine: DET (a)
Musikszene: NOUN (scene), NOUN (music)
.: . (.)
prächtig: ignore unaligned word
Slide30
Bilingual Projection
Baseline 2: lexicon projection
Bag of alignments:
Portland: NOUN (Portland)
gedeihende: ADJ (thriving)
hat: VERB (has)
eine: DET (a)
Musikszene: NOUN (scene), NOUN (music)
.: . (.)
Slide31
Bilingual Projection
Baseline 2: lexicon projection
Portland: NOUN (Portland)
gedeihende: ADJ (thriving)
hat: VERB (has)
eine: DET (a)
Musikszene: NOUN (scene), NOUN (music)
.: . (.)
Slide32
Bilingual Projection
Baseline 2: lexicon projection
Portland: NOUN (Portland)
gedeihende: ADJ (thriving)
hat: VERB (has)
eine: DET (a), NUM (one), PRON (one)
Musikszene: NOUN (scene), NOUN (music)
.: . (.)
Slide33
Bilingual Projection
Baseline 2: lexicon projection
Portland: NOUN (Portland)
gedeihende: ADJ (thriving), VERB (thriving)
hat: VERB (has)
eine: DET (a), NUM (one), PRON (one)
Musikszene: NOUN (scene), NOUN (music)
.: . (.)
Slide34
Bilingual Projection
Baseline 2: lexicon projection
After scanning all the parallel data, each word (Portland, gedeihende, hat, eine, Musikszene, .) has a tag distribution = probability of a tag given a word
Slide35
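Lexicon projection pools every aligned (target word, source tag) pair into a bag and normalizes the counts per word, giving the probability of a tag given a word. The sketch below does this with relative frequencies. The counts in the toy bag are invented, and the function name `projected_lexicon` is hypothetical; any thresholding or smoothing used in practice is not shown.

```python
from collections import Counter, defaultdict


def projected_lexicon(aligned_pairs):
    """Estimate p(tag | word) from a bag of (target_word, projected_tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in aligned_pairs:
        counts[word][tag] += 1
    lexicon = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        lexicon[word] = {tag: n / total for tag, n in tag_counts.items()}
    return lexicon


# Toy bag of alignments; the pairs follow the slides, the counts are invented.
bag = [
    ("eine", "DET"), ("eine", "DET"), ("eine", "NUM"), ("eine", "PRON"),
    ("gedeihende", "ADJ"), ("gedeihende", "VERB"),
    ("hat", "VERB"),
    ("Musikszene", "NOUN"), ("Musikszene", "NOUN"),
]

lexicon = projected_lexicon(bag)
for word, dist in lexicon.items():
    print(word, dist)
```

Unaligned words such as "prächtig" never enter the bag, so the resulting dictionary has no entry for them.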
Bilingual Projection
Baseline 2: lexicon projection
Feature HMM constrained with projected dictionary
Improvements over simple projection for the majority of the languages
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary     79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3
Slide36
Can coverage be improved?
No information about unaligned words
Portland hat eine prächtig gedeihende Musikszene .
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Idea: projected lexicon expansion and refinement using label propagation
Slide37
Our Model: Graph-Based Projections
How can label propagation help?
For a language:
Build a graph over 2M trigram types as vertices
Compute the similarity matrix using co-occurrence statistics
Label distribution at each vertex = tag distribution over the trigram’s middle word
Subramanya, Petrov and Pereira (2010)
Slide38
Example Graph in German
Trigram vertices:
[ist gut bei] [ist lebhafter bei] [ist wichtig bei] [ist fein bei]
[gutem Essen zugetan] [fuers Essen drauf] [1000 Essen pro] [schlechtes Essen und] [zum Essen niederlassen]
[zu realisieren ,] [zu erreichen ,] [zu stecken ,] [zu essen ,]
Slide39
Example Graph in German
Trigram vertices:
[ist gut bei] [ist lebhafter bei] [ist wichtig bei] [ist fein bei]
[gutem Essen zugetan] [fuers Essen drauf] [1000 Essen pro] [schlechtes Essen und] [zum Essen niederlassen]
[zu realisieren ,] [zu erreichen ,] [zu stecken ,] [zu essen ,]
Labels shown: NOUN, VERB
Slide40
Our Model: Graph-Based Projections
How can label propagation help?
For a target language:
Build a graph over 2M trigram types as vertices
Compute the similarity matrix using co-occurrence statistics
Label distribution at each vertex = tag distribution over the trigram’s middle word
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments
Slide41
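The graph construction above can be pictured as: represent each trigram type by statistics of the contexts it occurs in, then connect each trigram to its most similar neighbors. The sketch below is one simple way to do this, not the authors' exact recipe. It assumes raw counts of the immediately adjacent left and right words as the co-occurrence statistics, cosine similarity, and a k-nearest-neighbor cutoff; the corpus tokens are invented placeholders.

```python
import math
from collections import Counter


def context_vector(corpus, trigram):
    """Count the words directly left and right of each occurrence of the trigram."""
    vector = Counter()
    for sentence in corpus:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 3):
            if tuple(padded[i:i + 3]) == trigram:
                vector["L=" + padded[i - 1]] += 1
                vector["R=" + padded[i + 3]] += 1
    return vector


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_graph(vectors, k):
    """Connect each vertex to its k most similar other vertices."""
    graph = {}
    for u, vec_u in vectors.items():
        sims = [(cosine(vec_u, vec_v), v) for v, vec_v in vectors.items() if v != u]
        graph[u] = sorted(sims, reverse=True)[:k]
    return graph


# Placeholder corpus: the first two trigrams share contexts, the third does not.
corpus = [
    ["a", "x", "y", "z", "b"],
    ["a", "p", "q", "r", "b"],
    ["c", "m", "n", "o", "d"],
]
trigrams = [("x", "y", "z"), ("p", "q", "r"), ("m", "n", "o")]
vectors = {t: context_vector(corpus, t) for t in trigrams}
graph = knn_graph(vectors, k=1)
print(graph[("x", "y", "z")])
```

With 2M vertices an all-pairs comparison like this would be too slow, so a real system would need an approximate or indexed nearest-neighbor search.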
Bilingual Graph
German trigram vertices:
[ist gut bei] [ist lebhafter bei] [ist wichtig bei] [ist fein bei]
[gutem Essen zugetan] [fuers Essen drauf] [1000 Essen pro] [schlechtes Essen und] [zum Essen niederlassen]
[zu realisieren ,] [zu erreichen ,] [zu stecken ,] [zu essen ,]
Tagged English words: food/NOUN, eat/VERB, eat/VERB, eating/VERB, good/ADJ, nicely/ADV, fine/ADJ, important/ADJ
Slide42
Our Model: Graph-Based Projections
How can label propagation help?
For a target language:
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments
Run first stage of label propagation: source language -> target language
Slide43
First Stage of Label Propagation
German trigram vertices:
[ist gut bei] [ist lebhafter bei] [ist wichtig bei] [ist fein bei]
[gutem Essen zugetan] [fuers Essen drauf] [1000 Essen pro] [schlechtes Essen und] [zum Essen niederlassen]
[zu realisieren ,] [zu erreichen ,] [zu stecken ,] [zu essen ,]
Tagged English words: food/NOUN, eat/VERB, eat/VERB, eating/VERB, good/ADJ, nicely/ADV, fine/ADJ, important/ADJ
Slide45
Our Model: Graph-Based Projections
How can label propagation help?
For a target language:
Plug in auto-tagged words from a source language
Links between source and target language units are word alignments
Run first stage of label propagation: source language -> target language
Run second stage of label propagation within target language vertices
Graph objective function with squared penalties
Slide46
Second Stage of Label Propagation
German trigram vertices:
[ist gut bei] [ist lebhafter bei] [ist wichtig bei] [ist fein bei]
[gutem Essen zugetan] [fuers Essen drauf] [1000 Essen pro] [schlechtes Essen und] [zum Essen niederlassen]
[zu realisieren ,] [zu erreichen ,] [zu stecken ,] [zu essen ,]
Tagged English words: food/NOUN, eat/VERB, eat/VERB, eating/VERB, good/ADJ, nicely/ADV, fine/ADJ, important/ADJ
Continues till convergence...
Slide50
Our Model: Graph-Based Projections
End result?
fein, lebhafter, realisieren, Portland, gedeihende, hat, eine, Musikszene, .
Slide51
Our Model: Graph-Based Projections
fein, lebhafter, realisieren, Portland, gedeihende, hat, eine, Musikszene, .
A larger set of tag distributions -> better and larger dictionary
Slide52
Our Model: Graph-Based Projections
Lexicon Expansion (thousands of words)
Slide53
Brief Overview: Graph-Based Learning with Labeled and Unlabeled Data
Slide54
Labeled datapoints: supervised label distributions
Unlabeled datapoints: distributions to be found
Edge weights in the example graph: 0.9, 0.01, 0.8, 0.9, 0.1, 0.05 (symmetric weight matrix)
Zhu, Ghahramani and Lafferty (2003)
Slide55
Label Propagation
Zhu, Ghahramani and Lafferty (2003)
Slide56
Label Propagation
Set of distributions over unlabeled vertices
Slide57
Label Propagation
Unlabeled vertices
Slide58
Label Propagation
Brings the distributions of similar vertices closer
Slide59
Label Propagation
Brings the distributions of uncertain neighborhoods close to the uniform distribution
Size of the label set
Slide60
Label Propagation
Iterative updates for optimization
Slide61
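The label propagation overview above describes an objective with two pulls on each unlabeled vertex: toward the distributions of similar neighbors, and toward the uniform distribution when the neighborhood is uninformative. The sketch below assumes one common squared-penalty form of such an objective, sum over edges of w_ij * ||q_i - q_j||^2 plus nu * ||q_i - U||^2, with labeled vertices held fixed, and applies the matching iterative update. The exact objective on the slides is not recoverable from the text, so the form, the hyperparameter `nu`, and all names and toy values here are assumptions.

```python
def propagate(weights, seeds, labels, nu=0.1, iterations=20):
    """Iterative label propagation with a uniform-distribution regularizer.

    weights : {vertex: {neighbor: weight}}, symmetric; every vertex is a key
    seeds   : {vertex: {label: prob}} for labeled vertices, kept fixed
    nu      : strength of the pull toward the uniform distribution (assumed)
    """
    uniform = 1.0 / len(labels)  # 1 / size of the label set
    q = {v: dict(seeds.get(v, {y: uniform for y in labels})) for v in weights}
    for _ in range(iterations):
        new_q = {}
        for v, neighbors in weights.items():
            if v in seeds:
                new_q[v] = q[v]
                continue
            denom = nu + sum(neighbors.values())
            new_q[v] = {
                y: (sum(w * q[u][y] for u, w in neighbors.items()) + nu * uniform) / denom
                for y in labels
            }
        q = new_q
    return q


# Toy graph: one labeled vertex, one neighbor of it, and one isolated vertex.
labels = ["NOUN", "VERB"]
weights = {
    "seed": {"zu essen ,": 0.9},
    "zu essen ,": {"seed": 0.9},
    "isolated": {},
}
seeds = {"seed": {"NOUN": 0.0, "VERB": 1.0}}

result = propagate(weights, seeds, labels)
for vertex, dist in result.items():
    print(vertex, {y: round(p, 3) for y, p in dist.items()})
```

In this toy run the neighbor of the labeled vertex moves close to its VERB label, while the isolated vertex stays uniform, matching the two effects the slides describe.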
Final Results
Slide62
Our Model: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary     79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3
Graph-based projections  83.2    79.5   82.8    82.5   86.8     87.9        84.2     80.5     83.4
Slide63
Our Model: Graph-Based Projections
Feature HMM constrained with graph-based dictionary
Tagging accuracy (%):
Method                   Danish  Dutch  German  Greek  Italian  Portuguese  Spanish  Swedish  Average
EM-HMM                   68.7    57.0   75.9    65.8   63.7     62.9        71.5     68.4     66.7
Feature-HMM              69.1    65.1   81.3    71.8   68.1     78.4        80.2     70.1     73.0
Direct projection        73.6    77.0   83.2    79.3   79.7     82.6        80.1     74.7     78.8
Projected dictionary     79.0    78.8   82.4    76.3   84.8     87.0        82.8     79.4     81.3
Graph-based projections  83.2    79.5   82.8    82.5   86.8     87.9        84.2     80.5     83.4
w/ gold dictionary       93.1    94.7   93.5    96.6   96.4     94.0        95.8     85.5     93.7
Supervised               96.9    94.9   98.2    97.8   95.8     97.2        96.8     94.8     96.6
Slide64
Concluding Notes
Reasonably accurate POS taggers without direct supervision
Evaluated on major European languages
Towards a standard of universal POS tags
Traditional evaluation of unsupervised POS taggers is done using greedy metrics that use labeled data
Our presented models avoid these evaluation methods
Slide65
Future Directions
Scaling up the number of nodes in the graph from 2M to billions may help create larger lexicons
Including penalties in the graph objective that induce sparse tag distributions at each graph vertex
Inclusion of multiple languages in the graph may further improve results
Label propagation in one huge multilingual graph
Slide66
Projected POS tagged data available at: http://code.google.com/p/pos-projection/
Slide67
Portland/NOUN has/VERB a/DET thriving/ADJ music/NOUN scene/NOUN ./.
Portland/NOUN hat/VERB eine/DET prächtig/ADJ gedeihende/ADJ Musikszene/NOUN ./.
পোর্টল্যান্ড/NOUN শহর/NOUN এর/ADP সঙ্গীত/NOUN পরিবেশ/NOUN বেশ/ADJ উন্নত/ADJ |/.
Portland/NOUN tiene/VERB una/DET escena/NOUN musical/ADJ vibrante/ADJ ./.
Portland/NOUN a/VERB une/DET scène/NOUN musicale/ADJ florissante/ADJ ./.
波特兰 有 一个 生机勃勃 的 音乐 场景
Questions?