Better Word Representations with Recursive Neural Networks for Morphology
Thang Luong
Joint work with Richard Socher and Christopher D. Manning
Word frequencies in Wikipedia documents (986 million tokens)
And more … indistinctly, non-distinct, indistinctive, non-distinctive, indistinctness, distincted, semi-distinct, contra-distinction, indistinction, contradistinctive.
Problem: these words are all independent entities!
Vector-space word representations
Very successful in recent years for NLP tasks.

Nearest neighbors, (Collobert & Weston, 2010) | (Huang et al., 2012):
distinct: different, distinctive, broader, narrower | unique, broad, distinctive, separate
distinctness: morphologies, pesawat, clefts, pathologies | companion, roskam, hitoshi, enjoyed
affect: exacerbate, impacts, characterize, outweigh | allow, prevent, involve, enable
unaffected: unnoticed, dwarfed, mitigated, overwhelmed | monti, sheaths, krystal, south-southeast
Vector-space word representations
Very successful in recent years for NLP tasks.
Problem: they poorly estimate rare and complex words (e.g., distinctness, unaffected).
Vector-space word representations
Very successful in recent years for NLP tasks.
Problem: they poorly estimate rare and complex words.
Goal: capture both syntactics (word structure) and semantics.

Nearest neighbors, (Collobert & Weston, 2010) | (Huang et al., 2012) | this work:
distinct: different, distinctive, broader, narrower | unique, broad, distinctive, separate | divergent, diverse, distinctive, homogeneous
distinctness: morphologies, pesawat, clefts, pathologies | companion, roskam, hitoshi, enjoyed | distinctiveness, smallness, largeness, exactness
affect: exacerbate, impacts, characterize, outweigh | allow, prevent, involve, enable | decrease, arise, complicate, extend
unaffected: unnoticed, dwarfed, mitigated, overwhelmed | monti, sheaths, krystal, south-southeast | disaffected, undisputed, unopposed, unrestricted
Our approach
Neural Language Model + Morphology Model: compute vector representations for complex words on the fly.
[Figure: model applied to "unfortunately the bank was closed"; the neural language model captures semantics, the morphology model captures syntactics.]
Our approach – network structure
Neural Language Model: simple feed-forward network (Huang et al., 2012) with ranking-type cost (Collobert et al., 2011).
Morphology Model: recursive neural network (Socher et al., 2011).
[Figure: "unfortunately the bank was closed", with "unfortunately" built as un + fortunate → unfortunate, then unfortunate + ly; "closed" as close + d.]
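The recursive morphology model can be sketched as repeatedly merging two child vectors (an affix and a stem, or an intermediate word) into a parent vector through a shared weight matrix. A minimal NumPy sketch, assuming a tanh composition p = tanh(W [x_left; x_right] + b); the dimensions, random initialization, and morpheme vectors here are illustrative toys, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension

# Toy morpheme vectors (learned jointly with the model in reality).
vec = {m: rng.standard_normal(d) for m in ["un", "fortunate", "ly"]}

# Shared composition parameters: parent = tanh(W [left; right] + b).
W = rng.standard_normal((d, 2 * d)) * 0.1
b = np.zeros(d)

def compose(left, right):
    """Merge two child vectors into one parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# un + fortunate -> unfortunate; unfortunate + ly -> unfortunately
unfortunate = compose(vec["un"], vec["fortunate"])
unfortunately = compose(unfortunate, vec["ly"])
```

Because W and b are shared across all merges, the same parameters handle words with any number of affixes.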
Unsupervised Morphological Structures
Our morphoRNN assumes words of the form pre* stm suf*.
Morphological segmentations (pre* stm suf*)+ provided by Morfessor (Creutz & Lagus, '07).
Post-process words, including hyphenated ones.
Learn meanings of unconventional morphemes:
- "al" as a prefix in Arabic names: al-jazeera, al-salem
- "related" as a suffix in compound adjectives: health-related, government-related
Experiments: Word Similarity Task
Word similarity ratings, e.g., king – queen: 8.58; king – cabbage: 0.23.
Datasets: WS-353 (Finkelstein et al., '02), MC (Miller & Charles, '91), RG (Rubenstein & Goodenough, '65), SCWS* (Huang et al., '12), and a new rare word (RW) dataset.
Metric: correlation between similarity scores given by our models and those assigned by human raters.

Sample RW pairs:
untracked – inaccessible
unflagging – constant
unprecedented – new
apocalyptical – prophetic
organismal – system
diagonal – line
obtainment – acquiring
discernment – knowing
confinement – restraint
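The metric can be sketched as follows: score each pair by the cosine similarity of its word vectors, then correlate those scores with the human ratings. Spearman's rank correlation is the usual choice on these benchmarks; the embeddings and ratings below are toy values, and the simple double-argsort ranking ignores ties:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks
    (simplified: no tie handling)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

# Toy embeddings and human ratings, for illustration only.
rng = np.random.default_rng(1)
emb = {w: rng.standard_normal(8) for w in
       ["king", "queen", "cabbage", "man", "woman"]}
pairs = [("king", "queen"), ("king", "cabbage"), ("man", "woman")]
human = np.array([8.58, 0.23, 8.30])

model = np.array([cosine(emb[a], emb[b]) for a, b in pairs])
rho = spearman(model, human)
```

With random toy vectors the correlation is meaningless; with trained embeddings, higher rho means closer agreement with the human raters.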
Results
Start training from either:
- HSMN embedding (Huang et al., 2012)
- C&W embedding (Collobert et al., 2011)
[Figure: word-similarity results charts.]
Conclusions
Learned syntactic-semantic word vectors by combining:
- RNN: models morphological structures of words.
- NLM: learns semantics from the surrounding contexts.
Introduced a new dataset of rare words.
Future: apply to morphologically complex languages or to other English domains such as bio-NLP.

Nearest neighbors, (Collobert et al., 2011) | this work:
commenting: insisting, insisted, focusing, hinted | commented, interviewing, comments
unaffected: unnoticed, dwarfed, mitigated | disaffected, undisputed, unopposed
heartlessness: corruptive, inhumanity, ineffectual, overawed
saudi-owned: avatar, mohajir, kripalani | saudi-based, syrian-controlled, syrian-backed
Outline
- More details (context-sensitive model)
- Context-insensitive model (the first thing we tried)
- Rare word dataset
- More results & analysis
Context-sensitive model
Feed-forward network plus recursive neural network (RNN) with parameter sharing.
Objective: ranking-type cost, s(cat chills on a mat) > s(cat chills on a Sofia).
Learning: back-propagation.
[Figure: network over "unfortunately the bank was closed", with the morpheme trees un + fortunate → unfortunate + ly and close + d.]
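The ranking-type cost can be sketched as a hinge loss that pushes the score of an observed n-gram above the score of a corrupted one (one word replaced by a random word such as "Sofia") by a fixed margin. A toy sketch with assumed dimensions and small random weights, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 4  # toy embedding size and window length

def score(window, w_hidden, w_out):
    """Score an n-gram window: one tanh hidden layer over the
    concatenated word vectors, then a linear output."""
    h = np.tanh(w_hidden @ np.concatenate(window))
    return float(w_out @ h)

def ranking_loss(true_window, corrupt_window, w_hidden, w_out):
    """Hinge ranking cost: the true window should outscore the
    corrupted window by a margin of 1."""
    return max(0.0, 1.0 - score(true_window, w_hidden, w_out)
                        + score(corrupt_window, w_hidden, w_out))

w_hidden = rng.standard_normal((8, n * d)) * 0.1
w_out = rng.standard_normal(8) * 0.1
true_win = [rng.standard_normal(d) for _ in range(n)]
corrupt_win = true_win[:-1] + [rng.standard_normal(d)]  # replace last word
loss = ranking_loss(true_win, corrupt_win, w_hidden, w_out)
```

Training by back-propagation then lowers this loss over many (true, corrupted) window pairs.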
Context-insensitive model
Objective: squared Euclidean distance between newly constructed and reference representations.
Problem: strongly biased towards syntactic agreement.
Similar to compositional distributional semantic models (Lazaridou et al., ACL'13): no recursive composition, only an affix and a stem.
[Figure: unfortunately built from un + fortunate → unfortunate + ly.]
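The context-insensitive objective reduces to the squared distance between the vector composed from morphemes and the word's pre-trained reference vector; a minimal sketch with toy vectors:

```python
import numpy as np

def insensitive_loss(constructed, reference):
    """Squared Euclidean distance between the vector composed from
    morphemes and the pre-trained reference vector for the same word."""
    diff = constructed - reference
    return float(diff @ diff)

constructed = np.array([0.1, -0.2, 0.3])  # toy composed vector
reference = np.array([0.0, -0.2, 0.4])    # toy pre-trained vector
loss = insensitive_loss(constructed, reference)
```

Because the target is fixed per word, this objective matches surface form well but, as the slide notes, ignores the surrounding context entirely.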
Rare Word Datasets
Datasets: WS-353 (Finkelstein et al., '02), MC (Miller & Charles, '91), RG (Rubenstein & Goodenough, '65), SCWS* (Huang et al., '12), and a new rare word (RW) dataset.
Rare Word Dataset Construction
Select rare words: grouped by affixes and frequencies; each word has a non-zero number of synsets in WordNet.
Form pairs: for each rare word, first select a synset, then select hypernyms, hyponyms, meronyms, and attributes.
Collect human judgments: Amazon Mechanical Turk; 3145 pairs rated by 10 people (US native speakers), 2034 pairs accepted.

Example rare words by frequency bin:
un-: [6, 10] untracked, unrolls, undissolved | [11, 100] unrehearsed, unflagging, unfavourable | [101, 1000] unprecedented, unmarried, uncomfortable
-al: [6, 10] apocalyptical, traversals, bestowals | [11, 100] acoustical, extensional, organismal | [101, 1000] directional, diagonal, spherical
-ment: [6, 10] obtainment, acquirement, retrenchments | [11, 100] discernment, revetment, rearrangements | [101, 1000] confinement, establishment, management

Sample pairs:
untracked – inaccessible
unflagging – constant
unprecedented – new
apocalyptical – prophetic
organismal – system
diagonal – line
obtainment – acquiring
discernment – knowing
confinement – restraint
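The pair-formation step above can be sketched over a toy WordNet-like relation table; the entries and the `form_pairs` helper are hypothetical stand-ins for the real WordNet lookup:

```python
import random

# Toy stand-in for WordNet relations (illustrative entries only).
relations = {
    "confinement": {"hypernyms": ["restraint"], "hyponyms": ["imprisonment"],
                    "meronyms": [], "attributes": []},
    "obtainment": {"hypernyms": ["acquiring"], "hyponyms": [],
                   "meronyms": [], "attributes": []},
}

def form_pairs(rare_words, relations, rng):
    """For each rare word, pick one related word from its pooled
    hypernyms, hyponyms, meronyms, and attributes."""
    pairs = []
    for w in rare_words:
        pool = [x for rel in relations[w].values() for x in rel]
        if pool:
            pairs.append((w, rng.choice(pool)))
    return pairs

rng = random.Random(0)
pairs = form_pairs(["confinement", "obtainment"], relations, rng)
```

The resulting candidate pairs are then rated by Mechanical Turk workers.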
Results
[Figure: results charts.]
Analysis
Context-insensitive model: enforces structural agreement well, but ignores semantics.
Context-sensitive model: blends syntactic (word structure) and semantic information well.

Nearest neighbors, C&W | C&W + context-insensitive | C&W + context-sensitive:
commenting: insisting, insisted, focusing, hinted | republishing, accounting, expounding | commented, interviewing, comments
affected: caused, plagued, impacted, damaged | disaffected, unaffected, mitigated, disturbed | affecting, extended, extending, constrained
unaffected: unnoticed, dwarfed, mitigated | disaffected, unconstrained, uninhibited | disaffected, undisputed, unopposed
heartlessness: fearlessness, vindictiveness, restlessness | corruptive, inhumanity, ineffectual, overawed
saudi-owned: avatar, mohajir, kripalani | saudi-based, somaliland, al-jaber | saudi-based, syrian-controlled, syrian-backed
Possible extensions
Bio-NLP domain: complicated but logical taxonomy, e.g., alpha-adrenergic, alpha-mpt, alpha-mpt-treated; dihydroxyphenylaline, dihydroxyphenylserine.
Extend the model from pre* stm suf* to (pre* stm suf*)+.
Jointly learn morpheme vectors and segmentations:
- Bad segmentations: disc+over, under+stand.
- Perhaps a fast version of the split-merge heuristic from grammar induction?
Thank you!
Bilingual Word Representations
Objective: for each alignment, sum
- LM English: score(unfortunately the bank was closed)
- LM French: score(malheureusement la banque a été fermé)
Alignment: weigh alignment constraints by alignment probabilities. Assume alignments are perfect for now?
unfortunately the bank was closed ↔ malheureusement la banque a été fermé
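The objective sketched on this slide could look like the following: the two monolingual LM scores summed, minus alignment constraints that pull aligned English and French word vectors together, weighted by their alignment probabilities. The trade-off weight `lam` and the exact penalty form are assumptions for illustration, not from the slides:

```python
import numpy as np

def joint_objective(score_en, score_fr, alignments, emb_en, emb_fr, lam=1.0):
    """Sum of the two monolingual LM scores, minus a penalty that pulls
    aligned word vectors together, weighted by alignment probability p.
    lam is an assumed trade-off weight."""
    penalty = sum(p * float(np.sum((emb_en[e] - emb_fr[f]) ** 2))
                  for e, f, p in alignments)
    return score_en + score_fr - lam * penalty

# Toy vectors and one alignment link: bank <-> banque with p = 0.9.
emb_en = {"bank": np.array([0.2, 0.1])}
emb_fr = {"banque": np.array([0.25, 0.05])}
alignments = [("bank", "banque", 0.9)]
obj = joint_objective(1.5, 1.2, alignments, emb_en, emb_fr)
```

Soft alignment probabilities let noisy links contribute less than confident ones, which is what "weigh alignment constraints by alignment probabilities" suggests.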
Related Work
Factored NLM (Alexandrescu & Kirchhoff, HLT'06): each word is a vector of features (factors) such as stems, morphological tags, and cases; factor vectors are concatenated, with no composition.
Compositional distributional semantic models (Lazaridou et al., ACL'13): similar to our context-insensitive model; no recursive composition, only an affix and a stem.
Unsupervised Morphological Structures
Utilize a morphological segmentation toolkit, Morfessor (Creutz & Lagus, 2007): produces (pre* stm suf*)+ by recursively splitting words under an MDL-inspired objective.
We want input of the form pre* stm suf*:
(1) Restrict segmentations to pre* stm{1,2} suf*.
(2) Split hyphenated words A-B as Astm Bstm.
(3) Decide the main stem in pre* Astm Bstm suf*.
(4) Reject a segmentation if it contains an affix, or an unknown stem (not a word by itself), whose type count is below a predefined threshold.
Discover more interesting morphemes:
- "al" as a prefix in Arabic names: al-jazeera, al-salem
- "related" as a suffix in compound adjectives: health-related, government-related
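Rule (4) above can be sketched as a filter over Morfessor-style segmentations. The data layout (a list of (morpheme, tag) pairs), the `accept` helper, and the threshold value are illustrative assumptions, not the paper's implementation:

```python
# A segmentation is a list of (morpheme, tag) pairs,
# with tags in {"pre", "stm", "suf"}.

def accept(segmentation, type_count, vocab, min_count=100):
    """Rule (4), sketched: reject if any affix, or any stem that is not
    a word by itself, occurs in fewer than min_count word types.
    min_count is an illustrative threshold, not the paper's value."""
    for morpheme, tag in segmentation:
        rare = type_count.get(morpheme, 0) < min_count
        if tag in ("pre", "suf") and rare:
            return False
        if tag == "stm" and morpheme not in vocab and rare:
            return False
    return True

# Toy counts and vocabulary.
counts = {"un": 5000, "fortunate": 300, "ly": 8000, "disc": 40, "over": 9000}
vocab = {"fortunate", "over"}
ok = accept([("un", "pre"), ("fortunate", "stm"), ("ly", "suf")], counts, vocab)
bad = accept([("disc", "stm"), ("over", "suf")], counts, vocab)
```

Here un+fortunate+ly passes, while the bad segmentation disc+over is rejected because "disc" is a rare non-word stem.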