Slide1
A Compositional and Interpretable Semantic Space
Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell
Carnegie Mellon University
amfyshe@gmail.com
Slide2
VSMs and Composition
[Figure: pear, lettuce, orange, apple, carrots]
Slide3
How to Make a VSM
[Diagram: Corpus → Count → Statistics (many cols) → Dim. Reduction → VSM (few cols)]
Slide4
VSMs and Composition
[Figure: pear, lettuce, orange, apple, carrots, seedless orange]
Slide5
VSMs and Composition
[Diagram: f(adjective, noun) = estimate, compared with observed; e.g. f(stats for seedless, stats for orange) is compared with the observed stats for “seedless orange”]
Slide6
Previous Work
What is “f”? (Mitchell & Lapata, 2010; Baroni and Zamparelli, 2010; Blacoe and Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013)
Which VSMs are best for composition? (Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014)
Slide7
Our Contributions
Can we learn a VSM that
is aware of the composition function?
is interpretable?
[Figure: composition function F; example dimension label "Is edible"]
Slide8
How to make a VSM
Corpus
16 billion words
50 million documents
Count dependency arcs in sentences
MALT dependency parser
Positive Pointwise Mutual Information (PPMI)
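As a rough illustration of the last step, here is a minimal sketch of positive pointwise mutual information over (word, dependency context) counts. The function name, the toy counts, and the context labels are all made up for this example; the deck does not specify the exact counting scheme.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Map each (word, context) pair to its positive PMI score.

    pair_counts: dict of (word, context) -> co-occurrence count.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        if n <= 0:
            # Unseen pairs get a score of zero rather than log(0).
            scores[(word, context)] = 0.0
            continue
        # PMI = log( P(word, context) / (P(word) * P(context)) )
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(0.0, pmi)  # clip negative PMI to zero
    return scores


# Toy counts; the dependency labels are hypothetical.
toy_counts = {
    ("orange", "amod:seedless"): 8,
    ("orange", "dobj:eat"): 2,
    ("car", "dobj:drive"): 9,
    ("car", "dobj:eat"): 1,
}
toy_scores = ppmi(toy_counts)
```

Pairs that co-occur less often than chance would predict, such as ("car", "dobj:eat") in the toy data, end up with a score of zero.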
Slide9
Matrix Factorization in VSMs
[Diagram: X ≈ A D, where X holds the corpus stats (c) for each word and A is the VSM; rows of X and A are words]
Slide10
Interpretability
[Diagram: matrix A, with words as rows and latent dims as columns]
Slide11
Interpretability
SVD (Fyshe 2013)
well, long, if, year, watch
plan, engine, e, rock, very
get, no, features, music, via
Word2vec (pretrained on Google News)
pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee
Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas
Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV
Slide12
Non-Negative Sparse Embeddings
X ≈ A D
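For reference, a sketch of the kind of objective behind a non-negative sparse factorization X ≈ A D, written as it is usually stated for NNSE. The symbol λ (sparsity weight) and the index names are notation chosen here, not taken from the slide, and any scaling constants are omitted.

```latex
\begin{aligned}
\operatorname*{arg\,min}_{A,\,D}\;& \sum_{i=1}^{w} \Bigl( \bigl\lVert X_{i,:} - A_{i,:}\,D \bigr\rVert_2^2 + \lambda \bigl\lVert A_{i,:} \bigr\rVert_1 \Bigr) \\
\text{subject to}\;& D_{k,:}\,D_{k,:}^{\top} \le 1 \quad \forall k, \\
& A_{i,j} \ge 0 \quad \forall i, j
\end{aligned}
```

The L1 penalty encourages sparse rows of A, and the non-negativity constraint keeps every entry of A at or above zero; w is the number of rows (words) in X.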
(Murphy 2012)
Slide13
Interpretability
SVD
well, long, if, year, watch
plan, engine, e, rock, very
get, no, features, music, via
NNSE
inhibitor, inhibitors, antagonists, receptors, inhibition
bristol, thames, southampton, brighton, poole
delhi, india, bombay, chennai, madras
Slide14
A Composition-aware VSM
Slide15
Modeling Composition
Rows of X are words
Can also be phrases
[Diagram: X and A each have row blocks for adjectives, nouns, and phrases]
Slide16
Modeling Composition
Additional constraint for composition
[Diagram: A with row blocks for adjectives, nouns, and phrases; w1 is an adjective row, w2 is a noun row, and p is the row for the phrase p = [w1 w2]]
Slide17
Weighted Addition
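A minimal sketch of weighted addition as a composition function: the phrase estimate is a weighted sum of the adjective and noun vectors. The function name and the default weights of 0.5 are illustrative assumptions, not values confirmed by the slide.

```python
def weighted_add(w1, w2, alpha=0.5, beta=0.5):
    """Estimate a phrase vector as alpha * w1 + beta * w2.

    w1, w2: equal-length lists of floats (adjective and noun vectors).
    alpha, beta: scalar weights; the 0.5 defaults are placeholders.
    """
    if len(w1) != len(w2):
        raise ValueError("word vectors must have the same length")
    return [alpha * a + beta * b for a, b in zip(w1, w2)]


# Toy two-dimensional vectors, invented for illustration.
seedless = [1.0, 0.0]
orange = [0.0, 2.0]
estimate = weighted_add(seedless, orange)
```

With equal weights the estimate is simply the average of the two word vectors.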
Slide18
Modeling Composition
Slide19
Modeling Composition
Reformulate loss with square matrix B
[Diagram: B multiplies A; the row of B for a phrase holds α in the adj. col., β in the noun col., and -1 in the phrase col.]
Slide20
Modeling Composition
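One way to read the square-matrix reformulation: if the row of B for each phrase holds α in the adjective's column, β in the noun's column, and -1 in the phrase's own column, then each phrase row of the product B A is the residual α·a_adj + β·a_noun − a_phrase, and the composition loss is the squared Frobenius norm of B A. The sketch below follows that reading; the function names and toy values are hypothetical, and any weighting constant on the loss is left out.

```python
def build_b(n_rows, phrases, alpha, beta):
    """Build the square matrix B for a list of (phrase, adj, noun) row indices."""
    b = [[0.0] * n_rows for _ in range(n_rows)]
    for phrase_row, adj_row, noun_row in phrases:
        b[phrase_row][adj_row] = alpha
        b[phrase_row][noun_row] = beta
        b[phrase_row][phrase_row] = -1.0
    return b


def composition_loss(b, a):
    """Squared Frobenius norm of the matrix product B A."""
    n_dims = len(a[0])
    loss = 0.0
    for b_row in b:
        for k in range(n_dims):
            residual = sum(b_row[i] * a[i][k] for i in range(len(a)))
            loss += residual ** 2
    return loss


# Rows of A: 0 = adjective, 1 = noun, 2 = phrase (toy values).
b = build_b(3, [(2, 0, 1)], alpha=0.5, beta=0.5)
a_exact = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # phrase equals the weighted sum
a_off = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # phrase deviates from it
loss_exact = composition_loss(b, a_exact)
loss_off = composition_loss(b, a_off)
```

Rows of B that belong to single words stay all zero, so only phrase rows contribute to the loss.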
Slide21
Optimization
Online Dictionary Learning Algorithm (Mairal 2010)
Solve for D with gradient descent
Solve for A with ADMM (Alternating Direction Method of Multipliers)
Slide22
Testing Composition
W. add
W. NNSE
CNNSE
[Diagrams: panels labeled A, SVD, and A, each showing rows w1, w2, and p]
Slide23
Phrase Estimation
Predict phrase vector
Sort test phrases by distance to estimate
Rank (r/N*100)
Reciprocal rank (1/r)
Percent Perfect (δ(r==1))
[Diagram: sorted list of N test phrases, with the correct phrase at position r]
Slide24
Phrase Estimation
Chance: rank 50, reciprocal rank ~0.05, percent perfect 1%
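The three measures (rank as a percentage, reciprocal rank, and percent perfect) could be computed along these lines. This is a sketch under assumptions: Euclidean distance stands in for the unspecified distance measure, ties are broken in favor of the correct phrase, and all names and toy vectors are invented for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_of_target(estimate, candidates, target_index):
    """1-based rank r of the correct phrase when candidates are sorted by distance."""
    distances = [euclidean(estimate, c) for c in candidates]
    target = distances[target_index]
    return 1 + sum(1 for d in distances if d < target)


def summarize(ranks, n_candidates):
    """Average the three measures over a list of ranks."""
    n = len(ranks)
    return {
        "rank_percent": sum(r / n_candidates * 100 for r in ranks) / n,
        "reciprocal_rank": sum(1 / r for r in ranks) / n,
        "percent_perfect": 100 * sum(1 for r in ranks if r == 1) / n,
    }


# One toy query: the correct phrase is candidate 0, which is second closest.
toy_rank = rank_of_target([0.5, 0.5], [[1.0, 1.0], [0.5, 0.4], [0.0, 3.0]], 0)

# Toy ranks for three test phrases among N = 4 candidates.
toy_summary = summarize([1, 2, 4], n_candidates=4)
```

Lower is better for the rank percentage, while higher is better for reciprocal rank and percent perfect.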
Slide25
Interpretable Dimensions
Slide26
Interpretability
Slide27
Testing Interpretability
SVD
NNSE
CNNSE
[Diagrams: panels labeled A, SVD, and A, each showing rows w1, w2, and p]
Slide28
Interpretability
Select the word that does not belong:
crunchy
gooey
fluffy
crispy
colt
creamy
Slide29
Interpretability
Slide30
Phrase Representations
[Diagram: in A, take the row for a phrase, find its top scoring dimension, then the top scoring words/phrases in that dimension]
Slide31
Phrase Representations
Choose list of words/phrases most associated with target phrase “digital computers”
aesthetic, American music, architectural style
cellphones, laptops, monitors
both
neither
Slide32
Phrase Representation
Slide33
Testing Phrase Similarity
108 adjective-noun phrase pairs
Human judgments of similarity [1…7]
E.g. Important part : significant role (very similar)
Northern region : early age (not similar)
(Mitchell & Lapata 2010)
Slide34
Correlation of Distances
[Figure: distances from behavioral data compared with distances from Model A and Model B]
Slide35
Testing Phrase Similarity
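A sketch of how model similarities for phrase pairs might be compared with human judgments: score each pair with cosine similarity, then correlate the model scores with the behavioral ratings. Pearson correlation and cosine similarity are assumptions made here (the deck does not say which measures were used), and the ratings and scores below are invented toy numbers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data: human ratings on the 1 to 7 scale and model similarities
# for the same three phrase pairs (all values made up).
human_ratings = [6.0, 2.0, 4.0]
model_scores = [0.9, 0.1, 0.5]
toy_correlation = pearson(human_ratings, model_scores)
```

A higher correlation means the model's similarity scores track the human ratings more closely.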
Slide36
Interpretability
Slide37
Better than Correlation: Interpretability
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html
(behav sim score 6.33/7)
Slide38
Better than Correlation: Interpretability
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html
(behav sim score 5.61/7)
Slide39
Summary
Composition awareness improves VSMs
Closer to behavioral measure of phrase similarity
Better phrase representations
Interpretable dimensions
Helps to debug composition failures
Slide40
Thanks!
www.cs.cmu.edu/~fmri/papers/naacl2015/
amfyshe@gmail.com