A Compositional and Interpretable Semantic Space

Uploaded On 2016-05-15




Presentation Transcript

Slide1

A Compositional and Interpretable Semantic Space

Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell
Carnegie Mellon University
amfyshe@gmail.com

Slide2

VSMs and Composition

[Figure: the words pear, lettuce, orange, apple, and carrots placed in a vector space]

Slide3

How to Make a VSM

[Diagram: Corpus → Count → Statistics (many cols) → Dim. Reduction → VSM (few cols)]

Slide4

VSMs and Composition

[Figure: the words pear, lettuce, orange, apple, and carrots, plus the phrase "seedless orange", placed in a vector space]

Slide5

VSMs and Composition

f(adjective, noun) = estimate, to be compared with the observed phrase

f(stats for "seedless", stats for "orange") ≈ observed stats for "seedless orange"

Slide6

Previous Work

What is "f"?
(Mitchell & Lapata, 2010; Baroni and Zamparelli, 2010; Blacoe and Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013)

Which VSMs are best for composition?
(Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014)

Slide7

Our Contributions

Can we learn a VSM that
is aware of the composition function?
is interpretable?

[Figure: an example dimension labeled "Is edible"]

Slide8

How to make a VSM

Corpus
16 billion words
50 million documents
Count dependency arcs in sentences
MALT dependency parser
Point-wise Positive Mutual Information
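The count-then-reweight step above can be sketched in a few lines. This is a minimal illustration of positive pointwise mutual information on a toy word-by-context count matrix, not the authors' pipeline; the function name `ppmi` and the toy counts are assumptions made up for the example.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a word-by-context count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)   # word marginals
    col = counts.sum(axis=0, keepdims=True)   # context marginals
    expected = row @ col / total              # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0              # zero counts or empty rows/columns
    return np.maximum(pmi, 0.0)               # keep only positive associations

# Hypothetical counts: 3 words x 3 dependency contexts.
toy_counts = np.array([[10, 0, 2],
                       [0, 8, 2],
                       [1, 1, 6]])
X = ppmi(toy_counts)
```

Each row of `X` plays the role of the corpus statistics for one word; negative and undefined PMI values are clipped to zero.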

Slide9

Matrix Factorization in VSMs

[Diagram: X ≈ A × D, where X is the Words × Corpus Stats (c) matrix and A is the VSM]

Slide10

Interpretability

[Diagram: matrix A, with rows indexed by Words and columns by Latent Dims]

Slide11

Interpretability

SVD (Fyshe 2013)
well, long, if, year, watch
plan, engine, e, rock, very
get, no, features, music, via

Word2vec (pretrained on Google News)
pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee
Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas
Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV

Slide12

Non-Negative Sparse Embeddings

[Diagram: X ≈ A × D]
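For reference, the NNSE factorization is commonly stated as the optimization problem below. The notation is an assumption chosen to match the X, A, D of the diagram: w is the number of words (rows of X), ℓ the number of latent dimensions, and λ the sparsity weight.

```latex
\operatorname*{arg\,min}_{A,\,D} \; \sum_{i=1}^{w}
  \bigl\lVert X_{i,:} - A_{i,:} D \bigr\rVert^{2}
  + \lambda \bigl\lVert A_{i,:} \bigr\rVert_{1}
\quad \text{subject to} \quad
D_{j,:} D_{j,:}^{\top} \le 1 \;\; \forall\, 1 \le j \le \ell,
\qquad A_{i,j} \ge 0 .
```

The L1 penalty makes each row of A sparse, and the non-negativity constraint is what lets each dimension be read as a list of its top scoring words.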

(Murphy 2012)

Slide13

Interpretability

SVD
well, long, if, year, watch
plan, engine, e, rock, very
get, no, features, music, via

NNSE
inhibitor, inhibitors, antagonists, receptors, inhibition
bristol, thames, southampton, brighton, poole
delhi, india, bombay, chennai, madras

Slide14

A Composition-aware VSM

Slide15

Modeling Composition

Rows of X are words
Can also be phrases

[Diagram: X and A, each with row blocks for Adjectives, Nouns, and Phrases]

Slide16

Modeling Composition

Additional constraint for composition

[Diagram: matrix A with row blocks for Adjectives, Nouns, and Phrases; rows w1, w2, and p are highlighted, where p = [w1 w2]]

Slide17

Weighted Addition
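Weighted addition, in the sense of Mitchell & Lapata (2010), estimates a phrase vector as a weighted sum of its two word vectors. Using the α and β that appear in the later loss reformulation, and writing A_{w_1}, A_{w_2}, A_p for the rows of A (this row notation is an assumption), a sketch is:

```latex
\hat{A}_{p} \;=\; \alpha\, A_{w_1} + \beta\, A_{w_2},
\qquad p = [\,w_1 \; w_2\,]
```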

Slide18

Modeling Composition

Slide19

Modeling Composition

Reformulate loss with square matrix B

[Diagram: B multiplies A; the row of B for a phrase holds α in the adj. col., β in the noun col., and -1 in the phrase col.]

Slide20

Modeling Composition
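The reformulation with a square matrix B can be checked numerically. The sketch below is an illustration under assumptions, not the authors' code: the toy vocabulary, the row indices, the random stand-in for A, and the choice α = β = 0.5 are all hypothetical. It shows that summing the squared weighted-addition error over phrases gives the same value as the squared Frobenius norm of BA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary: rows 0-1 adjectives, 2-3 nouns, 4-5 phrases.
# Each entry is (phrase_row, adjective_row, noun_row).
phrases = [(4, 0, 2), (5, 1, 3)]
n_rows, n_dims = 6, 3
alpha, beta = 0.5, 0.5                 # assumed weights for weighted addition

A = rng.random((n_rows, n_dims))       # random stand-in for the learned matrix A

# Direct form: sum over phrases of ||alpha*adj + beta*noun - phrase||^2.
direct = sum(
    np.sum((alpha * A[a] + beta * A[n] - A[p]) ** 2) for p, a, n in phrases
)

# Matrix form: B is square (n_rows x n_rows); rows for single words stay zero.
B = np.zeros((n_rows, n_rows))
for p, a, n in phrases:
    B[p, a] = alpha    # adjective column
    B[p, n] = beta     # noun column
    B[p, p] = -1.0     # phrase column

matrix_form = np.linalg.norm(B @ A, "fro") ** 2
```

Writing the penalty as a single matrix product keeps the composition term in the same algebraic form as the reconstruction term, which is convenient when solving for A.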

Slide21

Optimization

Online Dictionary Learning Algorithm (Mairal 2010)
Solve for D with gradient descent
Solve for A with ADMM (Alternating Direction Method of Multipliers)

Slide22

Testing Composition

W. add
W. NNSE
CNNSE

[Diagram: three matrices (one labeled SVD, two labeled A), each with rows w1, w2, and p]

Slide23

Phrase Estimation

Predict phrase vector
Sort test phrases by distance to estimate
Rank (r/N*100)
Reciprocal rank (1/r)
Percent Perfect (δ(r==1))
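The three scores can be computed as in the sketch below, where r is the 1-based position of the correct phrase among N test phrases sorted by distance to the estimate. The function names, the toy vectors, and the use of Euclidean distance are assumptions for illustration; the slide does not name the distance measure.

```python
import numpy as np

def rank_of_correct(estimate, test_phrases, correct_idx):
    """1-based rank r of the correct phrase, sorting test phrases by distance."""
    # Euclidean distance is an assumption; another distance could be swapped in.
    dists = np.linalg.norm(test_phrases - estimate, axis=1)
    order = np.argsort(dists)
    return int(np.where(order == correct_idx)[0][0]) + 1

def summarize(ranks, n_test):
    """Average the three per-phrase scores over all test phrases."""
    ranks = np.asarray(ranks, dtype=float)
    return {
        "rank": float(np.mean(ranks / n_test * 100)),         # r/N*100, lower is better
        "reciprocal_rank": float(np.mean(1.0 / ranks)),       # 1/r, higher is better
        "percent_perfect": float(np.mean(ranks == 1) * 100),  # share with r == 1
    }

# Hypothetical observed phrase vectors (N = 4) and two predicted vectors.
test_phrases = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
estimates = np.array([[0.9, 0.1], [0.4, 1.2]])
correct = [1, 0]

ranks = [rank_of_correct(e, test_phrases, c) for e, c in zip(estimates, correct)]
scores = summarize(ranks, n_test=len(test_phrases))
```

In this toy run the first estimate ranks its correct phrase first and the second ranks it second, so the averages are 37.5 for rank, 0.75 for reciprocal rank, and 50 for percent perfect.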

[Diagram: ranked list of N test phrases, with the correct phrase at rank r]

Slide24

Phrase Estimation

[Chart: chance levels are 50 for rank, ~0.05 for reciprocal rank, and 1% for percent perfect]

Slide25

Interpretable Dimensions

Slide26

Interpretability

Slide27

Testing Interpretability

SVD
NNSE
CNNSE

[Diagram: three matrices (one labeled SVD, two labeled A), each with rows w1, w2, and p]

Slide28

Interpretability

Select the word that does not belong:
crunchy
gooey
fluffy
crispy
colt
creamy

Slide29

Interpretability

Slide30

Phrase Representations

[Diagram: matrix A; for a phrase row, find its top scoring dimension, then list the top scoring words/phrases in that dimension]

Slide31

Phrase Representations

Choose list of words/phrases most associated with target phrase “digital computers”
aesthetic, American music, architectural style
cellphones, laptops, monitors
both
neither

Slide32

Phrase Representation

Slide33

Testing Phrase Similarity

108 adjective-noun phrase pairs
Human judgments of similarity [1…7]
E.g.
Important part : significant role (very similar)
Northern region : early age (not similar)

(Mitchell & Lapata 2010)

Slide34

Correlation of Distances

[Figure: distances in the Behavioral Data compared with distances in Model A and Model B]

Slide35

Testing Phrase Similarity

Slide36

Interpretability

Slide37

Better than Correlation: Interpretability

http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html

(behav sim score 6.33/7)

Slide38

Better than Correlation: Interpretability

http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html

(behav sim score 5.61/7)

Slide39

Summary

Composition awareness improves VSMs
Closer to behavioral measure of phrase similarity
Better phrase representations
Interpretable dimensions
Helps to debug composition failures

Slide40

Thanks!

www.cs.cmu.edu/~fmri/papers/naacl2015/
amfyshe@gmail.com