Linguistic Regularities in Sparse and Explicit Word Representations

Omer Levy and Yoav Goldberg
Bar-Ilan University, Israel

Papers in ACL 2014*
* Sampling error: +/- 100%

Neural Embeddings

Dense vectors
Each dimension is a latent feature
Common software package: word2vec

"Magic": king - man + woman ≈ queen  (analogies)
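
A minimal sketch of this analogy arithmetic in Python, using toy random vectors as a stand-in for real word2vec embeddings (the vocabulary, dimensionality, and `analogy` helper are illustrative assumptions, not part of the slides):

```python
import numpy as np

# Toy stand-in for trained word2vec embeddings (random values, illustration only).
rng = np.random.default_rng(0)
vocab = ["king", "man", "woman", "queen", "apple", "car"]
embeddings = {w: rng.standard_normal(50) for w in vocab}
embeddings = {w: v / np.linalg.norm(v) for w, v in embeddings.items()}

def analogy(a, a_star, b, exclude):
    """Return the word whose vector is closest (by cosine) to a* - a + b."""
    target = embeddings[a_star] - embeddings[a] + embeddings[b]
    target /= np.linalg.norm(target)
    scores = {w: v @ target for w, v in embeddings.items() if w not in exclude}
    return max(scores, key=scores.get)

# "man is to woman as king is to ?" -- with real embeddings this yields "queen";
# with the random toy vectors above the answer is of course meaningless.
print(analogy("man", "woman", "king", exclude={"man", "woman", "king"}))
```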

Representing words as vectors is not new!

Explicit Representations (Distributional)

Sparse vectors
Each dimension is an explicit context
Common association metric: PMI, PPMI

Does the same "magic" work for explicit representations too?
Baroni et al. (2014) showed that embeddings outperform explicit representations, but...
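
For contrast, a minimal sketch of building such a sparse explicit representation: window-based co-occurrence counts weighted by PPMI, following the standard definition PPMI(w,c) = max(0, log P(w,c)/(P(w)P(c))). The toy corpus and window size are illustrative assumptions:

```python
import numpy as np
from collections import Counter

# Toy corpus; the real experiments use corpora with billions of tokens.
corpus = "the king rules the kingdom the queen rules the kingdom".split()
window = 2

# Count word-context co-occurrences within a symmetric window.
pairs = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if i != j:
            pairs[(w, corpus[j])] += 1

vocab = sorted(set(corpus))
idx = {w: k for k, w in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for (w, c), n in pairs.items():
    counts[idx[w], idx[c]] = n

# PPMI: max(0, log P(w,c) / (P(w) P(c))).
total = counts.sum()
pw = counts.sum(axis=1, keepdims=True) / total
pc = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts / total) / (pw * pc))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Each row of `ppmi` is a sparse explicit vector; each dimension is a context word.
print(dict(zip(vocab, ppmi[idx["king"]])))
```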

Questions

Are analogies unique to neural embeddings?
Compare neural embeddings with explicit representations.

Why does vector arithmetic reveal analogies?
Unravel the mystery behind neural embeddings and their "magic".

Background

Mikolov et al. (2013a,b,c)

Neural embeddings have interesting geometries
These patterns capture "relational similarities"
Can be used to solve analogies:
    man is to woman as king is to queen
    (a is to a* as b is to b*)
Can be recovered by "simple" vector arithmetic:
    b* ≈ a* - a + b  (queen ≈ woman - man + king)

Mikolov et al. (2013a,b,c)

[Figure slides: analogy pairs as vectors in R^d]
    queen ≈ king - man + woman
    Paris ≈ Tokyo - Japan + France
    strongest ≈ best - good + strong

Are analogies unique to neural embeddings?

Experiment: compare embeddings to explicit representations
Learn different representations from the same corpus
Evaluate with the same recovery method:
    argmax_{b*} cos(b*, a* - a + b)

Analogy Datasets

4 words per analogy: a is to a* as b is to b*
Given 3 words (a is to a* as b is to ?), guess the best-suiting word from the entire vocabulary, excluding the question words
MSR: 8,000 syntactic analogies
Google: 19,000 syntactic and semantic analogies
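
A self-contained sketch of this evaluation protocol with the additive objective (3CosAdd in the paper's terminology); the two sample 4-tuples and the toy random vectors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-tuples (a, a*, b, b*); the real datasets are MSR (~8,000
# analogies) and Google (~19,000 analogies).
dataset = [
    ("man", "woman", "king", "queen"),
    ("good", "best", "strong", "strongest"),
]

# Toy random embeddings over the dataset vocabulary (stand-in for real vectors).
vocab = sorted({w for quad in dataset for w in quad})
embeddings = {w: rng.standard_normal(50) for w in vocab}
embeddings = {w: v / np.linalg.norm(v) for w, v in embeddings.items()}

def evaluate(dataset):
    """Accuracy of 3CosAdd over the dataset, excluding the question words."""
    correct = 0
    for a, a_star, b, b_star in dataset:
        target = embeddings[a_star] - embeddings[a] + embeddings[b]
        target /= np.linalg.norm(target)
        scores = {w: v @ target for w, v in embeddings.items()
                  if w not in {a, a_star, b}}
        correct += max(scores, key=scores.get) == b_star
    return correct / len(dataset)

print(f"accuracy: {evaluate(dataset):.2%}")
```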

Embedding vs Explicit (Round 1)

Many analogies are recovered by the explicit representation, but many more by the embedding.

Why does vector arithmetic reveal analogies?

We wish to find the word closest to king - man + woman.
This is done with cosine similarity:
    argmax_{b*} cos(b*, king - man + woman)
which, for unit-normalized vectors, decomposes into
    argmax_{b*} [ cos(b*, king) - cos(b*, man) + cos(b*, woman) ]
vector arithmetic = similarity arithmetic
The terms ask: is the candidate royal? (cos with king)  is it female? (cos with woman)
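
The step from vector arithmetic to similarity arithmetic is just linearity of the dot product. A short derivation, assuming all word vectors are unit-normalized (so cos(u, v) = u · v) and noting that the norm of the target vector is the same for every candidate b*:

```latex
\arg\max_{b^*} \cos(b^*,\, a^* - a + b)
  = \arg\max_{b^*} \frac{b^* \cdot (a^* - a + b)}{\lVert a^* - a + b \rVert}
  = \arg\max_{b^*} \; b^* \cdot a^* \;-\; b^* \cdot a \;+\; b^* \cdot b
  = \arg\max_{b^*} \; \cos(b^*, a^*) - \cos(b^*, a) + \cos(b^*, b)
```

The denominator is constant over b* and can be dropped from the argmax, leaving a sum and difference of individual cosine similarities.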

What does each similarity term mean?

Observe the joint features with explicit representations!

Contexts shared with king:  uncrowned, majesty, second, ...
Contexts shared with woman: Elizabeth, Katherine, impregnate, ...
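
One way to inspect these joint features, reusing `ppmi`, `vocab`, and `idx` from the PPMI sketch above (with a real corpus, queen's top contexts shared with "king" would surface words like "uncrowned" and "majesty"):

```python
import numpy as np

# Reuses `ppmi`, `vocab`, `idx` from the PPMI sketch above.
def top_shared_contexts(w1, w2, k=5):
    """Contexts where both words score positive PPMI, ranked by the weaker score."""
    joint = np.minimum(ppmi[idx[w1]], ppmi[idx[w2]])
    order = np.argsort(-joint)
    return [(vocab[i], float(joint[i])) for i in order[:k] if joint[i] > 0]

print(top_shared_contexts("queen", "king"))
```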

Can we do better?

Let's look at some mistakes...

London is to England as Baghdad is to ... ?
Gold answer: Iraq
The model answers: Mosul?

The Additive Objective

    cos(b*, a*) - cos(b*, a) + cos(b*, b)

Problem: one similarity might dominate the rest
Much more prevalent in explicit representations
Might explain why explicit underperformed

How can we do better?

Instead of adding similarities, multiply them!

    argmax_{b*} [ cos(b*, a*) * cos(b*, b) ] / [ cos(b*, a) + ε ]

(ε is a small constant, 0.001 in the paper, that prevents division by zero.)
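
A sketch of this multiplicative objective (3CosMul in the paper), again over toy unit-normalized vectors; the (x + 1) / 2 shift that makes cosines non-negative before multiplying follows the paper's setup, while the vocabulary and vectors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["king", "man", "woman", "queen", "apple", "car"]
embeddings = {w: rng.standard_normal(50) for w in vocab}  # toy stand-in vectors
embeddings = {w: v / np.linalg.norm(v) for w, v in embeddings.items()}

def cos(u, v):
    return float(u @ v)  # vectors are already unit-normalized

def three_cos_mul(a, a_star, b, eps=0.001):
    """argmax over b* of cos(b*, a*) * cos(b*, b) / (cos(b*, a) + eps),
    with cosines shifted to [0, 1] via (x + 1) / 2 so all factors are non-negative."""
    shift = lambda x: (x + 1) / 2
    scores = {}
    for w, v in embeddings.items():
        if w in {a, a_star, b}:  # exclude the question words
            continue
        scores[w] = (shift(cos(v, embeddings[a_star])) * shift(cos(v, embeddings[b]))
                     / (shift(cos(v, embeddings[a])) + eps))
    return max(scores, key=scores.get)

# With real embeddings this recovers "queen" more reliably than the additive form,
# because no single large similarity can dominate the product.
print(three_cos_mul("man", "woman", "king"))
```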

Embedding vs Explicit (Round 2)

Multiplication > Addition
Explicit is on par with Embedding

Embeddings are not "magical":
Embedding-based similarities have a more uniform distribution
The additive objective performs better on smoother distributions
The multiplicative objective overcomes this issue

Conclusion

Are analogies unique to neural embeddings?
No! They occur in sparse and explicit representations as well.

Why does vector arithmetic reveal analogies?
Because vector arithmetic is equivalent to similarity arithmetic.

Can we do better?
Yes! The multiplicative objective is significantly better.

More Results and Analyses (in the paper)

Evaluation on closed-vocabulary analogy questions (SemEval 2012)
Experiments with a third objective function (PairDirection)
Do different representations reveal the same analogies?
Error analysis
A feature-level interpretation of how word similarity reveals analogies

Thanks for listening :)

Agreement

Objective   Both Correct   Both Wrong   Embedding Correct   Explicit Correct
MSR         43.97%         28.06%       15.12%              12.85%
Google      57.12%         22.17%       9.59%               11.12%

Error Analysis: Default Behavior

A certain word acts as a "prototype" answer for its semantic type
Examples:
    daughter for feminine answers
    Fresno for US cities
    Illinois for US states
Their vectors are the centroid of that semantic type

Error Analysis: Verb Inflections

In verb analogies: walked is to walking as danced is to ... ?
The correct lemma is often found (dance)
But with the wrong inflection (dances)
Probably an artifact of the window context

The Iraqi Example

The Iraqi Example (Revisited)