Slide1
Distributional models
Katrin Erk
Slide2
You can get an idea of what a word means from observing it in context
He filled the wampimuk, passed it around, and we all drank some.
We found a little hairy wampimuk sleeping behind a tree.
(examples by Marco Baroni)
Distributional modeling:
Represent the meaning of a word through the contexts in which it is observed
Similar words appear in similar contexts
Slide3
What words can appear in these contexts?
Word 1: drown, bathroom, shower, fill, fall, lie, electrocute, toilet, whirlpool, iron, gin
Word 2: eat, fall, pick, slice, peel, tree, throw, fruit, pie, bite, crab, grate
Word 3: advocate, overthrow, establish, citizen, ideal, representative, dictatorship, campaign, bastion, freedom
Word 4: spend, enjoy, remember, last, pass, end, die, happen, brighten, relive
Slide4
What words can appear in these contexts?
Word 1: drown, bathroom, shower, fill, fall, lie, electrocute, toilet, whirlpool, iron, gin
Word 2: eat, fall, pick, slice, peel, tree, throw, fruit, pie, bite, crab, grate
Word 3: advocate, overthrow, establish, citizen, ideal, representative, dictatorship, campaign, bastion, freedom
Word 4: spend, enjoy, remember, last, pass, end, die, happen, brighten, relive
Answers: Word 1 = bathtub, Word 2 = apple, Word 3 = democracy, Word 4 = day
Slide5
What can you say about word number 5?
Word 1: drown, bathroom, shower, fill, fall, lie, electrocute, toilet, whirlpool, iron, gin
Word 2: eat, fall, ripe, slice, peel, tree, throw, fruit, pie, bite, crab, grate
Word 3: advocate, overthrow, establish, citizen, ideal, representative, dictatorship, campaign, bastion, freedom
Word 4: spend, enjoy, remember, last, pass, end, die, happen, brighten, relive
Answers: Word 1 = bathtub, Word 2 = apple, Word 3 = democracy, Word 4 = day
Word 5: eat, paint, peel, apple, fruit, juice, lemon, blue, grow
Slide6
What can you say about word number 5?
Word 1: drown, bathroom, shower, fill, fall, lie, electrocute, toilet, whirlpool, iron, gin
Word 2: eat, fall, ripe, slice, peel, tree, throw, fruit, pie, bite, crab, grate
Word 3: advocate, overthrow, establish, citizen, ideal, representative, dictatorship, campaign, bastion, freedom
Word 4: spend, enjoy, remember, last, pass, end, die, happen, brighten, relive
Answers: Word 1 = bathtub, Word 2 = apple, Word 3 = democracy, Word 4 = day
Word 5: eat, paint, peel, apple, fruit, juice, lemon, blue, grow
Answer: Word 5 = orange
Slide7
Describing meaning through context
Similar words appear in similar contexts: apple, orange
Measure similarity in meaning as similarity in contexts
Caveat: If a word has multiple meanings, like “orange”, it will appear in a mixture of contexts
How to describe the contexts of a word?
Count other words nearby
Slide8
Counting context words for “apple”
They picked up red apples that had fallen to the ground
Eating apples is healthy
She ate a red apple
Pick an apple.
Word count, 3-word context window, lemmatized:
a       be      eat     fall    have    healthy pick    red     that    up
2       1       2       1       1       1       2       2       1       1
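The counting step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the slides: the `LEMMAS` table is a hypothetical hand-written stand-in for a real lemmatizer and covers only the four example sentences, and the function names are made up for this sketch.

```python
from collections import Counter

# Hypothetical lemma table covering only the four example sentences;
# a real system would use a proper lemmatizer instead.
LEMMAS = {
    "picked": "pick", "apples": "apple", "had": "have", "fallen": "fall",
    "eating": "eat", "is": "be", "ate": "eat", "an": "a",
}

def lemmatize(sentence):
    """Lowercase, strip a final period, split on spaces, and map tokens to lemmas."""
    tokens = sentence.lower().rstrip(".").split()
    return [LEMMAS.get(token, token) for token in tokens]

def context_counts(sentences, target, window=3):
    """Count lemmas within `window` positions to the left or right of `target`."""
    counts = Counter()
    for sentence in sentences:
        lemmas = lemmatize(sentence)
        for i, lemma in enumerate(lemmas):
            if lemma != target:
                continue
            left = lemmas[max(0, i - window):i]
            right = lemmas[i + 1:i + 1 + window]
            counts.update(left + right)
    return counts

sentences = [
    "They picked up red apples that had fallen to the ground",
    "Eating apples is healthy",
    "She ate a red apple",
    "Pick an apple.",
]
apple_counts = context_counts(sentences, "apple")
for word in sorted(apple_counts):
    print(word, apple_counts[word])
```

With a window of 3 on each side, the printed counts agree with the table on the slide.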
Slide9
How can we compare two context word counts?
Count how often “apple” occurs close to other words in a large text collection (corpus):
eat   fall  ripe  slice peel  tree  throw fruit pie   bite  crab
794   244   47    221   208   160   145   156   109   104   88
Interpret counts as coordinates:
[Figure: “apple” as a point in a space with the axes “eat” and “fall”]
Every context word becomes a dimension.
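Reading counts as coordinates amounts to fixing an order of context words and listing the counts in that order. The sketch below assumes the "apple" counts as read from the flattened count table on this slide (the split of the digit run into cells is an assumption); `DIMENSIONS` and `to_vector` are names made up for this illustration.

```python
# Context-word dimensions in a fixed order; each word is one axis of the space.
DIMENSIONS = ["eat", "fall", "ripe", "slice", "peel", "tree",
              "throw", "fruit", "pie", "bite", "crab"]

# Counts for "apple" (assumed reading of the slide's flattened table).
apple_counts = {"eat": 794, "fall": 244, "ripe": 47, "slice": 221, "peel": 208,
                "tree": 160, "throw": 145, "fruit": 156, "pie": 109,
                "bite": 104, "crab": 88}

def to_vector(counts, dimensions):
    """Turn a word -> count mapping into a list of coordinates, one per dimension.

    Context words that were never observed get the coordinate 0.
    """
    return [counts.get(word, 0) for word in dimensions]

apple_vector = to_vector(apple_counts, DIMENSIONS)
print(apple_vector)

# The two-dimensional picture keeps only the "eat" and "fall" axes.
print(to_vector(apple_counts, ["eat", "fall"]))
```

Using one shared dimension list for every word is what later makes two words comparable: both vectors then live in the same space.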
Slide10
How can we compare two context word counts?
Count how often “apple” occurs close to other words in a large text collection (corpus):
eat   fall  ripe  slice peel  tree  throw fruit pie   bite  crab
794   244   47    221   208   160   145   156   109   104   88
Do the same for “orange”:
eat   fall  ripe  slice peel  tree  throw fruit pie   bite  crab
265   22    25    62    220   64    74    111   4     4     8
Slide11
How can we compare two context word counts?
word    eat   fall  ripe  slice peel  tree  throw fruit pie   bite  crab
apple   794   244   47    221   208   160   145   156   109   104   88
orange  265   22    25    62    220   64    74    111   4     4     8
Then visualize both count tables as vectors in the same space:
[Figure: “apple” and “orange” as vectors in a space with the axes “eat” and “fall”]
Similarity between two words as proximity in space
Slide13
What do we mean by “similarity” of vectors?
Euclidean distance (a dissimilarity measure!):
[Figure: distance between the points “orange” and “apple”]
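Euclidean distance is the square root of the summed squared differences between coordinates, so larger values mean less similar words. A minimal sketch, restricted to the "eat" and "fall" axes; the counts are an assumed reading of the flattened slide tables, and the function name is made up for this illustration.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance between two points with the same dimensions."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Coordinates on the "eat" and "fall" axes (assumed reading of the slide tables).
apple = [794, 244]
orange = [265, 22]

print(euclidean_distance(apple, orange))
print(euclidean_distance(apple, apple))  # identical points are at distance zero
```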
Slide14
Problem with Euclidean distance: very sensitive to word frequency!
[Figure: vectors for “Braeburn” and “apple”]
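The frequency problem can be made concrete with a small sketch. The "Braeburn" vector below is hypothetical: it keeps roughly the proportions of "apple" but with counts about 100 times smaller, as one might expect for a rare word. The "apple" and "orange" values are an assumed reading of the slide tables. `math.dist` requires Python 3.8 or later.

```python
import math

apple = [794, 244]   # "eat" and "fall" counts (assumed reading of the slide table)
orange = [265, 22]   # likewise an assumed reading
braeburn = [8, 2]    # hypothetical: apple-like proportions, but a much rarer word

distance_to_orange = math.dist(apple, orange)
distance_to_braeburn = math.dist(apple, braeburn)

print(distance_to_orange, distance_to_braeburn)
# The rare word ends up farther from "apple" than "orange" does,
# although its vector points in almost the same direction.
print(distance_to_braeburn > distance_to_orange)
```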
Slide15
What do we mean by “similarity” of vectors?
Cosine similarity:
[Figure: angle between the vectors “orange” and “apple”]
Use angle between vectors instead of point distance to get around word frequency issues
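Cosine similarity divides the dot product of two vectors by the product of their lengths, so it depends only on direction. A minimal sketch: the 11-dimensional counts are an assumed reading of the flattened slide tables, and `rare_apple` is a hypothetical vector with the same proportions as "apple" at one hundredth of the frequency.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors with the same dimensions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Dimensions: eat, fall, ripe, slice, peel, tree, throw, fruit, pie, bite, crab
# (assumed reading of the slide tables).
apple = [794, 244, 47, 221, 208, 160, 145, 156, 109, 104, 88]
orange = [265, 22, 25, 62, 220, 64, 74, 111, 4, 4, 8]

# Hypothetical rare word: same proportions as "apple", 100 times fewer occurrences.
rare_apple = [count / 100 for count in apple]

print(cosine_similarity(apple, orange))
print(cosine_similarity(apple, rare_apple))  # very close to 1: same direction
```

Scaling a vector changes its length but not its angle to other vectors, which is why the rare word stays maximally similar here.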
Slide16
Using distributional models
Predicting word similarity (WordSim353):
doctor nurse 7.00
professor doctor 6.62
student professor 6.81
smart student 4.62
smart stupid 5.81
Doing the TOEFL test:
provisions
stipulations
interrelations
jurisdictions
interpretations
Similar:
synonyms
antonyms
topically related words
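One way a distributional model might answer a TOEFL-style question is to pick the candidate whose vector is closest to the target word. The sketch below assumes cosine as the similarity measure; the three-dimensional vectors are hypothetical toy values constructed so that "stipulations" comes out nearest to "provisions", whereas real vectors would come from corpus counts.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def answer_synonym_question(target, candidates, vectors):
    """Return the candidate whose vector has the highest cosine with the target."""
    return max(candidates,
               key=lambda word: cosine_similarity(vectors[target], vectors[word]))

# Hypothetical toy vectors over three unnamed context dimensions.
vectors = {
    "provisions":      [10, 8, 1],
    "stipulations":    [9, 7, 2],
    "interrelations":  [1, 2, 9],
    "jurisdictions":   [6, 1, 5],
    "interpretations": [2, 9, 6],
}
candidates = ["stipulations", "interrelations", "jurisdictions", "interpretations"]

choice = answer_synonym_question("provisions", candidates, vectors)
print(choice)
```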
Slide17
Using distributional models
Predicting priming effects, free associations
Finding (near-)synonyms: automatically building a thesaurus
Related: use distributional similarity of documents (containing similar words) in Information Retrieval
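The Information Retrieval use can be sketched in the same way: represent each document by its word counts and rank documents by similarity to the query. Everything below is a toy illustration; the documents, the query, and the function names are hypothetical, and real systems usually add term weighting on top of raw counts.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Word counts of a text, split on whitespace after lowercasing."""
    return Counter(text.lower().split())

def cosine_similarity(counts_a, counts_b):
    """Cosine between two sparse count vectors stored as Counters."""
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a if word in counts_b)
    norm_a = math.sqrt(sum(value * value for value in counts_a.values()))
    norm_b = math.sqrt(sum(value * value for value in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)

# Hypothetical mini-collection.
documents = {
    "fruit": "she ate a red apple and peeled an orange",
    "politics": "citizens campaign for freedom and democracy",
    "bathroom": "he filled the bathtub and took a shower",
}
query = bag_of_words("apple orange juice")

ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, bag_of_words(documents[name])),
    reverse=True,
)
print(ranked)
```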
Slide18
Corpora in which to count words
Corpus = text collection
What works best for computing a distributional model?
“Moby Dick”
2 years of the Wall Street Journal
A collection of dating ads
A collection of webpage texts
Slide19
Corpora in which to count words
Need to be available electronically
Best possible match for today’s English language in general
Mixture of genres
Mixture of authors
Spoken and written
Larger is better
Slide20
Corpora in which to count (English) words
Brown Corpus
1 million words
Balanced corpus, mixture of genres
British National Corpus
100 million words
Balanced corpus, mixture of genres, spoken and written
Slide21
Corpora in which to count (English) words
English Gigaword corpus
1 billion words
Short news articles
Wikipedia dump
2 billion words
UKWaC (UK web as corpus)
2 billion words
Collection of webpages ending in .uk
Slide22
What can we do with our word counts?
… blackboard
Slide23
Background in philosophy of language: Wittgenstein, “meaning” as “use”
Man kann für eine große Klasse von Fällen der Benützung des Wortes ‘Bedeutung’ – wenn auch nicht für alle Fälle seiner Benützung – dieses Wort so erklären: Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache. -- Wittgenstein, Philosophical Investigations
For a large class of cases – though not for all – in which we employ the word ‘meaning’ it can be explained thus: the meaning of a word is its use in the language. (translation: Anscombe/Stokhof)
Slide24
Background in linguistics: Harris and Firth
Zellig Harris (1957): Classify linguistic units by observing the contexts they occur in
phonemes
morphemes
phrase types: noun phrase, verb phrase…
Not specifically about semantics
John Firth (1957): “collocations”
identify senses of a word by looking at groups of contexts in which it appears
“You shall know a word by the company it keeps”
Slide25
Background in psychology
Landauer/Dumais 1997: “A solution to Plato’s problem”
How come you know so many words?
“A typical American seventh grader knows the meaning of 10-15 words today that she did not know yesterday. She must have acquired most of them as a result of reading because (a) the majority of English words are used only in print, (b) she already knew well almost all the words she would have encountered in speech, and (c) she learned less than one word by direct instruction.”
Slide26
Background in psychology
Many phenomena to do with word meaning can be simulated with distributional models
Word similarity ratings (WordSim353):
doctor nurse 7.00
professor doctor 6.62
student professor 6.81
smart student 4.62
smart stupid 5.81
How would you simulate this with a distributional model?
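One common way to approach this question, offered here as an assumption rather than as the answer given in the lecture, is to compute a model similarity for each word pair and check how well the model's ranking of the pairs agrees with the human ranking, for example with Spearman's rank correlation. In the sketch below the human ratings are the five from the slide, while `model_scores` are hypothetical cosine values invented for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks of the values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

pairs = [("doctor", "nurse"), ("professor", "doctor"), ("student", "professor"),
         ("smart", "student"), ("smart", "stupid")]
human_ratings = [7.00, 6.62, 6.81, 4.62, 5.81]   # from the slide
model_scores = [0.71, 0.60, 0.55, 0.20, 0.48]    # hypothetical cosine values

print(spearman(human_ratings, model_scores))
```

A rank correlation is used instead of comparing raw values because human ratings and cosine values are on different scales; only the ordering of the pairs is compared.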
Slide27
Background in psychology
Many phenomena to do with word meaning can be simulated with distributional models
Priming
Hearing one word makes you react faster to a related word
Hodgson data:
election – vote
dove – peace
vase – flower
NOT: election – peace
How would you simulate that with a distributional model?
Slide28
Background in psychology
Are our mental representations of words, of concepts distributional?
Do they represent the contexts in which a word has been encountered?
Does the meaning of a word depend on the contexts in which it is used?
Slide29
Background in psychology
Concepts as distributional:
Yes (Landauer and others):
How else would we learn so many words?
Text as one of the main ways in which we encounter words
No (Barsalou and others):
Perception is central to how we represent concepts
Distributional and perceptual (Andrews/Vigliocco/Vinson and others)
Both types of information seem to be relevant
Simulations: Combining both makes for a better model