Presentation Transcript

Slide1

Word sense disambiguation (1)

Instructor: Paul Tarau, based on Rada Mihalcea's original slides

Note: Some of the material in this slide set was adapted from a tutorial given by Rada Mihalcea & Ted Pedersen at ACL 2005

Slide2

Definitions

Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.

The sense inventory usually comes from a dictionary or thesaurus.

Knowledge-intensive methods, supervised learning, and (sometimes) bootstrapping approaches

Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.

Unsupervised techniques

Slide3

Computers versus Humans

Polysemy – most words have many possible meanings.

A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human…

Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases…

Slide4

Ambiguity for Humans - Newspaper Headlines!

DRUNK GETS NINE YEARS IN VIOLIN CASE

FARMER BILL DIES IN HOUSE

PROSTITUTES APPEAL TO POPE

STOLEN PAINTING FOUND BY TREE

RED TAPE HOLDS UP NEW BRIDGE

DEER KILL 300,000

RESIDENTS CAN DROP OFF TREES

INCLUDE CHILDREN WHEN BAKING COOKIES

MINERS REFUSE TO WORK AFTER DEATH

Slide5

Ambiguity for a Computer

The fisherman jumped off the bank and into the water.

The bank down the street was robbed!

Back in the day, we had an entire bank of computers devoted to this problem.

The bank in that road is entirely too steep and is really dangerous.

The plane took a bank to the left, and then headed off towards the mountains.

Slide6

Early Days of WSD

Noted as a problem for Machine Translation (Weaver, 1949)

A word can often only be translated if you know the specific sense intended (A bill in English could be a pico or a cuenta in Spanish)

Bar-Hillel (1960) posed the following:

Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.

Is pen a writing instrument or an enclosure where children play?

…declared it unsolvable, left the field of MT!

Slide7

Since then…

1970s - 1980s

Rule-based systems

Rely on hand-crafted knowledge sources

1990s

Corpus-based approaches

Dependence on sense-tagged text

(Ide and Véronis, 1998) give an overview of the history from the early days to 1998.

2000s

Hybrid Systems

Minimizing or eliminating use of sense-tagged text

Taking advantage of the Web

Slide8

Practical Applications

Machine Translation

Translate bill from English to Spanish

Is it a pico or a cuenta?

Is it a bird jaw or an invoice?

Information Retrieval

Find all Web pages about cricket

The sport or the insect?

Question Answering

What is George Miller's position on gun control?

The psychologist or the US congressman?

Knowledge Acquisition

Add to KB: Herb Bergson is the mayor of Duluth.

Minnesota or Georgia?

Slide9

Knowledge-based WSD

Task definition

Knowledge-based WSD = the class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text

Resources

Yes:

Machine Readable Dictionaries

Raw corpora

No:

Manually annotated corpora

Scope

All open-class words

Slide10

Machine Readable Dictionaries

In recent years, most dictionaries have been made available in Machine Readable Dictionary (MRD) format

Oxford English Dictionary

Collins

Longman Dictionary of Contemporary English (LDOCE)

Thesauruses – add synonymy information

Roget's Thesaurus

Semantic networks – add more semantic relations

WordNet

EuroWordNet

Slide11

MRD – A Resource for Knowledge-based WSD

For each word in the language vocabulary, an MRD provides:

A list of meanings

Definitions (for all word meanings)

Typical usage examples (for most word meanings)

WordNet definitions/examples for the noun plant:

buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"

a living organism lacking the power of locomotion

something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"

an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience

Slide12

MRD – A Resource for Knowledge-based WSD

A thesaurus adds:

An explicit synonymy relation between word meanings

A semantic network adds:

Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, entailment, etc.

WordNet synsets for the noun “plant”

1. plant, works, industrial plant

2. plant, flora, plant life

WordNet related concepts for the meaning “plant life”

{plant, flora, plant life}

hypernym: {organism, being}

hyponym: {house plant}, {fungus}, …

meronym: {plant tissue}, {plant part}

holonym: {Plantae, kingdom Plantae, plant kingdom}

Slide13

Lesk Algorithm

(Michael Lesk 1986): Identify senses of words in context using definition overlap

Algorithm:

Retrieve from MRD all sense definitions of the words to be disambiguated

Determine the definition overlap for all possible sense combinations

Choose senses that lead to highest overlap

Example: disambiguate PINE CONE

PINE

1. kinds of evergreen tree with needle-shaped leaves

2. waste away through sorrow or illness

CONE

1. solid body which narrows to a point

2. something of this shape whether solid or hollow

3. fruit of certain evergreen trees

Pine#1 ∩ Cone#1 = 0

Pine#2 ∩ Cone#1 = 0

Pine#1 ∩ Cone#2 = 1

Pine#2 ∩ Cone#2 = 0

Pine#1 ∩ Cone#3 = 2

Pine#2 ∩ Cone#3 = 0
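A minimal Python sketch of this pairwise overlap computation; the tokenizer, stopword list, and crude plural stripping are illustrative assumptions, not part of Lesk's 1986 setup:

```python
from itertools import product

STOPWORDS = {"a", "an", "the", "of", "to", "or", "which", "this", "with"}

# Toy sense inventory, copied from the PINE / CONE example above.
senses = {
    "pine": [
        "kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}

def content_words(text):
    """Lowercase, split, drop stopwords, and crudely strip plural -s."""
    return {w.rstrip("s") for w in text.lower().split() if w not in STOPWORDS}

def lesk_pair(word1, word2):
    """Return the sense pair whose definitions share the most words."""
    best, best_overlap = None, -1
    for (i, d1), (j, d2) in product(enumerate(senses[word1]),
                                    enumerate(senses[word2])):
        overlap = len(content_words(d1) & content_words(d2))
        if overlap > best_overlap:
            best, best_overlap = (i + 1, j + 1), overlap
    return best, best_overlap

print(lesk_pair("pine", "cone"))   # -> ((1, 3), 2): pine#1 with cone#3 wins
```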

Slide14

Lesk Algorithm for More than Two Words?

I saw a man who is 98 years old and can still walk and tell jokes

Nine open-class words: see (26), man (11), year (4), old (8), can (5), still (4), walk (10), tell (8), joke (3)

43,929,600 sense combinations! How to find the optimal sense combination?

Simulated annealing (Cowie, Guthrie, Guthrie 1992)

Define a function E = combination of word senses in a given text. Find the combination of senses that leads to the highest definition overlap (redundancy):

1. Start with E = the most frequent sense for each word

2. At each iteration, replace the sense of a random word in the set with a different sense, and measure E

3. Stop iterating when there is no change in the configuration of senses
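A minimal sketch of this annealing-style search, assuming a generic definitions lookup (word -> list of sense definitions) and a bag-of-words overlap as the redundancy score E; the cooling schedule and acceptance rule here are illustrative choices, not Cowie et al.'s exact parameters:

```python
import math
import random

def redundancy(config, definitions, words):
    """E: total pairwise word overlap among the chosen sense definitions."""
    bags = [set(definitions[w][s].lower().split())
            for w, s in zip(words, config)]
    return sum(len(b1 & b2)
               for i, b1 in enumerate(bags)
               for b2 in bags[i + 1:])

def anneal_senses(words, definitions, steps=1000, temp=1.0, cooling=0.995):
    """Search sense combinations, accepting some downhill moves early on."""
    config = [0] * len(words)          # sense 0 = most frequent sense
    e = redundancy(config, definitions, words)
    for _ in range(steps):
        i = random.randrange(len(words))
        if len(definitions[words[i]]) < 2:
            continue                   # monosemous word: nothing to swap
        new = config[:]
        new[i] = random.choice(
            [s for s in range(len(definitions[words[i]])) if s != config[i]])
        e_new = redundancy(new, definitions, words)
        # Accept improvements always; worse moves with probability exp(dE/T).
        if e_new >= e or random.random() < math.exp((e_new - e) / temp):
            config, e = new, e_new
        temp *= cooling                # cool down over time
    return config, e
```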

Slide15

Lesk Algorithm: A Simplified Version

Original Lesk definition: measure overlap between sense definitions for all words in context

Identify simultaneously the correct senses for all words in context

Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between sense definitions of a word and the current context

Identify the correct sense for one word at a time

Search space significantly reduced

Slide16

Lesk Algorithm: A Simplified Version

Example: disambiguate PINE in "Pine cones hanging in a tree"

PINE

1. kinds of evergreen tree with needle-shaped leaves

2. waste away through sorrow or illness

Pine#1 ∩ Sentence = 1

Pine#2 ∩ Sentence = 0

Algorithm for simplified Lesk:

Retrieve from MRD all sense definitions of the word to be disambiguated

Determine the overlap between each sense definition and the current context

Choose the sense that leads to the highest overlap
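A minimal, self-contained sketch of these three steps; the toy tokenizer and stopword list are again illustrative assumptions:

```python
def overlap(definition, context, stopwords=("a", "an", "the", "of", "in")):
    """Count shared non-stopword tokens between a definition and the context."""
    def bag(text):
        return {w.rstrip("s") for w in text.lower().split()
                if w not in stopwords}
    return len(bag(definition) & bag(context))

def simplified_lesk(sense_definitions, context):
    """Score each sense definition against the context; return best sense."""
    scores = [overlap(d, context) for d in sense_definitions]
    return scores.index(max(scores)) + 1     # 1-based sense number

pine_defs = ["kinds of evergreen tree with needle-shaped leaves",
             "waste away through sorrow or illness"]
print(simplified_lesk(pine_defs, "Pine cones hanging in a tree"))  # -> 1
```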

Slide17

Evaluations of Lesk Algorithm

Initial evaluation by M. Lesk

50-70% accuracy on short, manually annotated text samples, with respect to the Oxford Advanced Learner's Dictionary

Simulated annealing

47% on 50 manually annotated sentences

Evaluation on Senseval-2 all-words data, with back-off to random sense

(Mihalcea & Tarau 2004)

Original Lesk: 35%

Simplified Lesk: 47%

Evaluation on Senseval-2 all-words data, with back-off to most frequent sense

(Vasilescu, Langlais, Lapalme 2004)

Original Lesk: 42%

Simplified Lesk: 58%

Slide18

Selectional Preferences

A way to constrain the possible meanings of words in a given context

E.g. "Wash a dish" vs. "Cook a dish"

WASH-OBJECT vs. COOK-FOOD

Capture information about possible relations between semantic classes

Common sense knowledge

Alternative terminology

Selectional Restrictions

Selectional Preferences

Selectional Constraints

Slide19

Acquiring Selectional Preferences

From annotated corpora

Circular relationship with the WSD problem

Need WSD to build the annotated corpus

Need selectional preferences to derive WSD

From raw corpora

Frequency counts

Information theory measures

Class-to-class relations

Slide20

Preliminaries: Learning Word-to-Word Relations

An indication of the semantic fit between two words

1. Frequency counts

Pairs of words connected by a syntactic relation

2. Conditional probabilities

Condition on one of the words
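A minimal sketch of both measures over hypothetical (verb, object) pairs, as a dependency parser might extract them; the toy pair list is an illustrative assumption:

```python
from collections import Counter

# Hypothetical (verb, object) pairs extracted by a parser.
pairs = [("drink", "water"), ("drink", "coffee"), ("drink", "coffee"),
         ("wash", "dish"), ("cook", "dish")]

pair_counts = Counter(pairs)                  # 1. frequency counts
verb_counts = Counter(v for v, _ in pairs)

def p_obj_given_verb(obj, verb):
    """2. conditional probability P(object | verb)."""
    return pair_counts[(verb, obj)] / verb_counts[verb]

print(pair_counts[("drink", "coffee")])       # 2
print(p_obj_given_verb("coffee", "drink"))    # 2/3 ~ 0.67
```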

Slide21

Learning Selectional Preferences (1)

Word-to-class relations (Resnik 1993)

Quantify the contribution of a semantic class c to a verb v using all the concepts subsumed by that class, via the selectional association

A(v, c) = P(c|v) * log( P(c|v) / P(c) ) / S(v)

where S(v) = Σ_c P(c|v) * log( P(c|v) / P(c) ) is the selectional preference strength of v, and P(c|v) is estimated from corpus counts of v occurring with nouns subsumed by class c.
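A minimal sketch of these formulas over hypothetical parsed pairs and a toy noun-to-class map; in Resnik's model the classes would come from WordNet's hypernym hierarchy rather than this hand-written table:

```python
import math
from collections import Counter

# Hypothetical parsed (verb, object-noun) pairs and a noun -> classes map.
pairs = [("drink", "coffee"), ("drink", "water"), ("drink", "beer"),
         ("paint", "wall"), ("drink", "poison")]
classes_of = {"coffee": {"BEVERAGE"}, "water": {"BEVERAGE", "SUBSTANCE"},
              "beer": {"BEVERAGE"}, "poison": {"SUBSTANCE"},
              "wall": {"ARTIFACT"}}

def class_probs(pairs, verb=None):
    """P(c) overall, or P(c|v): each noun occurrence votes for its classes,
    splitting its count evenly among them."""
    counts = Counter()
    for v, noun in pairs:
        if verb is None or v == verb:
            for c in classes_of[noun]:
                counts[c] += 1.0 / len(classes_of[noun])
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def association(verb, cls):
    """A(v,c) = P(c|v) * log(P(c|v)/P(c)) / S(v)."""
    p, p_v = class_probs(pairs), class_probs(pairs, verb)
    s = sum(q * math.log(q / p[c]) for c, q in p_v.items())   # S(v)
    if cls not in p_v:
        return 0.0
    return p_v[cls] * math.log(p_v[cls] / p[cls]) / s

print(association("drink", "BEVERAGE"))   # > association("drink", "SUBSTANCE")
```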

Slide22

Learning Selectional Preferences (2)

Determine the contribution of a word sense based on the assumption of equal sense distributions:

e.g. "plant" has two senses: 50% of occurrences are sense 1, 50% are sense 2

Example: learning restrictions for the verb "to drink"

Find high-scoring verb-object pairs

Find "prototypical" object classes (high association score)

Slide23

Using Selectional Preferences for WSD

Algorithm:

1. Learn a large set of selectional preferences for a given syntactic relation R

2. Given a pair of words W1 – W2 connected by a relation R

3. Find all selectional preferences W1 – C (word-to-class) or C1 – C2 (class-to-class) that apply

4. Select the meanings of W1 and W2 based on the selected semantic class

Example: disambiguate coffee in "drink coffee"

1. (beverage) a beverage consisting of an infusion of ground coffee beans

2. (tree) any of several small trees native to the tropical Old World

3. (color) a medium to dark brown color

Given the selectional preference "DRINK BEVERAGE": coffee#1
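A minimal sketch of steps 1-4 with a hypothetical, hand-written preference table and sense inventory; a real system would learn the table from a parsed corpus and read the senses from WordNet:

```python
# Hypothetical learned preferences for the verb-object relation:
# verb -> semantic class its objects should belong to.
preferences = {"drink": "BEVERAGE", "plant": "FLORA"}

# Hypothetical sense inventory: word -> {sense: semantic class}.
sense_classes = {
    "coffee": {"coffee#1": "BEVERAGE", "coffee#2": "TREE",
               "coffee#3": "COLOR"},
}

def disambiguate_object(verb, obj):
    """Pick the object sense whose class matches the verb's preference."""
    wanted = preferences.get(verb)
    for sense, cls in sense_classes[obj].items():
        if cls == wanted:
            return sense
    return None   # no applicable preference: fall back to another method

print(disambiguate_object("drink", "coffee"))   # -> coffee#1
```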

Slide24

Evaluation of Selectional Preferences for WSD

Data set: mainly verb-object and subject-verb relations extracted from SemCor

Compare against random baseline

Results (Agirre and Martinez, 2000)

Average results on 8 nouns

Similar figures reported in (Resnik 1997)

Slide25

Semantic Similarity

Words in a discourse must be related in meaning, for the discourse to be coherent (Halliday and Hasan, 1976)

Use this property for WSD – Identify related meanings for words that share a common context

Context span:

1. Local context: semantic similarity between pairs of words

2. Global context: lexical chains

Slide26

Semantic Similarity in a Local Context

Similarity determined between pairs of concepts, or between a word and its surrounding context

Relies on similarity metrics on semantic networks

(Rada et al. 1989)

[Figure: fragment of a WordNet-style IS-A hierarchy rooted at carnivore, with nodes including fissiped mammal (fissiped), canine (canid), feline (felid), bear, wolf, wild dog, hunting dog, hyena dog, hyena, dingo, dachshund, dog, and terrier]

Slide27

Semantic Similarity Metrics for WSD

Disambiguate target words based on similarity with one word to the left and one word to the right

(Patwardhan, Banerjee, Pedersen 2002)

Evaluation:

1,723 ambiguous nouns from Senseval-2

Among 5 similarity metrics, (Jiang and Conrath 1997) provide the best precision (39%)

Example: disambiguate PLANT in "plant with flowers"

PLANT

1. plant, works, industrial plant

2. plant, flora, plant life

Similarity (plant#1, flower) = 0.2

Similarity (plant#2, flower) = 1.5 → plant#2
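A minimal sketch of this neighbor-similarity idea using NLTK's WordNet interface and the Jiang-Conrath measure; the Brown information-content file and the max-over-synsets scoring are standard NLTK usage, but treat the details as illustrative:

```python
# pip install nltk; then nltk.download('wordnet') and nltk.download('wordnet_ic')
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")

def best_sense(target, neighbor, pos=wn.NOUN):
    """Pick the target synset most similar (JCN) to any neighbor synset."""
    best, best_score = None, -1.0
    for s in wn.synsets(target, pos=pos):
        for n in wn.synsets(neighbor, pos=pos):
            try:
                score = s.jcn_similarity(n, brown_ic)
            except Exception:      # e.g. no shared information-content root
                continue
            if score > best_score:
                best, best_score = s, score
    return best

print(best_sense("plant", "flower"))   # expected: the flora sense of plant
```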

Slide28

Semantic Similarity in a Global Context

Lexical chains

(Hirst and St-Onge 1998), (Halliday and Hasan 1976)

A lexical chain is a sequence of semantically related words, which creates a context and contributes to the continuity of meaning and the coherence of a discourse

Algorithm

for finding lexical chains:

Select the candidate words from the text. These are words for which we can compute similarity measures, and therefore most of the time they have the same part of speech.

For each such candidate word, and for each meaning for this word, find a chain to receive the candidate word sense, based on a semantic relatedness measure between the concepts that are already in the chain, and the candidate word meaning.

If such a chain is found, insert the word in this chain; otherwise, create a new chain.
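A minimal greedy sketch of this chaining procedure; the gloss-overlap relatedness measure, the threshold, and the default-to-first-sense choice for new chains are illustrative stand-ins (Hirst and St-Onge instead score paths of WordNet relations):

```python
THRESHOLD = 1   # illustrative cutoff: senses must share at least one gloss word

def relatedness(sense_a, sense_b):
    """Toy stand-in: count shared gloss words between two senses."""
    return len(set(sense_a[1].split()) & set(sense_b[1].split()))

def build_chains(candidates):
    """Greedily grow lexical chains; candidates = one list of
    (sense_id, gloss) alternatives per candidate word."""
    chains = []
    for senses in candidates:
        best = None   # (chain, sense, score)
        for sense in senses:
            for chain in chains:
                score = min(relatedness(sense, s) for s in chain)
                if score >= THRESHOLD and (best is None or score > best[2]):
                    best = (chain, sense, score)
        if best:
            best[0].append(best[1])       # insert the word sense into that chain
        else:
            chains.append([senses[0]])    # otherwise start a new chain
    return chains

words = [[("train#1", "public transport by rail"),
          ("train#3", "piece of cloth")],
         [("rail#2", "a bar of steel for trains on rail transport")]]
print(build_chains(words))   # rail#2 joins the chain opened by train#1
```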

Slide29

Semantic Similarity of a Global Context

A very long train traveling along the rails with a constant velocity v in a certain direction

train

#1: public transport

#2: ordered set of things

#3: piece of cloth

travel

#1: change location

#2: undergo transportation

rail

#1: a barrier

#2: a bar of steel for trains

#3: a small bird

Slide30

Lexical Chains for WSD

Identify lexical chains in a text

Usually target one part of speech at a time

Identify the meaning of words based on their membership in a lexical chain

Evaluation:

(Galley and McKeown 2003) lexical chains on 74 SemCor texts give 62.09%

(Mihalcea and Moldovan 2000) on five SemCor texts give 90% precision with 60% recall, using lexical chains anchored on monosemous words

(Okumura and Honda 1994) lexical chains on five Japanese texts give 63.4%

Slide31

Heuristics: Most Frequent Sense

Identify the most often used meaning and use this meaning by default

Word meanings exhibit a Zipfian distribution, e.g. the distribution of word senses in SemCor

Example: plant/flora is used more often than plant/factory – annotate any instance of PLANT as plant/flora
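A minimal sketch of this baseline via NLTK's WordNet, whose synsets are ordered roughly by tagged-corpus frequency; treating the first-listed synset as the most frequent sense is the usual approximation:

```python
from nltk.corpus import wordnet as wn

def most_frequent_sense(word, pos=wn.NOUN):
    """Return the first-listed (roughly most frequent) WordNet synset."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0] if synsets else None

print(most_frequent_sense("plant"))   # the synset WordNet lists first for "plant"
```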

Slide32

Heuristics: One Sense Per Discourse

A word tends to preserve its meaning across all its occurrences in a given discourse (Gale, Church, Yarowsky 1992)

What does this mean? E.g. the ambiguous word PLANT occurs 10 times in a discourse: all instances of plant carry the same meaning

Evaluation:

8 words with two-way ambiguity, e.g. plant, crane, etc.

98% of the two-word occurrences in the same discourse carry the same meaning

The grain of salt: performance depends on granularity

(Krovetz 1998) experiments with words with more than two senses

Performance of "one sense per discourse" measured on SemCor is approx. 70%
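A minimal sketch of applying the heuristic on top of any base WSD method: relabel every occurrence in the discourse with the discourse-level majority sense:

```python
from collections import Counter

def one_sense_per_discourse(labels):
    """Given per-occurrence sense labels from any base WSD method,
    relabel every occurrence with the majority sense."""
    majority, _ = Counter(labels).most_common(1)[0]
    return [majority] * len(labels)

# e.g. a base tagger labels 10 occurrences of "plant" in one discourse:
print(one_sense_per_discourse(["flora"] * 7 + ["factory"] * 3))
```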

Slide33

Heuristics: One Sense per Collocation

A word tends to preserve its meaning when used in the same collocation (Yarowsky 1993)

Strong for adjacent collocations

Weaker as the distance between words increases

An example: the ambiguous word PLANT preserves its meaning in all its occurrences within the collocation industrial plant, regardless of the context where this collocation occurs

Evaluation:

97% precision on words with two-way ambiguity

Finer granularity: (Martinez and Agirre 2000) tested the "one sense per collocation" hypothesis on text annotated with WordNet senses, obtaining 70% precision on SemCor words