Slide 1
Generalizing semantic relations
December 7 kenkyuukai (research group meeting)
祭都援炉 (マットエンロ)
Slide 2
Up until now: Getting to know NLP
"Speech and Language Processing" (Jurafsky & Martin)
Papers:
- On-Demand Information Extraction (Sekine)
- Learning First-Order Horn Clauses from Web Text [Sherlock] (Schoenmackers 2010)
- Coupled Semi-Supervised Learning for Information Extraction [NELL] (Carlson)
- Identifying Relations for Open Information Extraction [ReVerb] (Fader)
- Relation Acquisition using Word Classes and Partial Patterns (De Saeger)
- Interpretation as Abduction (Hobbs)
- An ILP Formulation of Abductive Inference for Discourse Interpretation (Inoue)
- Learning Dependency-Based Compositional Semantics (Liang)
Slide 3
Motivation
Ultimate goal: inference
Inference requires knowledge
Large-scale databases of semantic relations have been created from Web text
Slide 4
ReVerb (Fader et al., 2011)
relation(arg1, arg2) tuples acquired from large-scale Web data
Over 14.5 million semantic relations released to the public
Slide 5
ReVerb - Predicate Detection
POS tag patterns: V W* P
Example: "Milk is a rich source of calcium which is critical for building strong bones."
Detected relation phrases: (is a rich source of), (is critical for)
Slide stolen from Eric Nichols's kenkyuukai, 2011-11-02
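To make the pattern concrete, here is a rough sketch, not ReVerb's actual implementation (which adds lexical and frequency constraints on top of the syntactic one): tag a sentence with NLTK and scan for V W* P spans. The coarse V/W/P classes over Penn Treebank tags are my own simplification.

```python
# A rough sketch of the V W* P pattern match, assuming NLTK with the
# 'punkt' and 'averaged_perceptron_tagger' data installed. The coarse
# tag classes are a simplification, and real ReVerb adds lexical and
# frequency constraints not shown here.
import re
import nltk

def coarse(tag):
    """Map a Penn Treebank tag onto the pattern's coarse classes."""
    if tag.startswith('VB'):
        return 'V'                      # verb
    if tag[:2] in ('NN', 'JJ', 'RB', 'PR', 'DT'):
        return 'W'                      # noun/adj/adv/pronoun/determiner
    if tag in ('IN', 'RP', 'TO'):
        return 'P'                      # preposition/particle/inf. marker
    return 'O'                          # anything else breaks the pattern

def relation_phrases(sentence):
    tokens = nltk.word_tokenize(sentence)
    tag_string = ''.join(coarse(t) for _, t in nltk.pos_tag(tokens))
    # One coarse character per token, so regex offsets are token offsets.
    return [' '.join(tokens[m.start():m.end()])
            for m in re.finditer(r'VW*P', tag_string)]

print(relation_phrases(
    "Milk is a rich source of calcium which is critical for building strong bones."))
# -> ['is a rich source of', 'is critical for']
```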
Slide 6
ReVerb - Argument Detection
Noun phrase chunking
Example: "Milk is a rich source of calcium which is critical for building strong bones."
Extracted tuples:
(Milk, is a rich source of, calcium)
(calcium, is critical for, strong bones)
Slide modified from Eric Nichols's kenkyuukai, 2011-11-02
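And a matching sketch for the argument step, using a toy NP chunk grammar. ReVerb itself takes the noun phrases nearest to the left and right of each relation phrase; that wiring is omitted here for brevity.

```python
# A toy NP chunker for the argument step, using NLTK's RegexpParser.
# The grammar is deliberately minimal; ReVerb's heuristics for picking
# the NPs flanking each relation phrase are omitted.
import nltk

GRAMMAR = 'NP: {<DT>?<JJ>*<NN.*>+}'  # optional determiner, adjectives, nouns

def noun_phrases(sentence):
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    tree = nltk.RegexpParser(GRAMMAR).parse(tagged)
    return [' '.join(word for word, _ in subtree.leaves())
            for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP')]

print(noun_phrases(
    "Milk is a rich source of calcium which is critical for building strong bones."))
# -> ['Milk', 'a rich source', 'calcium', 'strong bones']
```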
Slide 7
Examples (from Fader 2011)
Slide 8
Problem
Many different ways to express equivalent meaning
Consider the resides relation
The table below shows counts of ReVerb relations containing "live" or "reside"
Relations highlighted in red on the original slide should be generalized to: reside(<PERSON>, <PLACE>)
We aim to generalize through semantic clustering

Frequency | Relation
   27,383 | lives in
   10,315 | live in
    8,653 | lived in
    5,185 | currently resides in
    4,002 | currently lives in
    3,310 | now lives in
    1,933 | resides in
    1,548 | is a resident of
    1,468 | live on
    1,308 | now resides in
    1,191 | has lived in
    1,055 | resided in
      876 | lives on
      590 | lived on
      531 | live at
      515 | still lives in
      461 | can live up to
      456 | is a lifelong resident of
      444 | was a resident of
      413 | live for
      382 | must be residents of
      332 | lives with
      332 | lived for
Slide 9
Semantic Clustering
Store generalizations in the form <Rel-type>(<Arg1 Type>, <Arg2 Type>) within a semantic relation generalization dictionary
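One possible concrete shape for that dictionary, sketched in Python. All keys and surface forms below are illustrative examples, not real data.

```python
# A hypothetical shape for the generalization dictionary: each generalized
# relation type, with its typed argument slots, maps to the set of surface
# relations it covers. Entries are illustrative only.
generalization_dict = {
    ('resides', '<Living Thing>', '<Place>'): {
        'lives in', 'resides in', 'currently lives in',
        'is a resident of', 'lives on',
    },
    ('nourished_by', '<Living Thing>', '<Nourishment>'): {
        'lives on', 'feeds on',
    },
}

# Reverse index for lookup; a surface relation may map to several senses.
surface_index = {}
for key, surfaces in generalization_dict.items():
    for surface in surfaces:
        surface_index.setdefault(surface, []).append(key)

print(surface_index['lives on'])  # two senses -> needs disambiguation
```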
Slide 10
Semantic Clustering Goals
A semantic relations dictionary: a mapping from ReVerb's specific instances to generalized semantic placeholders of the form <Generalized-Rel>(<Arg1 Type>, <Arg2 Type>)
A method for mapping real-world relation instances to the generalized semantic form
This can be accomplished with a semantic similarity function, used for:
- Clustering and generalizing relations
- Looking up new relations from text
Slide 11
Semantic Similarity
Ontological: similarity based on the arguments' hierarchy of semantic types
Lexical: similarity based on lexical features of the relation
Contextual: similarity based on surrounding text
Slide 12
Ontological Similarity
Determine the types of arg1 and arg2:
- WordNet synset
- Sherlock semantic class (Schoenmackers 2010)
Use WordNet similarity functions on the argument types
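A minimal sketch of the WordNet half of this with NLTK. Picking the first noun synset and Wu-Palmer similarity is an arbitrary placeholder choice; per the plan slide, the actual similarity function is still to be selected.

```python
# Minimal sketch: ontological similarity between two argument types via
# WordNet, using NLTK (requires the 'wordnet' corpus). First-synset
# selection and wup_similarity are placeholder choices.
from nltk.corpus import wordnet as wn

def onto_sim(type1, type2):
    """Wu-Palmer similarity between the first noun synsets of two type names."""
    s1 = wn.synsets(type1, pos=wn.NOUN)[0]
    s2 = wn.synsets(type2, pos=wn.NOUN)[0]
    return s1.wup_similarity(s2)

print(onto_sim('food', 'nourishment'))  # close in the hierarchy: high score
print(onto_sim('food', 'place'))        # distant: lower score
```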
Slide 13
Ontological Similarity (Clustering)
"Matt resides in Sendai" / "Eric lives in Japan"
Should these be clustered together? (Yes!)
Matching arg1 type: <Person>
Matching arg2 type: <Place>
High ontological similarity means a good chance of clustering
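As a sketch, that clustering decision might look like the following, reusing the illustrative onto_sim above; the threshold is arbitrary.

```python
# Sketch: treat two relation instances as clustering candidates when their
# argument types are ontologically close. Assumes the onto_sim sketch above;
# the 0.8 threshold is an arbitrary illustration.
def should_cluster(inst1, inst2, threshold=0.8):
    arg1a, _, arg2a = inst1
    arg1b, _, arg2b = inst2
    return (onto_sim(arg1a, arg1b) + onto_sim(arg2a, arg2b)) / 2 >= threshold

matt = ('person', 'resides in', 'city')    # Matt resides in Sendai
eric = ('person', 'lives in', 'country')   # Eric lives in Japan
print(should_cluster(matt, eric))          # True when the types are close
```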
Slide 14
Ontological Similarity (Lookup)
"Matt lives on a farm" => ???
"Eric lives on donuts" => ???
Are these the same semantic relation? (NO!)
Multiple entries in the dictionary for lives_on:
- resides(<Living Thing>, <Place>)
- nourished_by(<Living Thing>, <Nourishment>)
Use argument type similarity testing to differentiate between the senses of lives_on
Slide 17
Ontological Similarity (Lookup cont.)
Which entry will ontological similarity suggest we return for each example?
"Matt lives on a farm": <Person> lives_on <Place> => resides(Matt, a_farm)
"Eric lives on donuts": <Person> lives_on <Food> => nourished_by(Eric, donuts)
onto_sim(<Food>, <Nourishment>) is greater than onto_sim(<Food>, <Place>), so we know Eric is nourished_by donuts
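The same lookup logic as a sketch, again on top of the illustrative onto_sim; the candidate entries and type names are simplified stand-ins for the dictionary entries above.

```python
# Sketch: pick the dictionary sense whose expected argument types are
# ontologically closest to the observed ones. Assumes the onto_sim sketch
# above; candidates are simplified stand-ins for the lives_on entries.
LIVES_ON = [
    ('resides', 'organism', 'place'),
    ('nourished_by', 'organism', 'nourishment'),
]

def disambiguate(arg1_type, arg2_type, candidates):
    def score(entry):
        _, expected1, expected2 = entry
        return onto_sim(arg1_type, expected1) + onto_sim(arg2_type, expected2)
    return max(candidates, key=score)[0]

print(disambiguate('person', 'farm', LIVES_ON))  # farm nearer <Place>: 'resides'
print(disambiguate('person', 'food', LIVES_ON))  # food nearer <Nourishment>: 'nourished_by'
```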
Slide 18
Lexical Similarity
Use features of the relation phrase to score similarity:
- N-gram overlap, bag-of-words, ...
- Weighting content and function words differently
- etc.
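One hedged sketch of such a measure: weighted token overlap with function words down-weighted. The stopword list and weights are arbitrary, and note that it reproduces exactly the failure modes the next slide points out.

```python
# Sketch: Jaccard-style token overlap with function words down-weighted.
# Weights and stopword list are arbitrary illustrations.
STOPWORDS = {'a', 'an', 'the', 'is', 'in', 'on', 'at', 'of', 'for', 'with'}

def weight(word):
    return 0.3 if word in STOPWORDS else 1.0

def lex_sim(rel1, rel2):
    a, b = set(rel1.split()), set(rel2.split())
    return (sum(weight(w) for w in a & b) /
            sum(weight(w) for w in a | b))

print(lex_sim('lives in', 'currently lives in'))  # high, correctly
print(lex_sim('lives in', 'lives for'))           # also high: the failure mode below
print(lex_sim('lives in', 'resides in'))          # low: needs ontological similarity
```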
Slide 19
Lexical Similarity
Correctly groups together: "lives at", "live in"
But erroneously clusters: "lives for", "lives with"
And doesn't cluster "resides in" (we rely on ontological similarity for that)
(Frequency table repeated from Slide 8)
Slide 20
Contextual Similarity
How similar is the surrounding text?
To answer this, we need the original text
We will have to hunt down the source sentences on the web
Time-consuming. Feasible?
Slide 21
Issues - Clustering
Huge data set: O(n^2) clustering algorithms are infeasible
Investigating efficient methods:
- Hierarchical clustering
- Co-clustering (Dhillon et al., 2003)
- Probabilistic Latent Semantic Indexing
- Locality-Sensitive Hashing
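For the last of those, a compact sketch of random-hyperplane LSH: relations whose feature vectors fall into the same bucket become candidate pairs, so exact comparison happens only within buckets instead of over all O(n^2) pairs. How the feature vectors are built is left open here.

```python
# Sketch: random-hyperplane locality-sensitive hashing. Relations hashing
# to the same bit signature become candidate pairs; all other pairs are
# never compared. Feature-vector construction is a stand-in.
import numpy as np

rng = np.random.default_rng(0)

def signature(vec, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(bool(x) for x in (hyperplanes @ vec) > 0)

def bucket(vectors, n_bits=16):
    """vectors: dict mapping relation name -> feature vector (np.ndarray)."""
    dim = len(next(iter(vectors.values())))
    hyperplanes = rng.standard_normal((n_bits, dim))
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(name)
    return buckets  # compare exactly only within each bucket
```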
Slide 22
Other Issues
Word tense: does "lived in" belong with "lives in"?
Detection of conflicting polarity:
(Acesulfame_Potassium does_not_promote tooth_decay)
(Conservatives should_not_promote democracy)
(Website must_not_promote hate)
? (Environmentalists are_not_alone_in_promoting renewable_energy)
Semantic type coverage problems: use lexical similarity-based lookup for semantic types too?
Slide 23
Progress
Up to now / Looking ahead
Slide 24
Progress report (進捗報告)
Set up a git repository
Implemented:
- Wrapper for ReVerb (data lookup)
- WordNet type lookup
- Sherlock type lookup
- Ontological similarity
Made a slideshow for the kenkyuukai (the very one you are looking at right now)
Slide 25
Plans!!!!!
Finish the similarity score:
- Select a WordNet ontological similarity function (over 5 different evaluations already exist)
- Implement lexical similarity (should already be in NLTK somewhere)
- Implement contextual similarity (prepare for the hunt!)
Select and implement a clustering method
Test on ReVerb data: first on Wikipedia, then on ClueWeb