Annotating Relation Inference in Context via Question Answering
Omer Levy, Ido Dagan
Bar-Ilan University, Israel

Relation Inference
When is a given natural-language relation implied by another?
X cures Y  ⇒  X treats Y
Question: Which drug treats headaches?
Text: Aspirin cures headaches.

Relation Inference in Context
When is a given natural-language relation implied by another?
X eliminates Y  ⇒  X treats Y
Question: Which drug treats headaches?
Text: Aspirin eliminates headaches.

Relation Inference in Context
When is a given natural-language relation implied by another?
X eliminates Y  ⇒  X treats Y
Question: Which drug treats patients?
Text: Aspirin eliminates patients.
Here the same rule fails: eliminating patients does not mean treating them.

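To make the contrast concrete, here is a minimal sketch (not from the original slides) of how a context-insensitive inference rule behaves; the tuple representation and the Rule class are illustrative assumptions:

```python
# Illustrative only: a context-insensitive inference rule applied to
# question/text pairs represented as simple (subject, relation, object) tuples.
from dataclasses import dataclass

@dataclass
class Rule:
    lhs: str  # relation that appears in the text, e.g. "eliminates"
    rhs: str  # relation the question asks about, e.g. "treats"

def rule_answers(question_rel: str, question_arg: str,
                 text: tuple[str, str, str], rule: Rule) -> bool:
    """True if the text supports the question via the rule, ignoring context."""
    subj, rel, obj = text
    return rel == rule.lhs and question_rel == rule.rhs and obj == question_arg

rule = Rule(lhs="eliminates", rhs="treats")

# "Which drug treats headaches?" + "Aspirin eliminates headaches." -> True (correct)
print(rule_answers("treats", "headaches", ("aspirin", "eliminates", "headaches"), rule))

# "Which drug treats patients?" + "Aspirin eliminates patients." -> True (wrong!)
# A context-insensitive rule cannot tell that "eliminates" only implies
# "treats" when the object is an ailment, not a person.
print(rule_answers("treats", "patients", ("aspirin", "eliminates", "patients"), rule))
```
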
Prior Art
- DIRT (Lin and Pantel, 2001)
- Universal Schema (Rocktäschel et al., NAACL 2015)
- PPDB 2.0 (Pavlick et al., ACL 2015)
- RELLY (Grycner et al., EMNLP 2015)

This Talk
Not about relation inference algorithms, but about how to evaluate them.
Problem: current evaluations are biased and cannot measure recall.
Contributions:
- A novel methodology for creating unbiased and natural datasets
- A new benchmark for relation inference in context

Evaluating Relation Inference in Context

Extrinsic Evaluation
Usage: plug the inference algorithm into an RTE system.
Problems:
- Mixes in other semantic phenomena
- Fewer relation inference examples
- Hard to trace and analyze
- System selection introduces bias
We want an intrinsic evaluation too.

Post-hoc Evaluation
Usage: apply relation inference, then annotate the inferred facts.
1. Learn inference rules (e.g., X eliminates Y  ⇒  X treats Y)
2. Apply the rules to text (aspirin eliminates headaches  ⇒  aspirin treats headaches)
3. Annotate the inferred facts for entailment (✓ / ✗)

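A rough sketch (illustrative, not the authors' code) of the post-hoc pipeline just described: learned rules are applied to corpus tuples, and the resulting inferred facts are what annotators then judge:

```python
# Illustrative post-hoc evaluation pipeline; rules, tuples, and names are made up.
rules = [("eliminates", "treats"), ("cures", "treats")]   # learned inference rules
corpus = [("aspirin", "eliminates", "headaches")]         # Open IE tuples from text

inferred = []
for subj, rel, obj in corpus:
    for lhs, rhs in rules:
        if rel == lhs:
            # (premise, hypothesis) pair that will be annotated for entailment
            inferred.append(((subj, rel, obj), (subj, rhs, obj)))

for premise, hypothesis in inferred:
    print(premise, "=>", hypothesis)   # each pair is then labeled ✓ or ✗ by annotators
```
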
Post-hoc Evaluation
Usage: apply relation inference, then annotate the inferred facts.
Problems:
- Expensive research cycle
- Difficult to replicate
- Oblivious to recall
We want a pre-annotated dataset.

Pre-annotated Dataset (Zeichner et al., 2012)
Usage: compare relation inference algorithms against a recorded post-hoc evaluation.
Run once:
1. Learn inference rules using DIRT
2. Apply the rules to Open IE tuples extracted from text
3. Annotate the inferred facts for entailment, yielding (premise, hypothesis, label) examples
   e.g., premise "aspirin eliminates headaches", hypothesis "aspirin treats headaches"
Then: compare new algorithms' predictions to the annotated data (✓ / ✗).

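Once such a dataset exists, scoring a new algorithm is a lookup against the recorded labels. A minimal sketch, with an assumed (premise, hypothesis, label) layout rather than the dataset's actual file format:

```python
# Hypothetical evaluation loop over pre-annotated (premise, hypothesis, label) triples.
dataset = [
    (("aspirin", "eliminates", "headaches"), ("aspirin", "treats", "headaches"), True),
    (("aspirin", "eliminates", "patients"),  ("aspirin", "treats", "patients"),  False),
]

def my_inference_algorithm(premise, hypothesis):
    # stand-in for whatever relation inference method is being evaluated
    return premise[1] == "eliminates" and hypothesis[1] == "treats"

correct = sum(my_inference_algorithm(p, h) == gold for p, h, gold in dataset)
print(f"accuracy: {correct / len(dataset):.2f}")
```
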
Pre-annotated Dataset (Zeichner et al., 2012)
Usage: compare relation inference algorithms against a recorded post-hoc evaluation.
Problems:
- Expensive research cycle (addressed by pre-annotation)
- Difficult to replicate (addressed by pre-annotation)
- Oblivious to recall
- Biased towards DIRT

How can we do better?

Desired Qualities of an Evaluation Scheme
- Intrinsic task
- Pre-annotated dataset
- Sensitive to recall
- Not biased towards a particular method
- Crowdsourcable
- High-quality labels

Reformulating Relation Inference as Question Answering

Data Collection
Questions, e.g., "Which ingredient is included in chocolate?" (the hypothesis)
Candidate answers, e.g., "chocolate is made from the cocoa bean" (the premise)

Data Collection: Questions
Existing QA datasets:
- TREC (Voorhees and Tice, 2000)
- WikiAnswers (Fader et al., 2013)
- WebQuestions (Berant et al., 2013)
Questions are manually converted into "Which ...?" form.
Example: "Who climbed the Everest?"  →  "Which person climbed the Everest?"
Key idea: naturally-occurring questions.

Data Collection: Candidate Answers
Extract Open IE assertions from Google's Syntactic N-grams, e.g., "chocolate is made from the cocoa bean".
Given a question "Which [type] [relation] [argument]?", fetch all assertions where:
- One of the arguments is equal to the question's argument (chocolate)
- The other argument is a type of [type] (the cocoa bean is an ingredient)
- The relation is different from the question's relation ("is made from" vs. "is included in")
Key idea: an unbiased sample of relations.

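A minimal sketch of the three matching conditions above, assuming a simple tuple representation and a hypothetical is_a type lookup (both are illustrative, not the released code):

```python
# Illustrative candidate-answer filter. The tuple format and the is_a()
# type oracle are assumptions made for this sketch.

def is_a(phrase: str, type_name: str) -> bool:
    # stand-in for a real hypernym/type resource
    known = {("cocoa bean", "ingredient"), ("Eritrea", "country")}
    return (phrase, type_name) in known

def is_candidate(question, assertion) -> bool:
    """question = (type, relation, argument); assertion = (arg1, relation, arg2)."""
    q_type, q_rel, q_arg = question
    a1, a_rel, a2 = assertion
    for same, other in ((a1, a2), (a2, a1)):
        if (same == q_arg               # one argument equals the question's argument
                and is_a(other, q_type) # the other argument is an instance of the type
                and a_rel != q_rel):    # the relation differs from the question's relation
            return True
    return False

question = ("ingredient", "is included in", "chocolate")
assertion = ("chocolate", "is made from", "cocoa bean")
print(is_candidate(question, assertion))   # True
```
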
Crowdsourced Annotation
Given 1 question and 20 matching candidate answers, annotate each candidate answer as either:
✓  The sentence answers the question.
✗  The sentence does not answer the question.
?  The sentence does not make sense, or is severely non-grammatical.

Crowdsourced Annotation: Masking Answers
The annotators are biased by their own world knowledge.
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia  →  Eritrea borders Ethiopia  ✓
A: Italy invaded Ethiopia  →  Italy borders Ethiopia  ✗
Both answers use the same relation, yet world knowledge pulls annotators towards different labels.
Solution: filter world knowledge from the annotation by masking the answer.
Substitute the answer argument with its type:
A: [COUNTRY] invaded Ethiopia

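A small sketch of the masking step, assuming the answer argument and its type are already identified (the helper name is hypothetical):

```python
# Illustrative answer masking: the answer argument is replaced with its type,
# so annotators judge the relation itself rather than a fact they already know.
def mask_answer(sentence: str, answer: str, answer_type: str) -> str:
    return sentence.replace(answer, f"[{answer_type.upper()}]")

print(mask_answer("Eritrea invaded Ethiopia", "Eritrea", "country"))
print(mask_answer("Italy invaded Ethiopia", "Italy", "country"))
# Both print "[COUNTRY] invaded Ethiopia": the two answers now look identical,
# so world knowledge about which country actually borders Ethiopia cannot leak in.
```
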
Crowdsourced Annotation
Each annotator judges 20 candidate answers against a single question, so there is no context switch: faster annotation!

Crowdsourced Annotation: Aggregation
- 5 Mechanical Turk annotators per question-answer pair
- At least 4 of 5 must agree on the label
- Discard nonsensical / non-grammatical (?) examples
Labeled examples (after filtering): 16,371
Agreement with an expert annotator: 90%

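A sketch of the aggregation rule described above; the label strings are an assumed encoding:

```python
from collections import Counter

# Illustrative aggregation: five annotators per question-answer pair; keep the
# example only if at least 4 of 5 agree and the agreed label is not "?".
def aggregate(labels, min_agreement=4):
    assert len(labels) == 5
    label, count = Counter(labels).most_common(1)[0]
    if count < min_agreement or label == "?":
        return None            # example is discarded
    return label               # "yes" (answers the question) or "no"

print(aggregate(["yes", "yes", "yes", "yes", "no"]))   # "yes"
print(aggregate(["yes", "yes", "no", "no", "yes"]))    # None (no 4/5 agreement)
print(aggregate(["?", "?", "?", "?", "?"]))            # None (nonsensical example)
```
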
Desired Qualities of an Evaluation Scheme (revisited)
- Intrinsic task ✓ (by definition)
- Pre-annotated dataset ✓
- Sensitive to recall ✓
- Not biased towards a particular method ✓
- Crowdsourcable ✓ (16K examples)
- High-quality labels ✓ (90% agreement vs. expert)

How many examples does post-hoc evaluation miss?

Conclusions

Recap
- A novel methodology for creating unbiased and natural datasets
  Key idea: reformulate relation inference as question answering
- A new benchmark for relation inference in context
  Empirical finding: current methods have very low coverage

Going Forward
- Data is publicly available (bit.ly/2aCLuLB) and poses a new challenge
- Code is publicly available (bit.ly/2b05EhK)
- Extend our methodology: larger datasets for supervised learning

Thank you!