Annotating Relation Inference in Context via Question Answering


Presentation Transcript

Slide1

Annotating Relation Inference in Context via Question Answering

Omer Levy, Ido Dagan
Bar-Ilan University, Israel

Slide2

Relation Inference

When is a given natural-language relation implied by another?
X cures Y ⇒ X treats Y

Question: Which drug treats headaches?
Text: Aspirin cures headaches.

Slide3

Relation Inference in Context

When is a given natural-language relation implied by another?
X eliminates Y ⇒ X treats Y

Question: Which drug treats headaches?
Text: Aspirin eliminates headaches.

Slide4

Relation Inference in Context

When is a given natural-language relation implied by another?
X eliminates Y ⇒ X treats Y

Question: Which drug treats patients?
Text: Aspirin eliminates patients.
(Here the same rule does not hold: eliminating patients is not treating them.)
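To make the context-sensitivity concrete, here is a minimal sketch (the rule table and the naive string matcher are illustrative assumptions, not the authors' system): blindly applying the rule "X eliminates Y ⇒ X treats Y" answers the headaches question correctly but also returns a wrong answer for the patients question.

```python
# Minimal sketch: applying a relation-inference rule without regard to context.
# The rule table and the toy whitespace "parser" are illustrative assumptions.

RULES = {("eliminates", "treats")}  # "X eliminates Y" => "X treats Y"

def answer(question_relation, question_arg, text):
    """Return the text's subject if its relation entails the question's relation."""
    subj, rel, obj = text.split(" ", 2)   # e.g. "Aspirin eliminates headaches"
    if obj == question_arg and (rel == question_relation or (rel, question_relation) in RULES):
        return subj
    return None

# "Which drug treats headaches?" -> correct answer
print(answer("treats", "headaches", "Aspirin eliminates headaches"))  # Aspirin
# "Which drug treats patients?" -> the same rule now yields a wrong answer
print(answer("treats", "patients", "Aspirin eliminates patients"))    # Aspirin (incorrect)
```

A useful evaluation has to contain exactly this kind of pair, where the same rule is valid in one context and invalid in another.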

 

Slide5

Prior Art

DIRT (Lin and Pantel, 2001)
Universal Schema (Rocktäschel et al., NAACL 2015)
PPDB 2.0 (Pavlick et al., ACL 2015)
RELLY (Grycner et al., EMNLP 2015)

Slide6

This Talk

Not about relation inference algorithms
How to evaluate relation inference algorithms
Problem: current evaluations are biased and cannot measure recall
Contributions:
A novel methodology for creating unbiased and natural datasets
A new benchmark for relation inference in context

Slide7

Evaluating Relation Inference in Context

Slide8

Extrinsic Evaluation

Usage: plug the inference algorithm into an RTE system
Problems:
Mixes in other semantic phenomena
Fewer relation inference examples
Hard to trace/analyze
System selection introduces bias
We want an intrinsic evaluation too

Slide9

Post-hoc Evaluation

X eliminates Y ⇒ X treats Y
aspirin eliminates headaches ⇒ aspirin treats headaches

Usage: apply relation inference, annotate inferred facts
Learn inference rules
Apply rules to text

Slide10

Post-hoc Evaluation

X eliminates Y ⇒ X treats Y
aspirin eliminates headaches ⇒ aspirin treats headaches

Usage: apply relation inference, annotate inferred facts
Learn inference rules
Apply rules to text
Annotate for entailment
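As a rough illustration of this pipeline, the sketch below applies learned inference rules to extracted facts and emits the inferred premise-hypothesis pairs that would then be annotated for entailment. The rule list and corpus tuples are hypothetical examples, not DIRT or its output.

```python
# Minimal sketch of the post-hoc pipeline: learned rules are applied to extracted
# facts, and the resulting inferences are collected for human annotation.
# RULES and CORPUS are illustrative stand-ins.

RULES = [("eliminates", "treats"), ("cures", "treats")]   # "X r1 Y => X r2 Y"

CORPUS = [("aspirin", "eliminates", "headaches"),
          ("aspirin", "cures", "migraines")]

def inferred_facts(corpus, rules):
    """Yield (premise, hypothesis) pairs to send to annotators."""
    for subj, rel, obj in corpus:
        for lhs, rhs in rules:
            if rel == lhs:
                yield f"{subj} {rel} {obj}", f"{subj} {rhs} {obj}"

for premise, hypothesis in inferred_facts(CORPUS, RULES):
    print(f"{premise}  =>  {hypothesis}   [annotate: entailed?]")
```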

Slide11

Post-hoc Evaluation

Usage: apply relation inference, annotate inferred facts
Problems:
Expensive research cycle
Difficult to replicate
Oblivious to recall
We want a pre-annotated dataset

Slide12

Pre-annotated Dataset (Zeichner et al., 2012)

aspirin eliminates headaches ⇒ aspirin treats headaches

Usage: compare relation inference to a recorded post-hoc evaluation
Learn inference rules using DIRT
Apply rules to text (Open IE tuples)
Annotate for entailment

Slide13

Pre-annotated Dataset (Zeichner et al., 2012)

aspirin eliminates headaches ⇒ aspirin treats headaches

Usage: compare relation inference to a recorded post-hoc evaluation
Run once:
Learn inference rules using DIRT
Apply rules to text (Open IE tuples)
Annotate for entailment

Slide14

Pre-annotated Dataset (Zeichner et al., 2012)

Dataset columns: Premise | Hypothesis | Label | Algo
Example: Premise = aspirin eliminates headaches, Hypothesis = aspirin treats headaches

Usage: compare relation inference to a recorded post-hoc evaluation
Run once:
Learn inference rules using DIRT
Apply rules to text (Open IE tuples)
Annotate for entailment
Compare new algorithms' predictions to annotated data

Slide15

Pre-annotated Dataset (Zeichner et al., 2012)

Dataset columns: Premise | Hypothesis | Label | Algo
Example: Premise = aspirin eliminates headaches, Hypothesis = aspirin treats headaches

Usage: compare relation inference to a recorded post-hoc evaluation
Run once:
Learn inference rules using DIRT
Apply rules to text (Open IE tuples)
Annotate for entailment
Compare new algorithms' predictions to annotated data
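The comparison step can be pictured with a small sketch: assuming the pre-annotated dataset is a list of (premise, hypothesis, gold label) triples and a new algorithm is any callable that predicts entailment, precision and recall fall out directly. All names below are illustrative.

```python
# Minimal sketch of evaluating a new inference algorithm against a pre-annotated
# dataset of (premise, hypothesis, gold_label) examples.

def evaluate(dataset, algorithm):
    """Compare the algorithm's predictions to the recorded annotations."""
    tp = fp = fn = 0
    for premise, hypothesis, gold in dataset:
        predicted = algorithm(premise, hypothesis)  # True if premise entails hypothesis
        if predicted and gold:
            tp += 1
        elif predicted and not gold:
            fp += 1
        elif gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

toy_dataset = [
    ("aspirin eliminates headaches", "aspirin treats headaches", True),
    ("aspirin eliminates patients", "aspirin treats patients", False),
]
print(evaluate(toy_dataset, lambda p, h: "eliminates" in p and "treats" in h))
# -> (0.5, 1.0): a context-blind rule gets full recall here but only half precision
```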

Slide16

Pre-annotated Dataset (Zeichner et al., 2012)

Usage: compare relation inference to a recorded post-hoc evaluation
Problems:
Expensive research cycle
Difficult to replicate
Oblivious to recall
Biased towards DIRT

Slide17

How can we do better?

Slide18

Desired Qualities of Evaluation Scheme

Intrinsic task
Pre-annotated dataset
Sensitive to recall
Not biased towards a particular method
Crowdsourcable
High-quality labels

Slide19

Reformulating Relation Inference as Question Answering

Slide20

Data Collection

Questions (Hypothesis): Which ingredient is included in chocolate?
Candidate Answers (Premise): chocolate is made from the cocoa bean
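One way to picture this reformulation: a structured question plus a candidate Open IE assertion yields a premise-hypothesis pair. The dataclasses and the template instantiation below are illustrative assumptions, not the released code.

```python
# Minimal sketch of the QA-as-entailment reformulation: the candidate answer
# sentence becomes the premise, and the question instantiated with the candidate
# answer becomes the hypothesis.
from dataclasses import dataclass

@dataclass
class Question:            # "Which [type] [relation] [argument]?"
    answer_type: str       # e.g. "ingredient"
    relation: str          # e.g. "is included in"
    argument: str          # e.g. "chocolate"

@dataclass
class Assertion:           # Open IE tuple
    subject: str
    relation: str
    object: str

def to_entailment_example(q: Question, a: Assertion):
    premise = f"{a.subject} {a.relation} {a.object}"
    # The candidate answer is whichever argument is not the question's argument.
    answer = a.object if a.subject == q.argument else a.subject
    hypothesis = f"{answer} {q.relation} {q.argument}"
    return premise, hypothesis

q = Question("ingredient", "is included in", "chocolate")
a = Assertion("chocolate", "is made from", "the cocoa bean")
print(to_entailment_example(q, a))
# ('chocolate is made from the cocoa bean', 'the cocoa bean is included in chocolate')
```

Deciding whether the premise answers the question is then exactly a relation-inference-in-context judgment.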

Slide21

Data Collection: Questions

Existing QA datasets:
TREC (Voorhees and Tice, 2000)
WikiAnswers (Fader et al., 2013)
WebQuestions (Berant et al., 2013)

Manually converted to "Which [type] ...?" questions:
Who climbed the Everest?  →  Which person climbed the Everest?

Slide22

Data Collection: Questions

Existing QA datasets:
TREC (Voorhees and Tice, 2000)
WikiAnswers (Fader et al., 2013)
WebQuestions (Berant et al., 2013)

Manually converted to "Which [type] ...?" questions:
Who climbed the Everest?  →  Which person climbed the Everest?

Key idea: naturally-occurring questions

Slide23

Data Collection: Candidate Answers

Extract Open IE assertions from Google's Syntactic N-grams
e.g., chocolate is made from the cocoa bean

Given a question "Which [type] [relation] [argument]?", fetch all assertions where:

Slide24

Data Collection: Candidate Answers

Extract Open IE assertions from Google's Syntactic N-grams
e.g., chocolate is made from the cocoa bean

Given a question "Which [type] [relation] [argument]?", fetch all assertions where:
One of the arguments is equal to [argument] (chocolate)

Slide25

Data Collection: Candidate Answers

Extract Open IE assertions from Google's Syntactic N-grams
e.g., chocolate is made from the cocoa bean

Given a question "Which [type] [relation] [argument]?", fetch all assertions where:
One of the arguments is equal to [argument] (chocolate)
The other argument is a type of [type] (the cocoa bean is an ingredient)

Slide26

Data Collection: Candidate Answers

Extract Open IE assertions from Google's Syntactic N-grams
e.g., chocolate is made from the cocoa bean

Given a question "Which [type] [relation] [argument]?", fetch all assertions where:
One of the arguments is equal to [argument] (chocolate)
The other argument is a type of [type] (the cocoa bean is an ingredient)
The relation is different from [relation] (is made from ≠ is included in)

Slide27

Data Collection: Candidate Answers

Extract Open IE assertions from Google's Syntactic N-grams
e.g., chocolate is made from the cocoa bean

Given a question "Which [type] [relation] [argument]?", fetch all assertions where:
One of the arguments is equal to [argument] (chocolate)
The other argument is a type of [type] (the cocoa bean is an ingredient)
The relation is different from [relation] (is made from ≠ is included in)

Key idea: unbiased sample of relations
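A minimal sketch of these three matching conditions, assuming a toy in-memory assertion list and a toy is-a lookup (the real assertions come from Google's Syntactic N-grams; all names here are illustrative):

```python
# Minimal sketch of candidate-answer retrieval for a question of the form
# "Which [type] [relation] [argument]?". ASSERTIONS and IS_A are toy stand-ins.

ASSERTIONS = [
    ("chocolate", "is made from", "the cocoa bean"),
    ("chocolate", "contains", "sugar"),
    ("chocolate", "is included in", "sugar"),      # rejected: same relation as the question
    ("the cocoa bean", "is grown in", "Ghana"),    # rejected: no argument equals "chocolate"
]

IS_A = {("the cocoa bean", "ingredient"), ("sugar", "ingredient")}

def candidate_answers(q_type, q_relation, q_argument):
    for subj, rel, obj in ASSERTIONS:
        if q_argument not in (subj, obj):   # one argument equals the question's argument
            continue
        other = obj if subj == q_argument else subj
        if (other, q_type) not in IS_A:     # the other argument is a type of [type]
            continue
        if rel == q_relation:               # the relation differs from the question's relation
            continue
        yield subj, rel, obj

print(list(candidate_answers("ingredient", "is included in", "chocolate")))
# -> [('chocolate', 'is made from', 'the cocoa bean'), ('chocolate', 'contains', 'sugar')]
```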

Slide28

Crowdsourced Annotation

Given 1 question + 20 matching candidate answers
Annotate each candidate answer as either:
✓ The sentence answers the question.
✗ The sentence does not answer the question.
? The sentence does not make sense, or is severely non-grammatical.

Slide29

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia

Slide30

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia → Eritrea borders Ethiopia ✓

Slide31

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia → Eritrea borders Ethiopia ✓
A: Italy invaded Ethiopia

Slide32

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia → Eritrea borders Ethiopia ✓
A: Italy invaded Ethiopia → Italy borders Ethiopia ✗

Slide33

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: Eritrea invaded Ethiopia → Eritrea borders Ethiopia ✓
A: Italy invaded Ethiopia → Italy borders Ethiopia ✗
Filter world knowledge from annotation by masking the answer

Slide34

Crowdsourced Annotation: Masking Answers

The annotators are biased by their own world knowledge
Q: Which country borders Ethiopia?
A: [COUNTRY] invaded Ethiopia
Filter world knowledge from annotation by masking the answer
Substitute the answer with its type (Eritrea → [COUNTRY])
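Masking is a simple substitution; a minimal sketch (the function name and the bracketed placeholder format are illustrative assumptions):

```python
# Minimal sketch of answer masking: the answer argument is replaced by its type,
# so annotators cannot fall back on world knowledge about the specific entity.

def mask_answer(assertion, answer, answer_type):
    """Replace the answer argument in the assertion with a type placeholder."""
    return assertion.replace(answer, f"[{answer_type.upper()}]")

print(mask_answer("Eritrea invaded Ethiopia", "Eritrea", "country"))  # [COUNTRY] invaded Ethiopia
print(mask_answer("Italy invaded Ethiopia", "Italy", "country"))      # [COUNTRY] invaded Ethiopia
```

After masking, the Eritrea and Italy assertions look identical, so the annotator can only judge whether "invaded" implies "borders".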

 

Slide35

Crowdsourced Annotation

Slide36

Crowdsourced Annotation

No context switch → faster annotation!

Slide37

Crowdsourced Annotation: Aggregation

5 Mechanical Turk annotators per question-answer pair
At least 4/5 must agree on the label
Discard nonsensical/non-grammatical (?) examples

Labeled examples (after filtering): 16,371
Agreement with expert: 90%
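A minimal sketch of this aggregation rule (the label encoding and function name are illustrative assumptions):

```python
# Minimal sketch of label aggregation: keep an example only if at least 4 of the
# 5 annotators agree on "yes" or "no"; discard examples judged nonsensical ("?").
from collections import Counter

def aggregate(labels, min_agreement=4):
    """labels: the five annotator judgments, each 'yes', 'no', or '?'."""
    label, count = Counter(labels).most_common(1)[0]
    if label == "?" or count < min_agreement:
        return None   # discarded: nonsensical or insufficient agreement
    return label

print(aggregate(["yes", "yes", "yes", "yes", "no"]))  # 'yes'
print(aggregate(["yes", "yes", "no", "no", "yes"]))   # None (only 3/5 agree)
print(aggregate(["?", "?", "?", "?", "?"]))           # None (nonsensical)
```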

 

Slide38

Desired Qualities of Evaluation Scheme

Intrinsic task
Pre-annotated dataset
Sensitive to recall
Not biased towards a particular method
Crowdsourcable
High-quality labels

Slide39

Desired Qualities of Evaluation Scheme

Intrinsic task ✓ (by definition)
Pre-annotated dataset ✓ (by definition)
Sensitive to recall
Not biased towards a particular method
Crowdsourcable
High-quality labels

Slide40

Desired Qualities of Evaluation Scheme

Intrinsic task ✓ (by definition)
Pre-annotated dataset ✓ (by definition)
Sensitive to recall ✓
Not biased towards a particular method
Crowdsourcable
High-quality labels

Slide41

Desired Qualities of Evaluation Scheme

Intrinsic task ✓ (by definition)
Pre-annotated dataset ✓ (by definition)
Sensitive to recall ✓
Not biased towards a particular method
Crowdsourcable ✓ (16K examples)
High-quality labels

Slide42

Desired Qualities of Evaluation Scheme

Intrinsic task ✓ (by definition)
Pre-annotated dataset ✓ (by definition)
Sensitive to recall ✓
Not biased towards a particular method ✓
Crowdsourcable ✓ (16K examples)
High-quality labels ✓ (90% agreement vs. expert)

Slide43

How many examples does post-hoc miss?

Slide44

Slide45

Conclusions

Slide46

Recap

Novel methodology for creating unbiased and natural datasets
Key idea: reformulate relation inference as question answering
A new benchmark for relation inference in context
Empirical finding: current methods have very low coverage

Slide47

Going Forward

Data is publicly available! bit.ly/2aCLuLB
Poses a new challenge
Code is publicly available! bit.ly/2b05EhK
Extend our methodology: larger datasets for supervised learning

Thank you!