Knowledge and Tree-Edits in Learnable Entailment Proofs

Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee
BIUTEE
RTE
Classify a (T, H) pair as ENTAILING or NON-ENTAILING.
Example:
T: The boy was located by the police.
H: Eventually, the police found the child.
Matching vs. Transformations
Matching
Sequence of transformations (a proof):
T = T0 → T1 → T2 → ... → Tn = H
Tree-edits: complete proofs; estimate confidence
Knowledge-based entailment rules: linguistically motivated; formalize many types of knowledge
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
BIUTEE Goals
Tree edits: complete proofs; estimate confidence
Entailment rules: linguistically motivated; formalize many types of knowledge
BIUTEE integrates the benefits of both worlds
Challenges / System Components
generate linguistically motivated complete proofs?
estimate proof confidence?find the best proof?learn the model parameters?How to8Slide9
1. Generate linguistically motivated complete proofs
Entailment Rules
Lexical: boy → child
Lexical-Syntactic
Generic Syntactic
Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.
Extended Tree Edits (On The Fly Operations)
Predefined custom tree editsInsert node on the fly
Move node / move sub-tree on the flyFlip part of speech…Heuristically capture linguistic phenomenaOperation definitionFeatures definition11Slide12
Proof over Parse Trees - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Passive to active: The police located the boy.
X locate Y → X find Y: The police found the boy.
Boy → child: The police found the child.
Insertion on the fly:
Hypothesis: Eventually, the police found the child.
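The chain of transformations above can be sketched in code. This is an illustrative toy only: the operation names are taken from the slide, but BIUTEE operates on parse trees, not raw strings, and `apply_proof` is a hypothetical helper.

```python
# Illustrative sketch: a proof as a chained sequence of transformations,
# T = T0 -> T1 -> ... -> Tn = H.  BIUTEE works on parse trees; strings
# stand in for trees here for readability.

proof = [
    ("passive to active",
     "The boy was located by the police.", "The police located the boy."),
    ("X locate Y -> X find Y",
     "The police located the boy.", "The police found the boy."),
    ("boy -> child",
     "The police found the boy.", "The police found the child."),
    ("insertion on the fly",
     "The police found the child.", "Eventually, the police found the child."),
]

def apply_proof(text, proof):
    """Verify that each step starts from the previous step's output."""
    current = text
    for name, source, target in proof:
        assert current == source, f"step '{name}' does not chain"
        current = target
    return current

result = apply_proof("The boy was located by the police.", proof)
```

Each step's input must equal the previous step's output; the final state is the hypothesis.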
2. Estimate proof confidence
Cost-based Model
Define operation cost: assesses the operation's validity
Represent each operation as a feature vector
Cost is a linear combination of feature values
Define proof cost as the sum of the operations' costs
Classify: entailment if and only if the proof cost is smaller than a threshold
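A minimal sketch of this cost model; the weight vector, feature values, and threshold below are made-up numbers, not values used by the system.

```python
# Cost model sketch: operation cost = w . f, proof cost = sum of the
# operations' costs, and (T, H) is classified as entailing iff the
# proof cost is below a threshold b.  All numbers are illustrative.

def operation_cost(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def proof_cost(w, operations):
    return sum(operation_cost(w, f) for f in operations)

def entails(w, b, operations):
    return proof_cost(w, operations) < b

w = [1.0, 2.0, 0.5]                       # learned weight vector (toy)
ops = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.4]]  # one feature vector per operation
```

With these toy numbers the two operations cost 0.2 and 0.4, so the proof costs 0.6 and is classified as entailing for any threshold above that.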
Feature Vector Representation
Define operation cost
Represent each operation as a feature vector
Features: Insert-Named-Entity, Insert-Verb, ..., WordNet, Lin, DIRT, ...
An operation: The police located the boy. → (DIRT: X locate Y → X find Y, score = 0.9) → The police found the boy.
Feature vector that represents the operation: (0, 0, ..., 0.457, ..., 0)
The rule's feature value is a downward function of its score; all other features are 0.
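This mapping can be sketched as follows. The slide does not say which "downward function of the score" produces 0.457 from 0.9, so `-log(score)` is used here purely as an assumed example of such a function; the feature list is abbreviated from the slide.

```python
import math

# Abbreviated feature inventory from the slide.
FEATURES = ["Insert-Named-Entity", "Insert-Verb", "WordNet", "Lin", "DIRT"]

def operation_vector(feature_name, score=None):
    """Sparse feature vector for one operation.  A knowledge-based
    operation gets a value that decreases as its rule score grows;
    -log(score) is one such downward function (assumed here, the deck
    does not specify the actual one)."""
    v = [0.0] * len(FEATURES)
    i = FEATURES.index(feature_name)
    v[i] = 1.0 if score is None else -math.log(score)
    return v

dirt = operation_vector("DIRT", score=0.9)  # nonzero only at the DIRT slot
```

Higher-confidence rules thus contribute smaller feature values, and so smaller costs under non-negative weights.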
Cost-based Model
Define operation cost
Cost is a linear combination of feature values:
Cost = weight-vector · feature-vector
The weight-vector is learned automatically
Confidence Model
Define operation cost
Represent each operation as a feature vector
Define the proof cost as the sum of the operations' costs:
cost(proof) = Σ w · f(operation) = w · Σ f(operation)
Since the weight vector is shared, the summed vector Σ f(operation) represents the proof.
Feature Vector Representation - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Passive to active: The police located the boy.
X locate Y → X find Y: The police found the boy.
Boy → child: The police found the child.
Insertion on the fly:
Hypothesis: Eventually, the police found the child.
Each operation contributes one feature vector:
(0, 0, ........................., 1, 0)
(0, 0, ......... 0.457, .., 0, 0)
(0, 0, .. 0.5, ..............., 0, 0)
(0, 0, 1, ....................., 0, 0)
Their sum, (0, 0, 1, .. 0.5, .. 0.457, ...., 1, 0), represents the proof.
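The element-wise sum from the slide can be reproduced directly. The vectors below compress the slide's elided positions into a 5-dimensional toy layout; the positions are illustrative, only the summing is the point.

```python
# The proof's feature vector is the element-wise sum of its operations'
# feature vectors.  Dimensions are a toy stand-in for the real inventory.

def proof_vector(op_vectors):
    total = [0.0] * len(op_vectors[0])
    for v in op_vectors:
        for i, x in enumerate(v):
            total[i] += x
    return total

ops = [
    [0.0, 0.0, 0.0,   1.0, 0.0],  # passive to active
    [0.0, 0.0, 0.457, 0.0, 0.0],  # X locate Y -> X find Y (DIRT)
    [0.0, 0.5, 0.0,   0.0, 0.0],  # boy -> child (WordNet)
    [1.0, 0.0, 0.0,   0.0, 0.0],  # insertion on the fly
]
```

The sum is a single vector per proof, so one dot product with the weight vector gives the proof's cost.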
Cost-based Model
Define operation cost
Represent each operation as a feature vector
Define proof cost as the sum of the operations' costs
Classify: "entailing" if and only if the proof cost is smaller than a threshold
The weights and the threshold are learned.
3. Find the best proof
Search the best proof
Many candidate proofs lead from T to H (Proof #1, Proof #2, Proof #3, Proof #4).
Search the best proof
We need to find the "best" proof.
"Best proof" = the proof with the lowest cost, assuming a weight vector is given.
The search space is exponential, so we use an AI-style search algorithm.
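One AI-style algorithm for this setting is uniform-cost (best-first) search, sketched below; the states, operations, and costs are toy stand-ins for parse trees and BIUTEE's actual operations, and the deck does not commit to this particular algorithm.

```python
import heapq

def best_proof(start, goal, successors):
    """Uniform-cost search.  successors(state) yields
    (op_name, cost, next_state) triples; returns (cost, proof)."""
    frontier = [(0.0, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        cost, state, proof = heapq.heappop(frontier)
        if state == goal:
            return cost, proof
        for op, step_cost, nxt in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, proof + [op]))
    return None

# Toy space with two proofs from T to H of different total cost.
graph = {
    "T":  [("rule-a", 1.0, "T1"), ("edit-x", 5.0, "H")],
    "T1": [("rule-b", 1.0, "H")],
    "H":  [],
}
cost, proof = best_proof("T", "H", lambda s: graph[s])
```

In the toy graph the direct tree-edit costs 5.0 while the two-rule chain costs 2.0, so the search returns the cheaper chain.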
4. Learn model parameters
Learning
Goal: learn the parameters (w, b)
Use a linear learning algorithm: logistic regression, SVM, etc.
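As a concrete stand-in for the off-the-shelf linear learner, here is a tiny logistic-regression loop over proof feature vectors; the data and hyperparameters are made up for illustration.

```python
import math

def train(samples, labels, lr=0.5, epochs=200):
    """Tiny SGD logistic regression: returns a weight vector w and bias b."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y_true in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y_true                       # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy data: proofs with small feature values (cheap proofs) are entailing.
X = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.7]]
y = [1, 1, 0, 0]
w, b = train(X, y)
```

Any linear learner would do here; the point is that the classifier's weights and bias become the cost model's weight vector and threshold.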
Inference vs. Learning
Training samples → best proofs → feature extraction → vector representation → learning algorithm → (w, b)
Iterative Learning Scheme
1. w = reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2
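The four steps above can be sketched as a loop. The helper signatures are assumptions for illustration: `find_best_proof(pair, w)` stands for the search component and `learn(proofs)` for the linear learner; neither is BIUTEE's actual API.

```python
def iterative_training(pairs, find_best_proof, learn, w0, iterations=3):
    """Alternate between proof search and parameter learning."""
    w, b = w0, 0.0                                       # 1. reasonable guess
    for _ in range(iterations):
        proofs = [find_best_proof(p, w) for p in pairs]  # 2. best proofs
        w, b = learn(proofs)                             # 3. learn new w, b
    return w, b                                          # 4. loop repeats step 2

# Trivial stand-ins just to exercise the loop:
w, b = iterative_training(
    pairs=["pair-1", "pair-2"],
    find_best_proof=lambda p, w: p,
    learn=lambda proofs: ([1.0], 0.5),
    w0=[0.0],
)
```

Each round searches for proofs under the current weights, then refits the weights to those proofs, so search and learning refine each other.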
Summary - System Components
How to:
Generate syntactically motivated complete proofs? Entailment rules; on-the-fly operations (extended tree-edit operations)
Estimate proof validity? Confidence model
Find the best proof? Search algorithm
Learn the model parameters? Iterative learning scheme
Results - RTE7

ID    Knowledge Resources                                                              Precision %  Recall %  F1 %
BIU1  WordNet, Directional Similarity                                                  38.97        47.40     42.77
BIU2  WordNet, Directional Similarity, Wikipedia                                       41.81        44.11     42.93
BIU3  WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database      39.26        45.95     42.34

BIUTEE 2011 on RTE6 (F1 %)
Baseline (use IR top-5 relevance)   34.63
Median (September 2010)             36.14
Best (September 2010)               48.01
Our system                          49.54
Conclusions
Inference via a sequence of transformations
Knowledge resources and extended tree edits
Proof confidence estimation
Results: better than the median on RTE7; best on RTE6
Open source: http://www.cs.biu.ac.il/~nlp/downloads/biutee
Thank You
http://www.cs.biu.ac.il/~nlp/downloads/biutee