Knowledge and Tree-Edits in Learnable Entailment Proofs

Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee
BIUTEE
RTE
Classify a (T, H) pair as ENTAILING or NON-ENTAILING.
Example:
T: The boy was located by the police.
H: Eventually, the police found the child.
Matching vs. Transformations
Matching
Sequence of transformations (a proof):
T = T0 → T1 → T2 → ... → Tn = H
Tree-edits: complete proofs; estimate confidence
Knowledge-based entailment rules: linguistically motivated; formalize many types of knowledge
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.
Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
BIUTEE Goals
Tree edits: complete proofs; estimate confidence
Entailment rules: linguistically motivated; formalize many types of knowledge
BIUTEE integrates the benefits of both worlds
Challenges / System Components
generate linguistically motivated complete proofs?
estimate proof confidence?find the best proof?learn the model parameters?How to8Slide9
1. Generate linguistically motivated complete proofs
Entailment Rules
Lexical: boy → child
Lexical-Syntactic
Generic Syntactic
Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.
Extended Tree Edits (On The Fly Operations)
Predefined custom tree editsInsert node on the fly
Move node / move sub-tree on the flyFlip part of speech…Heuristically capture linguistic phenomenaOperation definitionFeatures definition11Slide12
Proof over Parse Trees - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Passive to active: The police located the boy.
X locate Y → X find Y: The police found the boy.
Boy → child: The police found the child.
Insertion on the fly:
Hypothesis: Eventually, the police found the child.
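The chain of transformations above can be sketched in code. This is an illustrative toy only: the operation names are taken from the slide, but BIUTEE operates on parse trees, not raw strings, and `apply_proof` is a hypothetical helper.

```python
# Illustrative sketch: a proof as a chained sequence of transformations,
# T = T0 -> T1 -> ... -> Tn = H.  BIUTEE works on parse trees; strings
# stand in for trees here for readability.

proof = [
    ("passive to active",
     "The boy was located by the police.", "The police located the boy."),
    ("X locate Y -> X find Y",
     "The police located the boy.", "The police found the boy."),
    ("boy -> child",
     "The police found the boy.", "The police found the child."),
    ("insertion on the fly",
     "The police found the child.", "Eventually, the police found the child."),
]

def apply_proof(text, proof):
    """Verify that each step starts from the previous step's output."""
    current = text
    for name, source, target in proof:
        assert current == source, f"step '{name}' does not chain"
        current = target
    return current

result = apply_proof("The boy was located by the police.", proof)
```

Each step's input must equal the previous step's output; the final state is the hypothesis.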
2. Estimate proof confidence
Cost-based Model
Define operation cost: assesses the operation's validity
Represent each operation as a feature vector
Cost is a linear combination of feature values
Define proof cost as the sum of the operations' costs
Classify: entailment if and only if the proof cost is smaller than a threshold
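A minimal sketch of this cost model; the weight vector, feature values, and threshold below are made-up numbers, not values used by the system.

```python
# Cost model sketch: operation cost = w . f, proof cost = sum of the
# operations' costs, and (T, H) is classified as entailing iff the
# proof cost is below a threshold b.  All numbers are illustrative.

def operation_cost(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def proof_cost(w, operations):
    return sum(operation_cost(w, f) for f in operations)

def entails(w, b, operations):
    return proof_cost(w, operations) < b

w = [1.0, 2.0, 0.5]                       # learned weight vector (toy)
ops = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.4]]  # one feature vector per operation
```

With these toy numbers the two operations cost 0.2 and 0.4, so the proof costs 0.6 and is classified as entailing for any threshold above that.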
Feature Vector Representation
Define operation cost
Represent each operation as a feature vector
Features: Insert-Named-Entity, Insert-Verb, ..., WordNet, Lin, DIRT, ...
An operation: The police located the boy. → (DIRT: X locate Y → X find Y, score = 0.9) → The police found the boy.
Feature vector that represents the operation: (0, 0, ..., 0.457, ..., 0)
The rule's feature value is a downward function of its score; all other features are 0.
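This mapping can be sketched as follows. The slide does not say which "downward function of the score" produces 0.457 from 0.9, so `-log(score)` is used here purely as an assumed example of such a function; the feature list is abbreviated from the slide.

```python
import math

# Abbreviated feature inventory from the slide.
FEATURES = ["Insert-Named-Entity", "Insert-Verb", "WordNet", "Lin", "DIRT"]

def operation_vector(feature_name, score=None):
    """Sparse feature vector for one operation.  A knowledge-based
    operation gets a value that decreases as its rule score grows;
    -log(score) is one such downward function (assumed here, the deck
    does not specify the actual one)."""
    v = [0.0] * len(FEATURES)
    i = FEATURES.index(feature_name)
    v[i] = 1.0 if score is None else -math.log(score)
    return v

dirt = operation_vector("DIRT", score=0.9)  # nonzero only at the DIRT slot
```

Higher-confidence rules thus contribute smaller feature values, and so smaller costs under non-negative weights.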
Cost-based Model
Define operation cost
Cost is a linear combination of feature values:
Cost = weight-vector · feature-vector
The weight-vector is learned automatically
Confidence Model
Define operation cost
Represent each operation as a feature vector
Define the proof cost as the sum of the operations' costs:
cost(proof) = Σ w · f(operation) = w · Σ f(operation)
Since the weight vector is shared, the summed vector Σ f(operation) represents the proof.
Feature Vector Representation - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Passive to active: The police located the boy.
X locate Y → X find Y: The police found the boy.
Boy → child: The police found the child.
Insertion on the fly:
Hypothesis: Eventually, the police found the child.
Each operation contributes one feature vector:
(0, 0, ........................., 1, 0)
(0, 0, ......... 0.457, .., 0, 0)
(0, 0, .. 0.5, ..............., 0, 0)
(0, 0, 1, ....................., 0, 0)
Their sum, (0, 0, 1, .. 0.5, .. 0.457, ...., 1, 0), represents the proof.
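The element-wise sum from the slide can be reproduced directly. The vectors below compress the slide's elided positions into a 5-dimensional toy layout; the positions are illustrative, only the summing is the point.

```python
# The proof's feature vector is the element-wise sum of its operations'
# feature vectors.  Dimensions are a toy stand-in for the real inventory.

def proof_vector(op_vectors):
    total = [0.0] * len(op_vectors[0])
    for v in op_vectors:
        for i, x in enumerate(v):
            total[i] += x
    return total

ops = [
    [0.0, 0.0, 0.0,   1.0, 0.0],  # passive to active
    [0.0, 0.0, 0.457, 0.0, 0.0],  # X locate Y -> X find Y (DIRT)
    [0.0, 0.5, 0.0,   0.0, 0.0],  # boy -> child (WordNet)
    [1.0, 0.0, 0.0,   0.0, 0.0],  # insertion on the fly
]
```

The sum is a single vector per proof, so one dot product with the weight vector gives the proof's cost.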
Cost-based Model
Define operation cost
Represent each operation as a feature vector
Define proof cost as the sum of the operations' costs
Classify: "entailing" if and only if the proof cost is smaller than a threshold
The weights and the threshold are learned.
3. Find the best proof
Search the best proof
Many candidate proofs lead from T to H (Proof #1, Proof #2, Proof #3, Proof #4).
Search the best proof
We need to find the "best" proof.
"Best proof" = the proof with the lowest cost, assuming a weight vector is given.
The search space is exponential, so we use an AI-style search algorithm.
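One AI-style algorithm for this setting is uniform-cost (best-first) search, sketched below; the states, operations, and costs are toy stand-ins for parse trees and BIUTEE's actual operations, and the deck does not commit to this particular algorithm.

```python
import heapq

def best_proof(start, goal, successors):
    """Uniform-cost search.  successors(state) yields
    (op_name, cost, next_state) triples; returns (cost, proof)."""
    frontier = [(0.0, start, [])]
    best_cost = {start: 0.0}
    while frontier:
        cost, state, proof = heapq.heappop(frontier)
        if state == goal:
            return cost, proof
        for op, step_cost, nxt in successors(state):
            new_cost = cost + step_cost
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, proof + [op]))
    return None

# Toy space with two proofs from T to H of different total cost.
graph = {
    "T":  [("rule-a", 1.0, "T1"), ("edit-x", 5.0, "H")],
    "T1": [("rule-b", 1.0, "H")],
    "H":  [],
}
cost, proof = best_proof("T", "H", lambda s: graph[s])
```

In the toy graph the direct tree-edit costs 5.0 while the two-rule chain costs 2.0, so the search returns the cheaper chain.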
4. Learn model parameters
Learning
Goal: learn the parameters (w, b)
Use a linear learning algorithm: logistic regression, SVM, etc.
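As a concrete stand-in for the off-the-shelf linear learner, here is a tiny logistic-regression loop over proof feature vectors; the data and hyperparameters are made up for illustration.

```python
import math

def train(samples, labels, lr=0.5, epochs=200):
    """Tiny SGD logistic regression: returns a weight vector w and bias b."""
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y_true in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y_true                       # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy data: proofs with small feature values (cheap proofs) are entailing.
X = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [1.0, 0.7]]
y = [1, 1, 0, 0]
w, b = train(X, y)
```

Any linear learner would do here; the point is that the classifier's weights and bias become the cost model's weight vector and threshold.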
Inference vs. Learning
Training samples → best proofs → feature extraction → vector representation → learning algorithm → (w, b)
Iterative Learning Scheme
1. w = reasonable guess
2. Find the best proofs
3. Learn new w and b
4. Repeat from step 2
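The four steps above can be sketched as a loop. The helper signatures are assumptions for illustration: `find_best_proof(pair, w)` stands for the search component and `learn(proofs)` for the linear learner; neither is BIUTEE's actual API.

```python
def iterative_training(pairs, find_best_proof, learn, w0, iterations=3):
    """Alternate between proof search and parameter learning."""
    w, b = w0, 0.0                                       # 1. reasonable guess
    for _ in range(iterations):
        proofs = [find_best_proof(p, w) for p in pairs]  # 2. best proofs
        w, b = learn(proofs)                             # 3. learn new w, b
    return w, b                                          # 4. loop repeats step 2

# Trivial stand-ins just to exercise the loop:
w, b = iterative_training(
    pairs=["pair-1", "pair-2"],
    find_best_proof=lambda p, w: p,
    learn=lambda proofs: ([1.0], 0.5),
    w0=[0.0],
)
```

Each round searches for proofs under the current weights, then refits the weights to those proofs, so search and learning refine each other.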
Summary - System Components
How to:
Generate syntactically motivated complete proofs? Entailment rules; on-the-fly operations (extended tree-edit operations)
Estimate proof validity? Confidence model
Find the best proof? Search algorithm
Learn the model parameters? Iterative learning scheme
Results - RTE7

ID    Knowledge Resources                                                              Precision %  Recall %  F1 %
BIU1  WordNet, Directional Similarity                                                  38.97        47.40     42.77
BIU2  WordNet, Directional Similarity, Wikipedia                                       41.81        44.11     42.93
BIU3  WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database      39.26        45.95     42.34

BIUTEE 2011 on RTE6 (F1 %)
Baseline (use IR top-5 relevance)   34.63
Median (September 2010)             36.14
Best (September 2010)               48.01
Our system                          49.54
Conclusions
Inference via a sequence of transformations
Knowledge resources and extended tree edits
Proof confidence estimation
Results: better than the median on RTE7; best on RTE6
Open source: http://www.cs.biu.ac.il/~nlp/downloads/biutee
Thank You
http://www.cs.biu.ac.il/~nlp/downloads/biutee