Slide 1: Knowledge and Tree-Edits in Learnable Entailment Proofs (BIUTEE)
Asher Stern, Amnon Lotan, Shachar Mirkin, Eyal Shnarch, Lili Kotlerman, Jonathan Berant and Ido Dagan
TAC, November 2011, NIST, Gaithersburg, Maryland, USA
Download at: http://www.cs.biu.ac.il/~nlp/downloads/biutee
Slide 2: RTE
Classify a (T, H) pair as ENTAILING or NON-ENTAILING.

Example:
T: The boy was located by the police.
H: Eventually, the police found the child.
Slide 3: Matching vs. Transformations
Matching
Sequence of transformations (a proof):
T = T0 → T1 → T2 → ... → Tn = H
- Tree-edits: complete proofs; estimate confidence
- Knowledge-based entailment rules: linguistically motivated; formalize many types of knowledge
Slide 4: Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
Hypothesis: Eventually, the police found the child.
Slide 5: Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
The police located the boy.
The police found the boy.
The police found the child.
Hypothesis: Eventually, the police found the child.
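As a toy illustration, the chain above can be simulated with plain string rewrites. Real BIUTEE operates on dependency parse trees, not strings; the rule functions below are illustrative stand-ins for the actual knowledge resources and tree edits:

```python
def apply_proof(text, steps):
    """Apply transformations in order; return all intermediate forms T0..Tn."""
    forms = [text]
    for name, op in steps:
        forms.append(op(forms[-1]))
    return forms

# Toy stand-ins for the four transformations on Slide 5 / Slide 12:
steps = [
    ("passive to active",
     lambda s: "The police located the boy."),          # generic syntactic rule
    ("X locate Y -> X find Y",
     lambda s: s.replace("located", "found")),          # lexical-syntactic rule
    ("boy -> child",
     lambda s: s.replace("boy", "child")),              # lexical rule
    ("insert 'Eventually' on the fly",
     lambda s: "Eventually, " + s[0].lower() + s[1:]),  # on-the-fly tree edit
]

forms = apply_proof("The boy was located by the police.", steps)
```

The last element of `forms` is exactly the hypothesis, which is what makes this sequence a complete proof.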
Slide 6: Transformation-based RTE - Example
T = T0 → T1 → T2 → ... → Tn = H
Slide 7: BIUTEE Goals
- Tree edits: complete proofs; estimate confidence
- Entailment rules: linguistically motivated; formalize many types of knowledge
BIUTEE integrates the benefits of both worlds.
Slide 8: Challenges / System Components
How to:
- generate linguistically motivated complete proofs?
- estimate proof confidence?
- find the best proof?
- learn the model parameters?
Slide 9: 1. Generate linguistically motivated complete proofs
Slide 10: Entailment Rules
Rule types:
- Lexical (e.g., boy → child)
- Lexical-syntactic
- Generic syntactic
(Bar-Haim et al. 2007. Semantic inference at the lexical-syntactic level.)
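A minimal sketch of how a directional entailment rule might be stored and queried. The rule entries and scores below are illustrative assumptions, not the actual WordNet/DIRT resources:

```python
# Hypothetical rule bases keyed by (lhs, rhs); scores are made up.
lexical_rules = {("boy", "child"): 0.9}              # e.g., from WordNet
template_rules = {("X locate Y", "X find Y"): 0.9}   # e.g., a DIRT paraphrase

def lexical_entails(lhs, rhs, rules):
    """lhs entails rhs if they are equal or a rule connects them.
    Rules are directional: (boy, child) does not license (child, boy)."""
    return lhs == rhs or (lhs, rhs) in rules
```

Directionality matters for RTE: "boy" can safely be replaced by the more general "child" in the proof, but not the other way around.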
Slide 11: Extended Tree Edits (On-the-Fly Operations)
Predefined custom tree edits:
- Insert node on the fly
- Move node / move sub-tree on the fly
- Flip part of speech
- …
They heuristically capture linguistic phenomena.
Each operation comes with an operation definition and a features definition.
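A minimal sketch of an on-the-fly insert-node edit over a toy tree representation (a node is a word plus a list of children). The data model is an assumption for illustration; BIUTEE's actual edits operate on full dependency parse trees with labels and features:

```python
def insert_node(tree, word, under):
    """Return a copy of `tree` with `word` added as a new child of the
    first node labeled `under` (an 'insert node on the fly' edit)."""
    label, children = tree
    if label == under:
        return (label, children + [(word, [])])
    return (label, [insert_node(c, word, under) for c in children])

# Toy tree for "the police found the child" (content words only):
tree = ("found", [("police", []), ("child", [])])
edited = insert_node(tree, "Eventually", "found")
```

This is the kind of step used at the end of the running example, where "Eventually" must be inserted because no knowledge-based rule produces it.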
Slide 12: Proof over Parse Trees - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
  (passive to active)
The police located the boy.
  (X locate Y → X find Y)
The police found the boy.
  (boy → child)
The police found the child.
  (insertion on the fly)
Hypothesis: Eventually, the police found the child.
Slide 13: 2. Estimate proof confidence
Slide 14: Cost-based Model
- Define operation cost: assesses the operation's validity.
  - Represent each operation as a feature vector.
  - Cost is a linear combination of the feature values.
- Define proof cost as the sum of the operations' costs.
- Classify: entailing if and only if the proof cost is smaller than a threshold.
Slide 15: Feature Vector Representation
Define operation cost: represent each operation as a feature vector.
Features: (Insert-Named-Entity, Insert-Verb, …, WordNet, Lin, DIRT, …)
An operation:
  The police located the boy.
  DIRT: X locate Y → X find Y (score = 0.9)
  The police found the boy.
Feature vector that represents the operation: (0, 0, …, 0.457, …, 0)
The non-zero value appears in the DIRT feature; it is a decreasing function of the rule's score, so higher-confidence rules yield lower feature values (and hence lower cost).
Slide 16: Cost-based Model
Define operation cost: cost is a linear combination of the feature values.
Cost = weight-vector · feature-vector
The weight vector is learned automatically.
Slide 17: Confidence Model
- Represent each operation as a feature vector.
- Define proof cost as the sum of the operations' costs.
- Equivalently: the sum of the operations' feature vectors represents the proof, and the proof cost is the weight vector times this summed vector.
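The cost model above can be sketched directly: each operation is a sparse feature vector, its cost is the dot product with the weight vector, the proof cost is the sum of the operation costs, and the pair is classified as entailing iff that cost is below a threshold. The feature names follow the slide's examples, but the weights and values below are hypothetical:

```python
def operation_cost(w, f):
    """Cost of one operation: linear combination of its feature values."""
    return sum(w[k] * v for k, v in f.items())

def proof_cost(w, feature_vectors):
    """Proof cost = sum of operation costs = w . (sum of feature vectors)."""
    return sum(operation_cost(w, f) for f in feature_vectors)

def entails(w, b, feature_vectors):
    """Classify as entailing iff the proof cost is below the threshold b."""
    return proof_cost(w, feature_vectors) < b

# Hypothetical learned weights and a three-operation proof (sparse vectors):
w = {"DIRT": 1.0, "WordNet": 1.2, "Insert-Named-Entity": 3.0}
proof = [{"DIRT": 0.457}, {"WordNet": 0.3}, {"Insert-Named-Entity": 1.0}]
cost = proof_cost(w, proof)  # 1.0*0.457 + 1.2*0.3 + 3.0*1.0 = 3.817
```

Note how a heavily weighted feature such as the on-the-fly insertion dominates the cost, reflecting that unjustified edits should make a proof less credible than knowledge-based rule applications.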
Slide 18: Feature Vector Representation - Example
T = T0 → T1 → T2 → ... → Tn = H
Text: The boy was located by the police.
  Passive to active: (0,0,……………….………..,1,0)
The police located the boy.
  X locate Y → X find Y: (0,0,………..……0.457,..,0,0)
The police found the boy.
  boy → child: (0,0,..…0.5,.……….……..,0,0)
The police found the child.
  Insertion on the fly: (0,0,1,……..…….…..…....,0,0)
Hypothesis: Eventually, the police found the child.
Sum (represents the whole proof): (0,0,1..0.5..…0.457,....…1,0)
Slide 19: Cost-based Model
- Define operation cost (learned).
- Represent each operation as a feature vector.
- Define proof cost as the sum of the operations' costs.
- Classify: "entailing" if and only if the proof cost is smaller than a threshold.
Slide 20: 3. Find the best proof
Slide 21: Search the best proof
(Diagram: several alternative proofs, Proof #1 through Proof #4, each leading from T to H.)
Slide 22: Search the best proof
- Need to find the "best" proof.
- "Best proof" = the proof with the lowest cost (assuming a weight vector is given).
- The search space is exponential.
- An AI-style search algorithm is used.
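The idea of an AI-style search over the exponential proof space can be illustrated with uniform-cost search, a simple stand-in for whatever algorithm the system actually uses. The search space, operation names, and costs below are all hypothetical:

```python
import heapq

def best_proof(start, goal, successors):
    """Uniform-cost search: cheapest sequence of operations from T to H.
    `successors(state)` yields (operation_name, next_state, cost) triples."""
    frontier = [(0.0, start, [])]
    best_seen = {}
    while frontier:
        cost, state, proof = heapq.heappop(frontier)
        if state == goal:
            return cost, proof
        if state in best_seen and best_seen[state] <= cost:
            continue  # already reached this state more cheaply
        best_seen[state] = cost
        for op, nxt, c in successors(state):
            heapq.heappush(frontier, (cost + c, nxt, proof + [op]))
    return None

# Toy search space: a cheap three-step proof competes with an expensive edit.
graph = {
    "T":  [("passive-to-active", "T1", 0.1)],
    "T1": [("locate->find", "T2", 0.457), ("costly-edit", "H", 5.0)],
    "T2": [("boy->child", "H", 0.2)],
}
cost, proof = best_proof("T", "H", lambda s: graph.get(s, []))
```

The search correctly prefers the chain of three knowledge-based steps (total cost 0.757) over the single expensive edit (cost 5.1), mirroring how the system should favor linguistically justified proofs.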
Slide 23: 4. Learn model parameters
Slide 24: Learning
Goal: learn the parameters (w, b).
Use a linear learning algorithm: logistic regression, SVM, etc.
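A minimal sketch of learning (w, b) with a linear algorithm. A simple perceptron-style learner stands in here for the logistic regression / SVM named on the slide, and the training samples (proof feature vectors with entailment labels) are hypothetical:

```python
def learn_linear(samples, epochs=20, lr=0.1):
    """Learn (w, b) so that w.x < b exactly for entailing proofs.
    `samples` is a list of (feature_vector, is_entailing) pairs."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, entailing in samples:
            cost = sum(wi * xi for wi, xi in zip(w, x))
            if (cost < b) != entailing:
                # Push the cost down for entailing samples, up otherwise.
                sign = -1.0 if entailing else 1.0
                w = [wi + sign * lr * xi for wi, xi in zip(w, x)]
                b = b - sign * lr
    return w, b

# Hypothetical data: entailing pairs have cheap proofs (small feature values).
samples = [([0.1, 0.0], True), ([0.2, 0.1], True),
           ([1.0, 2.0], False), ([2.0, 1.5], False)]
w, b = learn_linear(samples)
```

The learner only needs the proofs' feature vectors and labels, which is why any off-the-shelf linear classifier fits this slot.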
Slide 25: Inference vs. Learning
(Diagram: training samples → feature extraction (best proofs) → vector representation → learning algorithm → w, b.)

Slide 26: Inference vs. Learning
(Same diagram as Slide 25.)
Slide 27: Iterative Learning Scheme
(Diagram: training samples → best proofs → vector representation → learning algorithm → w, b.)
1. Start with a reasonable guess for w.
2. Find the best proofs.
3. Learn new w and b.
4. Repeat from step 2.
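The four-step scheme above can be sketched as a loop. `toy_find` and `toy_learn` below are hypothetical stand-ins for the proof-search and linear-learning components:

```python
def iterative_training(samples, find_best_proof, learn, w0, b0, iterations=3):
    """Iterative learning scheme: start from a guess (w0, b0), find the best
    proofs under the current model, learn a new (w, b) from their feature
    vectors, and repeat."""
    w, b = w0, b0
    for _ in range(iterations):
        # Inference step: cheapest proof (as a feature vector) per (T, H) pair.
        vectors = [(find_best_proof(t, h, w, b), label)
                   for t, h, label in samples]
        # Learning step: fit a new linear model on those proof vectors.
        w, b = learn(vectors)
    return w, b

# Toy stand-ins so the loop is runnable end to end:
toy_find = lambda t, h, w, b: [len(t), len(h)]
toy_learn = lambda vectors: ([1.0, -1.0], 0.5)

w, b = iterative_training([("T", "H", True)], toy_find, toy_learn,
                          w0=[0.0, 0.0], b0=0.0)
```

Iterating matters because the two steps are coupled: which proof is "best" depends on the current weights, and the learned weights depend on which proofs were extracted.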
Slide 28: Summary - System Components
How to:
- Generate syntactically motivated complete proofs? Entailment rules; on-the-fly operations (extended tree-edit operations).
- Estimate proof validity? Confidence model.
- Find the best proof? Search algorithm.
- Learn the model parameters? Iterative learning scheme.
Slide 29: Results - RTE-7

ID    Knowledge Resources                                                          Precision %  Recall %  F1 %
BIU1  WordNet, Directional Similarity                                              38.97        47.40     42.77
BIU2  WordNet, Directional Similarity, Wikipedia                                   41.81        44.11     42.93
BIU3  WordNet, Directional Similarity, Wikipedia, FrameNet, Geographical database  39.26        45.95     42.34

BIUTEE 2011 on RTE-6 (F1 %):
Baseline (use IR top-5 relevance)  34.63
Median (September 2010)            36.14
Best (September 2010)              48.01
Our system                         49.54
Slide 30: Conclusions
- Inference via a sequence of transformations
- Knowledge-based entailment rules
- Extended tree edits
- Proof confidence estimation
- Results: better than the median on RTE-7; best on RTE-6
- Open source: http://www.cs.biu.ac.il/~nlp/downloads/biutee
Slide 31: Thank You
http://www.cs.biu.ac.il/~nlp/downloads/biutee