Slide1
Entity Centric Coreference Resolution with Model Stacking
Kevin Clark and Christopher D. Manning
(ACL-IJCNLP 2015)
(Tables are taken from the above-mentioned paper)
Presented by Mamoru Komachi
<komachi@tmu.ac.jp>
ACL 2015 Reading Group @ Tokyo Institute of Technology
August 26th, 2015
Slide2
Entity-level information allows early
coreference decisions to inform later ones
Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014)
Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for the presidency in 2016. … Clinton is confident that her poll numbers will skyrocket once the divorce is final.
(Figure: candidate coreference links among the mentions, annotated with “?” and “!”)
Slide3
Problem: How to build up clusters effectively?
Model stacking
Two mention pair models: a classification model and a ranking model
Generates cluster features for clusters of mentions
Imitation learning
Assigns exact costs to actions based on coreference evaluation metrics
Uses the scores of the pairwise models to reduce the search space
Slide4
Mention Pair Models
Previous approach using local information
Slide5
Two models for predicting whether a given pair of mentions belongs to the same coreference cluster
Is the pair coreferent? (Classification model)
Which antecedent best suits the mention? (Ranking model)
Example: “Bill arrived, but nobody saw him. I talked to him on the phone.”
Slide6
Logistic classifier for the classification model
M: the set of all mentions in the training set
T(m): the set of true antecedents of a mention m
F(m): the set of false antecedents of m
Considers each pair of mentions independently
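As a toy illustration of this pairwise classification model (the feature encoding and training setup below are my own minimal sketch, not the paper’s implementation):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pair_classifier(pairs, labels, dim, epochs=200, lr=0.5):
    """Logistic mention-pair model: each example is one (antecedent a, mention m)
    pair, labeled 1 if a is in T(m) and 0 if a is in F(m). Pairs are treated
    independently, as the slide describes. Trained with plain SGD."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the logistic loss w.r.t. the score
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def pair_prob(w, b, x):
    """Probability that the pair with feature vector x is coreferent."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

With toy features such as [head-match, sentence distance], the learned pair probabilities are exactly what later gets stacked into the cluster-level model.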
Slide7
Logistic classifier for the ranking model
Considers all candidate antecedents simultaneously
Max-margin training encourages the model to find the single best antecedent for a mention, but it is not robust for a downstream clustering model
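A sketch of the max-margin idea above (the function and its argument names are my own): the best true antecedent must outscore every false antecedent by a margin.

```python
def max_margin_loss(scores, true_antecedents, margin=1.0):
    """Hinge loss for antecedent ranking: zero when the highest-scoring true
    antecedent beats every false antecedent by at least `margin`.
    `scores` maps candidate antecedent ids to model scores."""
    best_true = max(scores[a] for a in true_antecedents)
    best_false = max(s for a, s in scores.items() if a not in true_antecedents)
    return max(0.0, margin + best_false - best_true)
```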
Slide8
Features for the mention pair models
Distance features: the distance between the two mentions in sentences or in number of mentions
Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head words
Semantic features: named entity type, speaker identification
Rule-based features: exact and partial string matching
Lexical features: the first, last, and head word of the current mention
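A toy extractor for a few of these feature classes (the mention representation and its key names are hypothetical, chosen only for illustration):

```python
def mention_pair_features(m1, m2):
    """Builds a small feature dict for one mention pair. Mentions are plain
    dicts with illustrative keys: 'sent' (sentence index), 'idx' (mention
    index), 'head' (head word), and 'ner' (named entity type)."""
    return {
        "sent_dist": abs(m2["sent"] - m1["sent"]),               # distance feature
        "mention_dist": abs(m2["idx"] - m1["idx"]),              # distance feature
        "head_match": m1["head"].lower() == m2["head"].lower(),  # string matching
        "ner_match": m1["ner"] == m2["ner"],                     # semantic feature
        "head_pair": (m1["head"].lower(), m2["head"].lower()),   # lexical feature
    }
```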
Slide9
Entity-Centric Coreference Model
Proposed approach using cluster features
Slide10
Entity-centric models can exhibit high coherence
Best-first clustering (Ng and Cardie, 2002)
Assigns as the antecedent the most probable preceding mention classified as coreferent with the current mention
Relies only on local information
Entity-centric model (this work)
Operates on pairs of clusters instead of pairs of mentions
Builds up coreference chains with agglomerative clustering, merging two clusters if it predicts they represent the same entity
Slide11
Inference
Reduce the search space by using a threshold on the mention-pair model scores
Sort the pair list P by pairwise probability to perform easy-first clustering
s is a scoring function that makes the binary decision for each merge action
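The pruned, easy-first procedure can be sketched as follows (function names and the toy merge scorer are my own; a minimal sketch, not the paper’s implementation):

```python
def easy_first_clustering(pairs, merge_score, threshold=0.5):
    """Easy-first agglomerative clustering sketch. `pairs` is a list of
    (prob, m1, m2) mention pairs scored by the pairwise model; pairs below
    `threshold` are pruned, the rest are processed from most to least
    confident, and `merge_score` stands in for the learned scoring
    function s that makes the binary merge decision."""
    clusters = {}  # mention -> its current cluster (a shared set)

    def cluster_of(m):
        return clusters.setdefault(m, {m})

    for prob, m1, m2 in sorted(pairs, reverse=True):
        if prob < threshold:
            break  # remaining pairs are pruned by the threshold
        c1, c2 = cluster_of(m1), cluster_of(m2)
        if c1 is c2:
            continue  # already in the same cluster
        if merge_score(c1, c2) > 0:  # binary merge decision
            merged = c1 | c2
            for m in merged:
                clusters[m] = merged
    return {frozenset(c) for c in clusters.values()}
```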
Slide12
Learning the entity-centric model by imitation learning
Sequential prediction problem: future observations depend on previous actions
Imitation learning (in this work, DAgger; Ross et al., 2011) is useful for this problem (Argall et al., 2009)
Training the agent on the gold labels alone assumes that all previous decisions were correct, which is problematic in coreference, where the error rate is quite high
DAgger exposes the system at training time to states similar to the ones it will face at test time
Slide13
Learning the cluster-merging policy by DAgger (Ross et al., 2011)
Iterative algorithm that aggregates a dataset D of states and the actions performed by the expert policy in those states
β controls the probability of choosing the expert policy over the current policy (it decays exponentially as the iteration number increases)
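A schematic DAgger loop for a generic sequential problem (all function names and the decay schedule here are illustrative assumptions, not the paper’s interfaces):

```python
import random

def dagger(initial_state, step, expert, train, n_iters=3, beta0=1.0, seed=0):
    """DAgger sketch (Ross et al., 2011): repeatedly roll out a mixture of
    the expert policy and the current learned policy, label every visited
    state with the expert's action, aggregate the dataset D, and retrain."""
    rng = random.Random(seed)
    D, policy = [], None
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)  # expert mixing weight, decays each iteration
        state = initial_state
        while state is not None:
            expert_action = expert(state)
            D.append((state, expert_action))  # always label with the expert
            if policy is None or rng.random() < beta:
                action = expert_action        # follow the expert
            else:
                action = policy(state)        # follow the learned policy
            state = step(state, action)
        policy = train(D)
    return policy, D
```

Because later rollouts follow the learned policy, the aggregated dataset contains the error-laden states the system will actually face at test time.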
Slide14
Adding cost to actions: directly tune to optimize coreference metrics
Merging clusters influences the score (the order of merge operations is also important)
How will a particular local decision affect the final score of the coreference system?
Problem: standard coreference metrics do not decompose over clusters
Answer: roll out the actions from the current state
A(s): the set of actions that can be taken from state s
Slide15
Cluster features for the classification and ranking models
Between-cluster features
Minimum and maximum probability of coreference
Average probability and average log probability of coreference
Average probability and log probability of coreference for particular pairs of mention grammatical types (pronominal or not)
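The stacking step can be sketched as follows: pool the pairwise model’s probabilities over all cross-cluster mention pairs (function and key names are mine, for illustration only).

```python
import math

def cluster_pair_features(c1, c2, pair_prob):
    """Between-cluster features from stacked pairwise scores: min, max,
    average probability, and average log probability of coreference over
    all cross-cluster mention pairs. `pair_prob` is a trained
    mention-pair model."""
    probs = [pair_prob(m1, m2) for m1 in c1 for m2 in c2]
    return {
        "min_prob": min(probs),
        "max_prob": max(probs),
        "avg_prob": sum(probs) / len(probs),
        "avg_log_prob": sum(math.log(p) for p in probs) / len(probs),
    }
```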
Slide16
Only 56 features for the entity-centric model
State features
Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)
…
The entity-centric model does not rely on sparse lexical features. Instead, it employs model stacking to exploit strong features (scores learned from the pairwise models)
Slide17
Results and Discussion
CoNLL 2012 English coreference task
Slide18
Experimental setup: CoNLL 2012 Shared Task
English portion of OntoNotes
Training: 2,802 documents; development: 343; test: 345
Uses the provided preprocessing (parse trees, named entities, etc.)
Common evaluation metrics: MUC, B³, and CEAF_E
CoNLL F1 (the average F1 score of the three metrics), CoNLL scorer version 8.01
Rule-based mention detection (Raghunathan et al., 2010)
Slide19
Results: Entity-centric model outperforms best-first clustering in both classification and ranking
Slide20
Entity-centric model beats other state-of-the-art coreference models
This work primarily optimizes for the B³ metric during training
State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions
Slide21
The entity-centric model directly learns a coreference model that maximizes an evaluation metric
Post-processing of mention-pair and ranking models
Closest-first clustering (Soon et al., 2001)
Best-first clustering (Ng and Cardie, 2002)
Global inference models
Global inference with integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
Correlation clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)
Slide22
Previous approaches do not directly tune against coreference metrics
Non-local entity-level information
Cluster models (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)
Learning trajectories of decisions
Imitation learning (Daumé et al., 2005; Ma et al., 2014)
Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)
Slide23
Summary
Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features
Costs for cluster-merge actions are derived from standard coreference metrics
Imitation learning can be used to learn how to build up coreference chains incrementally
The proposed model outperforms the commonly used best-first method and the current state of the art