Entity-Centric Coreference Resolution with Model Stacking - PowerPoint Presentation

Uploaded by alexa-scheidler on 2015-09-25




Presentation Transcript

Slide1

Entity-Centric Coreference Resolution with Model Stacking

Kevin Clark and Christopher D. Manning

(ACL-IJCNLP 2015)

(Tables are taken from the above-mentioned paper)

Presented by Mamoru Komachi

<komachi@tmu.ac.jp>

ACL 2015 Reading Group @ Tokyo Institute of Technology

August 26th, 2015

Slide2

Entity-level information allows early coreference decisions to inform later ones

Entity-centric coreference systems build up coreference clusters incrementally (Raghunathan et al., 2010; Stoyanov and Eisner, 2012; Ma et al., 2014)


"Hillary Clinton files for divorce from Bill Clinton ahead of her campaign for presidency for 2016. … Clinton is confident that her poll numbers will skyrocket once the divorce is final."

Slide3

Problem: How to build up clusters effectively?

Model stacking
- Two mention-pair models: a classification model and a ranking model
- Generates cluster features for clusters of mentions

Imitation learning
- Assigns exact costs to actions based on coreference evaluation metrics
- Uses the scores of the pairwise models to reduce the search space

Slide4

Mention Pair Models

Previous approach using local information

Slide5

Two models for predicting whether a given pair of mentions belongs to the same coreference cluster

- Is it coreferent? (classification model)
- Which candidate best suits the mention? (ranking model)

Example: "Bill arrived, but nobody saw him. I talked to him on the phone."

Slide6

Logistic classifiers for the classification model

M: set of all mentions in the training set
T(m): set of true antecedents of a mention m
F(m): set of false antecedents of m

Considers each pair of mentions independently
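As a concrete illustration, the independent pairwise objective can be sketched in a few lines of Python. This is a hypothetical sketch: feature extraction and regularization are omitted, and it is not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def classification_loss(theta, pairs):
    """Log loss over mention pairs, treated independently.

    `pairs` is a list of (feature_vector, label) tuples: label 1 if the
    candidate antecedent is in T(m), 0 if it is in F(m).
    """
    loss = 0.0
    for x, y in pairs:
        p = sigmoid(dot(theta, x))  # pairwise coreference probability
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss
```

Because each pair contributes its own log-loss term, nothing ties the decisions for different antecedents of the same mention together, which is exactly the "independence" limitation the slide notes.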

Slide7

Logistic classifiers for the ranking model

Considers candidate antecedents simultaneously

Max-margin training encourages the model to find the single best antecedent for a mention, but it is not robust for a downstream clustering model
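A generic max-margin ranking loss of this flavor might look like the following. This is a simplified stand-in; the paper's exact training criterion may differ.

```python
def ranking_loss(theta, true_feats, false_feats, margin=1.0):
    """Max-margin ranking loss for one mention: the best true antecedent
    should outscore every false antecedent by at least `margin`.

    `true_feats` / `false_feats` are feature vectors for the candidate
    antecedents in T(m) and F(m), scored jointly rather than one pair
    at a time.
    """
    score = lambda x: sum(a * b for a, b in zip(theta, x))
    best_true = max(score(x) for x in true_feats)
    worst_false = max(score(x) for x in false_feats)
    return max(0.0, margin + worst_false - best_true)
```

Note the loss is zero as soon as a single true antecedent clears the margin, which is why the resulting scores are tuned for picking one best antecedent rather than for calibrated downstream clustering.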

Slide8

Features for the mention-pair models

- Distance features: the distance between the two mentions in sentences or in number of mentions
- Syntactic features: number of embedded NPs under a mention; POS tags of the first, last, and head word
- Semantic features: named entity type, speaker identification
- Rule-based features: exact and partial string matching
- Lexical features: the first, last, and head word of the current mention
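For illustration, here is a toy extractor for a few of these feature classes. The `Mention` structure is hypothetical, not from the paper's code, and only a small subset of the feature set is shown.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    # Hypothetical, minimal mention representation for illustration.
    sent_index: int
    mention_index: int
    head_word: str
    text: str

def pair_features(antecedent: Mention, anaphor: Mention) -> dict:
    """Illustrative subset of the distance and string-match features."""
    return {
        "sent_distance": anaphor.sent_index - antecedent.sent_index,
        "mention_distance": anaphor.mention_index - antecedent.mention_index,
        "head_match": antecedent.head_word.lower() == anaphor.head_word.lower(),
        "exact_match": antecedent.text.lower() == anaphor.text.lower(),
        "partial_match": anaphor.head_word.lower() in antecedent.text.lower(),
    }
```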

Slide9

Entity-Centric Coreference Model

Proposed approach using cluster features

Slide10

Entity-centric model can exhibit high coherency

Best-first clustering (Ng and Cardie, 2002)
- Assigns as the antecedent the most probable preceding mention classified as coreferent with the current mention
- Relies only on local information

Entity-centric model (this work)
- Operates on pairs of clusters instead of pairs of mentions
- Builds up coreference chains with agglomerative clustering, merging clusters if it predicts they represent the same entity

Slide11

Inference

Reducing the search space by using a threshold from the mention-pair models

Sort P to perform easy-first clustering; s is a scoring function that makes a binary decision for each merge action
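The pruning-and-sorting loop described above might be sketched as follows. The data structures, `pair_score`, `merge_score`, and the threshold value are placeholders for illustration, not the paper's trained components.

```python
def entity_centric_inference(mentions, pair_score, merge_score, threshold=0.5):
    """Easy-first agglomerative clustering sketch.

    pair_score(m1, m2): pairwise coreference probability (mention-pair model)
    merge_score(c1, c2): binary merge decision score for two clusters
    Candidate pairs scoring below `threshold` are pruned to cut the search space.
    """
    # Start with singleton clusters.
    clusters = {m: {m} for m in mentions}

    # P: surviving candidate pairs, sorted by pairwise score so the
    # easiest (highest-confidence) merges are attempted first.
    P = sorted(
        ((pair_score(a, b), a, b)
         for i, a in enumerate(mentions) for b in mentions[i + 1:]
         if pair_score(a, b) > threshold),
        reverse=True,
    )

    for _, a, b in P:
        ca, cb = clusters[a], clusters[b]
        if ca is not cb and merge_score(ca, cb) > 0:
            merged = ca | cb
            for m in merged:
                clusters[m] = merged
    return {frozenset(c) for c in clusters.values()}
```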

Slide12

Learning the entity-centric model by imitation learning

Sequential prediction problem: future observations depend on previous actions

Imitation learning (in this work, DAgger (Ross et al., 2011)) is useful for this problem (Argall et al., 2009)

Training the agent on the gold labels alone assumes that all previous decisions were correct; this is problematic in coreference, where the error rate is quite high

DAgger exposes the system at train time to states similar to the ones it will face at test time

Slide13

Learning the cluster merging policy with DAgger (Ross et al., 2011)

Iterative algorithm aggregating a dataset D consisting of states and the actions performed by the expert policy in those states

β controls the probability of mixing the expert's policy with the current policy (it decays exponentially as the iteration number increases)
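A generic DAgger loop under these definitions can be sketched as follows. The expert policy, rollout, and training routine are placeholder callables, not the paper's actual components.

```python
import random

def dagger(initial_states, expert_policy, rollout, train, n_iters=5, decay=0.5):
    """Schematic DAgger loop (Ross et al., 2011).

    expert_policy(state) -> the expert's action in a state
    rollout(policy, state) -> sequence of states visited when `policy`
        drives clustering forward from `state`
    train(D) -> a new policy fit on the aggregated (state, action) pairs
    beta = decay**i mixes expert and current policy, decaying
        exponentially with the iteration number, as on the slide.
    """
    D = []
    policy = expert_policy
    for i in range(n_iters):
        beta = decay ** i
        mixed = lambda s: expert_policy(s) if random.random() < beta else policy(s)
        for s0 in initial_states:
            for state in rollout(mixed, s0):
                # Visited states are always labeled with the expert's action.
                D.append((state, expert_policy(state)))
        policy = train(D)
    return policy
```

The key point the slide makes is visible in the loop: trajectories are generated partly by the learned policy, so the aggregated dataset D contains the imperfect states the system will actually reach at test time.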

Slide14

Adding cost to actions: directly tune to optimize coreference metrics

Merging clusters influences the score (the order of merge operations also matters)

How will a particular local decision affect the final score of the coreference system?

Problem: standard coreference metrics do not decompose over clusters
Answer: roll out the actions from the current state

A(s): set of actions that can be taken from the state s

Slide15

Cluster features for the classification model and ranking model

Between-cluster features
- Minimum and maximum probability of coreference
- Average probability and average log probability of coreference
- Average probability and log probability of coreference for particular pairs of mention grammar types (pronoun or not)
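These aggregates over cross-cluster mention pairs might be computed as follows; this is an illustrative subset only, and `pair_prob` stands in for a trained mention-pair model.

```python
import math
from itertools import product

def between_cluster_features(c1, c2, pair_prob):
    """Aggregate pairwise coreference probabilities over all
    cross-cluster mention pairs (model stacking: the pairwise
    model's outputs become the cluster model's features)."""
    probs = [pair_prob(m1, m2) for m1, m2 in product(c1, c2)]
    return {
        "min_prob": min(probs),
        "max_prob": max(probs),
        "avg_prob": sum(probs) / len(probs),
        "avg_log_prob": sum(math.log(p) for p in probs) / len(probs),
    }
```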

Slide16

Only 56 features for the entity-centric model

State features
- Whether a preceding mention pair in the list of mention pairs has the same candidate anaphor as the current one
- The index of the current mention pair in the list divided by the size of the list (what percentage of the list have we seen so far?)
- …

The entity-centric model doesn't rely on sparse lexical features. Instead, it employs model stacking to exploit strong features (with scores learned from the pairwise models)

Slide17

Results and discussion

CoNLL 2012 English coreference task

Slide18

Experimental setup: CoNLL 2012 Shared Task

English portion of OntoNotes
- Training: 2,802 documents; development: 343; test: 345
- Uses the provided pre-processing (parse trees, NEs, etc.)

Common evaluation metrics
- MUC, B³, CEAF_e
- CoNLL F1 (the average F1 score of the three metrics)
- CoNLL scorer version 8.01

Rule-based mention detection (Raghunathan et al., 2010)

Slide19

Results: the entity-centric model outperforms best-first clustering with both the classification and ranking models

Slide20

Entity-centric model beats other state-of-the-art coreference models

This work primarily optimizes for the B³ metric during training

State-of-the-art systems use latent antecedents to learn scoring functions over mention pairs, but are trained to maximize global objective functions

Slide21

The entity-centric model directly learns a coreference model that maximizes an evaluation metric

Post-processing of mention-pair and ranking models
- Closest-first clustering (Soon et al., 2001)
- Best-first clustering (Ng and Cardie, 2002)

Global inference models
- Global inference with integer linear programming (Denis and Baldridge, 2007; Finkel and Manning, 2008)
- Graph partitioning (McCallum and Wellner, 2005; Nicolae and Nicolae, 2006)
- Correlational clustering (McCallum and Wellner, 2003; Finley and Joachims, 2005)

Slide22

Previous approaches do not directly tune against coreference metrics

Non-local entity-level information
- Cluster model (Luo et al., 2004; Yang et al., 2008; Rahman and Ng, 2011)
- Joint inference (McCallum and Wellner, 2003; Culotta et al., 2006; Poon and Domingos, 2008; Haghighi and Klein, 2010)

Learning trajectories of decisions
- Imitation learning (Daumé et al., 2005; Ma et al., 2014)
- Structured perceptron (Stoyanov and Eisner, 2012; Fernandes et al., 2012; Björkelund and Kuhn, 2014)

Slide23

Summary

- Proposed an entity-centric coreference model that uses the scores produced by mention-pair models as features
- Pairwise scores are learned using standard coreference metrics
- Imitation learning can be used to learn how to build up coreference chains incrementally
- The proposed model outperforms the commonly used best-first method and the current state-of-the-art