
Illinois-Coref: The UI System in the CoNLL-2012 Shared Task
Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth
Supported by ARL and by DARPA under the Machine Reading Program.

Coreference

Coreference resolution is the task of grouping all the mentions of entities into equivalence classes so that each class represents a discourse entity. In the example below, co-referent mentions were colour-coded on the original poster (overlapping mentions have been omitted for clarity):

"An American official announced that American President Bill Clinton met his Russian counterpart, Vladimir Putin, today. The president said that Russia was a great country."

CoNLL Shared Task 2012

The CoNLL Shared Task 2012 is an extension of last year's coreference task. We use the Illinois-Coref system from CoNLL-2011 as the basis for our current system. We improve over the baseline system by:
- Improving mention detection.
- Using separate models for coreference resolution on pronominal and non-pronominal mentions.
- Designing a latent structured learning protocol for Best-Link inference.

System Architecture

(Figure on the original poster.)

Experiments and Results

We present the performance of the system on both the OntoNotes-4.0 and OntoNotes-5.0 datasets.

Results on the CoNLL-12 DEV set with predicted mentions:

Method                 MD     MUC    BCUB   CEAF   AVG
Baseline               70.53  61.63  69.26  43.03  57.97
+Sep. Pronouns         73.24  64.57  69.78  44.95  59.76
+Latent Structure      73.02  64.98  70.00  44.48  59.82
Final System (+Both)   73.95  65.75  70.25  45.30  60.43

Results on the CoNLL-11 DEV set with predicted mentions:

Method        MUC    BCUB   CEAF   AVG
Baseline      55.49  69.15  43.72  56.12
New System    60.65  69.95  45.39  58.66

Official scores on the CoNLL-12 TEST set (system trained on both the TRAIN and DEV sets):

                                   MD      MUC    BCUB   CEAF   AVG
English (Pred. Mentions)           74.32   66.38  69.34  44.81  60.18
English (Gold Mention Boundaries)  75.72   67.80  69.75  45.12  60.89
English (Gold Mentions)            100.00  85.74  77.46  68.46  77.22
Chinese (Pred. Mentions)           47.58   37.93  63.23  35.97  45.71

Discussions and Conclusions

We described strategies for improving mention detection and proposed an online latent structured learning algorithm for coreference resolution. Using separate classifiers for coreference resolution on pronominal and non-pronominal mentions improves the system. Overall, the system improves over the baseline by 5% MUC F1, 0.8% BCUB F1, and 1.7% CEAF F1. The performance of our Chinese coreference system is not as good as that of the English system; this may be because we use the same features for English and Chinese, and the feature set was developed for the English corpus.

Pronoun Resolution

The baseline uses an identical coreference model for both pronominal and non-pronominal mentions. However, the useful features for coreference resolution on pronouns and non-pronouns are usually different: lexical features are more important for non-pronominal coreference resolution, while gender features are more important for pronoun anaphora resolution.

New system: use separate models. We train two separate classifiers with different sets of features for pronoun and non-pronoun coreference. The pronoun anaphora classifier includes features that identify the speaker and reflect the document type. A minimal sketch of this routing appears below.
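The separate-model idea amounts to routing each candidate mention pair to one of two pairwise scorers, depending on whether the anaphoric mention is a pronoun. The following is a minimal sketch under assumed data structures: the feature extractors, the mention/document dictionaries, and the feature names are all illustrative, not the actual Illinois-Coref feature set or learner.

```python
from collections import defaultdict

# Hypothetical feature extractors; the real system uses much richer feature sets.
def pronoun_features(u, v, doc):
    return {"gender_match": float(u["gender"] == v["gender"]),
            "number_match": float(u["number"] == v["number"]),
            "same_speaker": float(doc["speakers"][u["id"]] == doc["speakers"][v["id"]]),
            "doc_type=" + doc["type"]: 1.0}

def non_pronoun_features(u, v, doc):
    return {"head_match": float(u["head"] == v["head"]),
            "exact_string_match": float(u["text"] == v["text"]),
            "number_match": float(u["number"] == v["number"])}

# Two separate sparse linear models, one per anaphor type.
pronoun_w = defaultdict(float)
non_pronoun_w = defaultdict(float)

def pairwise_score(u, v, doc):
    """Score the candidate link (u, v), routing by whether the anaphor v is a pronoun."""
    if v["is_pronoun"]:
        w, features = pronoun_w, pronoun_features(u, v, doc)
    else:
        w, features = non_pronoun_w, non_pronoun_features(u, v, doc)
    return sum(w[f] * x for f, x in features.items())
```

The point of the routing is simply that each model sees only the pairs, and therefore only the feature statistics, relevant to its anaphor type.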
Learning Protocol for Best-Link

We investigate two types of learning protocols for Best-Link inference.

Baseline: binary classification. The baseline system applies the strategy of (Bengtson and Roth, 2008) to learn the pairwise scoring function w from:
- Positive examples: for each mention u, we construct a positive example (u, v), where v is the closest preceding mention in u's equivalence class.
- Negative examples: all mention pairs (u, v), where v is a preceding mention of u and u and v are in different classes.

Drawbacks: this strategy suffers from a severe label imbalance problem, and it does not relate well to the Best-Link inference. For example, consider three mentions belonging to the same cluster: {m1: "President Bush", m2: "he", m3: "George Bush"}. The positive pair chosen by the baseline system is (m2, m3); a better choice is (m1, m3).

New system: latent structured learning. We consider a latent structured learning algorithm with:
- Input: a mention v and its preceding mentions {u | u < v}.
- Output: y(v), the set of antecedents that co-refer with v, and h(v), a latent structure that denotes the Best-Link decision, together with a loss function defined over Best-Link decisions (stated on the original poster).

At each iteration, the algorithm takes the following steps:
1. Pick a mention v and find the Best-Link decision u* that is consistent with the gold clustering.
2. Find the Best-Link decision u' under the current model by solving a loss-augmented inference problem.
3. Update the model using the difference between the feature vectors Φ(u*, v) and Φ(u', v).

A perceptron-style sketch of this training loop appears after the mention detection discussion below.

Best-Link Inference Procedure

The inference procedure takes as input a set of pairwise mention scores over a document and aggregates them into globally consistent cliques representing entities. Best-Link inference: for each mention, Best-Link considers the best mention on its left to connect to (best according to the pairwise score) and creates a link between them if the pairwise score is above some threshold. A sketch of this procedure is also given below.

Improving on Mention Detection

The baseline system implements a high-recall, low-precision rule-based mention detector. Error analysis shows two main error types.

Incorrect mention boundaries. There are two main reasons for boundary errors:
- Parse mistakes: a wrong attachment, or extra words added to a mention. For example, if the parser attaches the relative clause in "President Bush, who traveled to China yesterday" to a different noun, the system may predict only "President Bush" as a mention. Solutions (we can only fix a subset of such errors): rule-based fixes for mentions starting with a stop word and mentions ending with a punctuation mark, and using the training data to learn patterns of inappropriate mention boundaries.
- Annotation inconsistency: a punctuation mark or an apostrophe marking the possessive form is inconsistently included at the end of mentions. Solution: in the training phase, perform a relaxed matching between predicted mentions and gold mentions; this problem cannot be fixed in the test phase.

Non-referential noun phrases. Candidate noun phrases that are unlikely to refer to any entity in the real world (e.g., "the same time") are not annotated as mentions in OntoNotes. Solution: use statistics from the training data. Count the number of times each candidate mention appears as a gold mention in the training data, and remove candidate mentions that appear frequently in the training data but never as gold mentions. The predicted head word and the words before and after the mention can also be taken into account (e.g., this helps to remove the noun "fact" in the phrase "in fact"). A count-based sketch of this filter follows.
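The non-referential filter can be realised as a simple blacklist built from training-set counts. This sketch assumes mentions are compared by surface string; the threshold and the toy data are illustrative, not the values used by the system.

```python
from collections import Counter

def build_non_referential_filter(train_candidates, train_gold, min_count=5):
    """Collect candidate strings that occur often in training but never as gold mentions."""
    candidate_counts = Counter(train_candidates)   # e.g. "the same time" -> 37
    gold_strings = set(train_gold)
    return {s for s, n in candidate_counts.items()
            if n >= min_count and s not in gold_strings}

def keep_mention(candidate, blacklist):
    """Drop candidates that the training data says are (almost) never referential."""
    return candidate not in blacklist

# Usage: filter the rule-based detector's output before coreference inference.
blacklist = build_non_referential_filter(
    train_candidates=["the same time", "the same time", "President Bush", "fact"],
    train_gold=["President Bush"],
    min_count=2)
predicted = [m for m in ["President Bush", "the same time"] if keep_mention(m, blacklist)]
```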
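For the Best-Link inference procedure described above, a minimal sketch follows. Here `pairwise_score` stands for any learned pairwise scorer (for instance the routing scorer sketched earlier, with the document argument bound), the threshold value is a placeholder, and a small union-find structure turns the selected links into entity clusters.

```python
def best_link_clusters(mentions, pairwise_score, threshold=0.0):
    """Greedy Best-Link: each mention links to its highest-scoring preceding mention,
    provided the score clears the threshold; the links are merged into clusters."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for v in range(len(mentions)):
        best_u, best_score = None, threshold
        for u in range(v):                     # only preceding mentions
            s = pairwise_score(mentions[u], mentions[v])
            if s > best_score:
                best_u, best_score = u, s
        if best_u is not None:                 # create the single best link, if any
            parent[find(v)] = find(best_u)

    clusters = {}
    for i in range(len(mentions)):
        clusters.setdefault(find(i), []).append(mentions[i])
    return list(clusters.values())
```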
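Finally, the latent structured update from the "Learning Protocol for Best-Link" section can be sketched as an online perceptron-style loop. The names `phi` (pairwise feature map) and `gold_cluster_of` are assumed helpers, and the unit loss added to non-gold links in the loss-augmented inference is an illustrative choice rather than the exact loss stated on the poster.

```python
from collections import defaultdict

def latent_structured_train(docs, phi, gold_cluster_of, epochs=5):
    """Online latent structured learning for Best-Link (perceptron-style sketch)."""
    w = defaultdict(float)

    def score(features):
        return sum(w[f] * x for f, x in features.items())

    for _ in range(epochs):
        for mentions in docs:                      # mentions in document order
            for v_idx, v in enumerate(mentions):
                preceding = mentions[:v_idx]
                gold_antecedents = [u for u in preceding
                                    if gold_cluster_of(u) == gold_cluster_of(v)]
                if not gold_antecedents:
                    continue
                # 1. Best-Link decision u* consistent with the gold clustering.
                u_star = max(gold_antecedents, key=lambda u: score(phi(u, v)))
                # 2. Loss-augmented Best-Link decision u' under the current model:
                #    non-gold links get +1 so that confident mistakes are penalised.
                def aug(u):
                    return score(phi(u, v)) + (0.0 if u in gold_antecedents else 1.0)
                u_prime = max(preceding, key=aug)
                # 3. Update w with the feature difference phi(u*, v) - phi(u', v).
                if u_prime is not u_star:
                    for f, x in phi(u_star, v).items():
                        w[f] += x
                    for f, x in phi(u_prime, v).items():
                        w[f] -= x
    return w
```

The update pulls the learned scorer toward the gold-consistent Best-Link decision and away from the loss-augmented prediction, which is how the training objective is tied directly to the Best-Link inference used at test time.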