Slide1
Joint Neural Model for Information Extraction with Global Features
Ying Lin (1), Heng Ji (1), Fei Huang (2), Lingfei Wu (3)
(1) University of Illinois at Urbana-Champaign
(2) Alibaba DAMO Academy
(3) IBM Research
Slide2
Motivation
Pipelined models suffer from the error propagation problem and disallow interactions among components.
Existing neural models do not explicitly model cross-subtask and cross-instance interactions among knowledge elements.
Example: Prime Minister Abdullah Gul resigned earlier Tuesday to make way for Erdogan, who won a parliamentary seat in by-elections Sunday.
[Figure: two information graphs for the example sentence, with entity nodes "Abdullah Gul" (PER) and "Erdogan" (PER), trigger nodes "resigned" (End-Position) and "won" (Elect), and argument edges labeled person. The first graph has three person edges; the second has two.]
1. An Elect event usually has only one Person argument.
2. An entity is unlikely to act as a Person argument for End-Position and Elect events at the same time.
Slide3
A Joint Neural Model for Information Extraction
We propose a single joint neural model for information extraction that extracts the information graph from a given sentence in four steps: encoding, identification, classification, and decoding.
Slide4
A Joint Neural Model for Information Extraction
Encoding: We use a BERT encoder to obtain the contextualized embedding of each token.
Slide5
A Joint Neural Model for Information Extraction
Identification: We use CRF taggers to identify entity mentions and event triggers.
We define the identification loss as: [equation not preserved in the extracted slide text]
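As a rough illustration of how a CRF tagger selects a tag sequence, the sketch below runs Viterbi decoding over per-token emission scores and tag-transition scores. The BIO tag set, all score values, and the function name `viterbi` are assumptions made for this example; they are not taken from the model.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[i][t] = best score of any tag path that ends in tag t at position i
    best = [{t: emissions[0][t] for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores, pointers = {}, {}
        for t in tags:
            # Pick the previous tag that maximizes path score + transition score
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, t), 0.0))
            scores[t] = best[-1][prev] + transitions.get((prev, t), 0.0) + emissions[i][t]
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

tags = ["O", "B-PER", "I-PER"]
# Hypothetical emission scores for the tokens "Abdullah", "Gul", "resigned"
emissions = [
    {"O": 0.1, "B-PER": 2.0, "I-PER": 0.5},
    {"O": 0.3, "B-PER": 1.0, "I-PER": 1.2},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.2},
]
# Hypothetical transitions: penalize O -> I-PER, reward B-PER -> I-PER
transitions = {("O", "I-PER"): -5.0, ("B-PER", "I-PER"): 1.0}
predicted = viterbi(emissions, transitions, tags)
print(predicted)
```

With these made-up scores, the two name tokens are tagged as one PER mention and the verb is left outside any mention.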
Slide6
A Joint Neural Model for Information Extraction
Classification: We use feed-forward networks to calculate label scores for each node or edge.
We define the classification loss as: [equation not preserved in the extracted slide text]
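A minimal sketch of turning label scores into a classification loss, assuming a softmax over the scores and a cross-entropy loss averaged over instances. That loss is a common choice and is an assumption here, not necessarily the exact loss used on the slide. The label set and score values are hypothetical.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    """Convert raw label scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores: List[float], gold_index: int) -> float:
    """Negative log-probability of the gold label."""
    return -math.log(softmax(scores)[gold_index])

def classification_loss(all_scores: List[List[float]], gold_indices: List[int]) -> float:
    """Average cross-entropy over all nodes (or edges) of one task."""
    losses = [cross_entropy(s, g) for s, g in zip(all_scores, gold_indices)]
    return sum(losses) / len(losses)

labels = ["PER", "ORG", "FAC"]               # hypothetical entity label set
node_scores = [[2.0, 0.5, 0.1],              # scores for node 1 (gold: PER)
               [0.2, 1.5, 0.3]]              # scores for node 2 (gold: ORG)
loss = classification_loss(node_scores, [0, 1])
print(round(loss, 4))
```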
Slide7
A Joint Neural Model for Information Extraction
Decoding: In the test phase, we use a beam search decoder to find the information graph with the highest global score.
Slide8
Incorporating Global Features
We design a set of global feature templates (e.g., event_type1 – role1 – role2 – event_type2: an entity acts as a role1 argument for an event_type1 event and a role2 argument for an event_type2 event in the same sentence).
The model learns the weight of each feature during training.
[Figure: two candidate graphs in which "man" (PER) is an argument of both "dies" (Life:Die) and "shot" (Conflict:Attack). In the first graph, "man" is the victim of Life:Die and the target of Conflict:Attack; in the second, it is the victim of Life:Die and the attacker of Conflict:Attack.]
Life:Die – victim – target – Conflict:Attack: positive weight
Life:Die – victim – attacker – Conflict:Attack: negative weight
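The event_type1 – role1 – role2 – event_type2 template can be instantiated by counting, for each entity, the pairs of roles it plays in two different events. The sketch below assumes a flat list of argument tuples as the graph representation; that format and the function name are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple

# Hypothetical argument format: (event_id, event_type, entity_id, role)
Argument = Tuple[str, str, str, str]

def cross_event_role_features(arguments: List[Argument]) -> Counter:
    """Count (event_type1, role1, role2, event_type2) instances per shared entity."""
    by_entity: Dict[str, List[Argument]] = {}
    for arg in arguments:
        by_entity.setdefault(arg[2], []).append(arg)
    features: Counter = Counter()
    for args in by_entity.values():
        # Ordered pairs, so each combination is counted in both directions
        for a, b in permutations(args, 2):
            if a[0] != b[0]:  # the two roles must come from different events
                features[(a[1], a[3], b[3], b[1])] += 1
    return features

# "man" is the victim of the Die event and the target of the Attack event
arguments = [
    ("dies", "Life:Die", "man", "victim"),
    ("shot", "Conflict:Attack", "man", "target"),
]
features = cross_event_role_features(arguments)
print(features[("Life:Die", "victim", "target", "Conflict:Attack")])
```

Each resulting count would then be multiplied by its learned weight, so a plausible combination raises the graph score and an implausible one lowers it.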
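Slide9
Incorporating Global Features
Given a graph $G$, we generate its global feature vector as $\boldsymbol{f}_G = \{f_1(G), \dots, f_M(G)\}$, where $f_i(\cdot)$ is a function that evaluates a certain feature and returns a scalar. For example, [example not preserved in the extracted slide text]
Next, we learn a weight vector $\boldsymbol{u}$ and calculate the global feature score of $G$ as the dot product of $\boldsymbol{f}_G$ and $\boldsymbol{u}$.
Global score of a graph = local graph score + global feature score: $s(G) = s'(G) + \boldsymbol{u} \cdot \boldsymbol{f}_G$
We assume that the gold-standard graph for a sentence should achieve the highest global score, and we minimize the following loss function: [equation not preserved in the extracted slide text]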
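To make the scoring concrete, the sketch below computes a global score as the local graph score plus the dot product of a weight vector and a feature vector. The graph format, the two feature functions, the weights, and the local scores are all hypothetical. The loss shown, the score of a predicted graph minus the score of the gold-standard graph, is one simple form consistent with the assumption that the gold graph should score highest; it is labeled here as an assumption rather than the slide's exact formula.

```python
from typing import Callable, Dict, List

# Hypothetical graph format: {"events": [(event_type, [(role, entity), ...]), ...]}
Graph = Dict[str, list]

def multiple_attack_events(graph: Graph) -> float:
    """1.0 if the graph contains more than one Conflict:Attack event, else 0.0."""
    attacks = [e for e in graph["events"] if e[0] == "Conflict:Attack"]
    return 1.0 if len(attacks) > 1 else 0.0

def events_with_two_places(graph: Graph) -> float:
    """Number of events that have more than one place argument."""
    return float(sum(1 for _, args in graph["events"]
                     if sum(1 for role, _ in args if role == "place") > 1))

FEATURES: List[Callable[[Graph], float]] = [multiple_attack_events, events_with_two_places]

def feature_vector(graph: Graph) -> List[float]:
    return [f(graph) for f in FEATURES]

def global_score(local_score: float, graph: Graph, weights: List[float]) -> float:
    """Local graph score plus the weighted sum of global features."""
    return local_score + sum(u * x for u, x in zip(weights, feature_vector(graph)))

gold = {"events": [("Conflict:Attack", [("place", "city")])]}
pred = {"events": [("Conflict:Attack", [("place", "city"), ("place", "street")])]}
weights = [-1.0, -2.0]                           # hypothetical learned weights
gold_score = global_score(3.0, gold, weights)    # hypothetical local score 3.0
pred_score = global_score(3.5, pred, weights)    # hypothetical local score 3.5
loss = pred_score - gold_score                   # assumed loss form
print(gold_score, pred_score, loss)
```

In this toy case the predicted graph has the higher local score, but the negative weight on events with two place arguments pushes its global score below that of the gold graph.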
Slide10
Global Feature Templates
Category: Role
1. The number of entities that act as <role_i> and <role_j> arguments at the same time
2. The number of <event_type_i> events with <number> <role_j> arguments
3. The number of occurrences of <entity_type_i> and <role_j> combinations
4. The number of events that have multiple <role_i> arguments
5. The number of entities that act as a <role_i> argument of an <event_type_j> event and a <role_k> argument of an <event_type_l> event at the same time
Category: Relation
6. The number of occurrences of <entity_type_i>, <entity_type_j>, and <relation_type_k> combinations
7. The number of occurrences of <entity_type_i> and <relation_type_j> combinations
8. The number of occurrences of a <relation_type_i> relation between a <role_j> argument and a <role_k> argument of the same event
9. The number of entities that have a <relation_type_i> relation with multiple entities
10. The number of entities involved in <relation_type_i> and <relation_type_j> relations simultaneously
Category: Trigger
11. Whether a graph contains more than one <event_type_i> event
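A few of these templates can be sketched as small counting functions. The example below covers templates 4, 9, and 11, assuming a simple dictionary and tuple representation of events and relations. The data format and function names are hypothetical, and treating relations as undirected in template 9 is also an assumption.

```python
from typing import Dict, List, Set, Tuple

Event = Dict[str, object]            # {"type": str, "args": [(role, entity_id), ...]}
Relation = Tuple[str, str, str]      # (relation_type, entity_a, entity_b)

def events_with_multiple_role(events: List[Event], role: str) -> int:
    """Template 4: number of events that have multiple <role> arguments."""
    return sum(1 for e in events
               if sum(1 for r, _ in e["args"] if r == role) > 1)

def entities_with_multiple_relations(relations: List[Relation], rel_type: str) -> int:
    """Template 9: number of entities with a <rel_type> relation to multiple entities."""
    partners: Dict[str, Set[str]] = {}
    for t, a, b in relations:
        if t == rel_type:
            # Assumption: count both ends of the relation
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
    return sum(1 for p in partners.values() if len(p) > 1)

def has_multiple_events(events: List[Event], event_type: str) -> int:
    """Template 11: 1 if the graph contains more than one <event_type> event."""
    return int(sum(1 for e in events if e["type"] == event_type) > 1)

events = [
    {"type": "Movement:Transport",
     "args": [("destination", "e1"), ("destination", "e2"), ("artifact", "e3")]},
    {"type": "Conflict:Attack", "args": [("place", "e1")]},
]
relations = [("ORG-AFF", "e3", "o1"), ("ORG-AFF", "e3", "o2")]

print(events_with_multiple_role(events, "destination"))
print(entities_with_multiple_relations(relations, "ORG-AFF"))
print(has_multiple_events(events, "Conflict:Attack"))
```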
Slide11
Decoding
We use beam search to decode the information graph.
Example: He also brought a check from Campbell to pay the fines and fees.
[Figure: node step. Add node 1 (E1, "Campbell") with candidate labels FAC and ORG, giving candidate 1 (E1^1) and candidate 2 (E1^2) of node E1.]
Slide12
Decoding
[Figure: second node step. Add node 2 (E2, "fine") with candidate labels Fine and Sue. Combining the candidates of E1 and E2 gives four partial graphs: (E1^1, E2^1), (E1^1, E2^2), (E1^2, E2^1), and (E1^2, E2^2).]
Slide13
Decoding
[Figure: edge step. Add the edge R1 between node 1 and node 2, with candidate labels null, entity, person, … Each of the four node combinations is paired with the edge candidates R1^1 and R1^2, giving eight candidate graphs.]
Slide14
Decoding
[Figure: sort and prune. After the node steps and the edge step, the candidate graphs are sorted and then pruned. Two candidates are shown: "fines" (Fine) with an entity edge to "Campbell" (FAC), labeled "Higher local score", and "fines" (Fine) with an entity edge to "Campbell" (PER), labeled "Higher global score".]
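The node step, edge step, sort, and prune sequence can be sketched as a small beam search. This is a simplified illustration, not the released decoder: each step adds one node or edge, the label names loosely follow the example sentence, and the local scores, the single global feature, and the beam size are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]     # (label, local score)
Graph = Dict[str, str]            # node or edge name -> chosen label

def beam_search(steps: List[Tuple[str, List[Candidate]]],
                global_feature_score: Callable[[Graph], float],
                beam_size: int) -> List[Tuple[Graph, float]]:
    """Expand one node or edge per step, then sort by global score and prune."""
    beam: List[Tuple[Graph, float]] = [({}, 0.0)]   # (partial graph, local score)
    for name, candidates in steps:
        expanded = [({**graph, name: label}, local + score)
                    for graph, local in beam
                    for label, score in candidates]
        # Global score = accumulated local score + global feature score
        expanded.sort(key=lambda item: item[1] + global_feature_score(item[0]),
                      reverse=True)
        beam = expanded[:beam_size]                  # prune
    return [(g, local + global_feature_score(g)) for g, local in beam]

def feature_score(graph: Graph) -> float:
    """Hypothetical global feature: penalize a FAC entity as the 'entity' of a Fine event."""
    if (graph.get("E1") == "FAC" and graph.get("E2") == "Fine"
            and graph.get("R1") == "entity"):
        return -1.0
    return 0.0

steps = [
    ("E1", [("FAC", 1.2), ("PER", 1.0)]),      # node step: "Campbell"
    ("E2", [("Fine", 2.0), ("Sue", 0.5)]),     # node step: "fines"
    ("R1", [("entity", 1.5), ("null", 0.3)]),  # edge step: edge between E2 and E1
]
results = beam_search(steps, feature_score, beam_size=2)
best_graph, best_score = results[0]
print(best_graph, best_score)
```

With these made-up numbers, the FAC reading of "Campbell" has the higher local score, but the PER reading ends up ranked first once the global feature score is added.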
Slide15
Experiment: Overall Performance
Dataset | Task | DyGIE++ | DyGIE++* | Baseline | Our | Our*
ACE05-R | Entity | 88.6 | - | - | 88.8 | -
ACE05-R | Relation | 63.4 | - | - | 67.5 | -
ACE05-E | Entity | 89.7 | 90.7 | 90.2 | 90.2 | 90.3
ACE05-E | Trigger Identification | - | 76.5 | 76.6 | 78.2 | 78.6
ACE05-E | Trigger Classification | 69.7 | 73.6 | 73.5 | 74.7 | 75.2
ACE05-E | Argument Identification | 53.0 | 55.4 | 56.4 | 59.2 | 60.7
ACE05-E | Argument Classification | 48.8 | 52.5 | 53.9 | 56.8 | 58.6
Our model outperforms the state-of-the-art model on most subtasks.
DyGIE++* and Our* use a four-model ensemble optimized for trigger detection.
Slide16
Experiment: Overall Performance
We establish new benchmark results as follows.
Dataset | Entity | Trigger Identification | Trigger Classification | Argument Identification | Argument Classification | Relation
ACE05-E+ | 89.6 | 75.6 | 72.8 | 57.3 | 54.8 | 58.6
ERE | 87.0 | 68.4 | 57.0 | 50.1 | 46.5 | 53.2
We add back:
Entity: pronouns
Relation: the order of arguments
Event: multi-token event triggers
Slide17
Experiment: Porting to Other Languages
Dataset | Training | Entity | Trigger Classification | Argument Classification | Relation
ACE05-CN | ACE05-CN | 88.5 | 65.6 | 52.0 | 62.4
ACE05-CN | ACE05-E+ + ACE05-CN | 89.8 | 67.7 | 53.2 | 62.9
ERE-ES | ERE-ES | 81.3 | 56.8 | 40.3 | 48.1
ERE-ES | ERE-EN + ERE-ES | 81.8 | 59.1 | 46.5 | 52.9
We derive a Chinese dataset from ACE05 and a Spanish dataset from ERE.
OneIE works well on Chinese and Spanish data without any special design for the new languages.
Adding English training data can improve the performance on Chinese and Spanish.
Slide18
Experiment: Salient Global Features
Our global features are explainable.
# | Feature | Weight
1 | A Transport event has only one Destination argument | 2.61
2 | An Attack event has only one Place argument | 2.31
3 | A PER-SOC relation exists between two PER entities | 1.51
4 | A Beneficiary argument is a PER entity | 0.93
5 | An entity has ORG-AFF relations with multiple entities | -3.21
6 | An event has two Place arguments | -2.47
7 | A Transport event has multiple Destination arguments | -2.25
8 | An entity has a GEN-AFF relation with multiple entities | -2.02
Slide19
Our code and models are available at: http://blender.cs.illinois.edu/software/oneie/
Thank You!