Joint Neural Model for Information Extraction with Global Features


Presentation Transcript

Slide1

Joint Neural Model for Information Extraction with Global Features

Ying Lin¹, Heng Ji¹, Fei Huang², Lingfei Wu³
¹University of Illinois at Urbana-Champaign, ²Alibaba DAMO Academy, ³IBM Research

Slide2

Motivation

Pipelined models suffer from the error propagation problem and disallow interactions among components. Existing neural models do not explicitly model cross-subtask and cross-instance interactions among knowledge elements.

Example: Prime Minister Abdullah Gul resigned earlier Tuesday to make way for Erdogan, who won a parliamentary seat in by-elections Sunday.

[Figure: an information graph with PER entities "Abdullah Gul" and "Erdogan", an End-Position event triggered by "resigned", and an Elect event triggered by "won", linked by person argument edges.]

1. An Elect event usually has only one Person argument.
2. An entity is unlikely to act as a Person argument for End-Position and Elect events at the same time.

Slide3

A Joint Neural Model for Information Extraction

We propose a single joint neural model for information extraction that extracts the information graph from a given sentence in four steps: encoding, identification, classification, and decoding.

Slide4

A Joint Neural Model for Information Extraction

Encoding: We use a BERT encoder to obtain the contextualized embedding of each token.
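Not shown on the slide: a BERT tokenizer may split one token into several word pieces, so the piece vectors must be pooled back into one vector per token. A minimal numpy sketch of that pooling step, assuming mean pooling (the strategy, array shapes, and example counts here are our own illustration, not taken from the slides):

```python
import numpy as np

def pool_token_embeddings(piece_vectors, piece_counts):
    """Average contiguous word-piece vectors into one embedding per token.

    piece_vectors: (num_pieces, hidden_dim) array from the encoder.
    piece_counts:  number of word pieces for each original token.
    """
    token_vectors, start = [], 0
    for count in piece_counts:
        token_vectors.append(piece_vectors[start:start + count].mean(axis=0))
        start += count
    return np.stack(token_vectors)

# Suppose "Erdogan" is split into 3 pieces and three other tokens into 1 each.
pieces = np.arange(12, dtype=float).reshape(6, 2)  # 6 pieces, hidden dim 2
tokens = pool_token_embeddings(pieces, [1, 3, 1, 1])
print(tokens.shape)  # (4, 2): one vector per original token
```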

Slide5

A Joint Neural Model for Information Extraction

Identification: We use CRF taggers to identify entity mentions and event triggers, and define an identification loss over the tag paths they predict.
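The loss formula on this slide did not survive the transcript. Assuming the standard CRF training objective (our reconstruction, not the original slide image), the identification loss is the negative log-likelihood of the gold tag paths:

```latex
\mathcal{L}^{I} = -\sum_{X} \log p(\hat{\mathbf{z}} \mid X),
\qquad
p(\mathbf{z} \mid X) = \frac{\exp s(X, \mathbf{z})}{\sum_{\mathbf{z}'} \exp s(X, \mathbf{z}')}
```

where $s(X, \mathbf{z})$ sums the tagger's emission and transition scores along tag path $\mathbf{z}$, $\hat{\mathbf{z}}$ is the gold path, and the inner sum ranges over all possible paths.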

Slide6

A Joint Neural Model for Information Extraction

Classification: We use feed-forward networks to calculate label scores for each node or edge, and define a classification loss over these label scores.
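The classification loss formula is likewise missing from the transcript. Assuming standard softmax cross-entropy over the label scores (our reconstruction), a minimal numpy sketch of scoring one node with a feed-forward network; the layer sizes and the 5-label set are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn_label_scores(x, W1, b1, W2, b2):
    """Two-layer feed-forward network producing one score per label."""
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

def cross_entropy(scores, gold):
    """Negative log-probability of the gold label under a softmax."""
    scores = scores - scores.max()  # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return -log_probs[gold]

x = rng.normal(size=8)                           # a node embedding
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 5)), np.zeros(5)   # 5 hypothetical labels
scores = ffn_label_scores(x, W1, b1, W2, b2)
loss = cross_entropy(scores, gold=2)
print(loss >= 0.0)  # True: cross-entropy is non-negative
```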

Slide7

A Joint Neural Model for Information Extraction

Decoding: In the test phase, we use a beam search decoder to find the information graph with the highest global score.

Slide8

Incorporating Global Features

We design a set of global feature templates (e.g., event_type1 – role1 – role2 – event_type2: an entity acts as a role1 argument for an event_type1 event and a role2 argument for an event_type2 event in the same sentence). The model learns the weight of each feature during training.

[Figure: the PER entity "man" is a victim argument of the Life:Die event triggered by "dies" and an argument of the Conflict:Attack event triggered by "shot".]

Life:Die – victim – target – Conflict:Attack: positive weight
Life:Die – victim – attacker – Conflict:Attack: negative weight

Slide9

Incorporating Global Features

Given a graph G, we generate its global feature vector f(G) = (f_1(G), ..., f_M(G)), where each f_i is a function that evaluates a certain feature and returns a scalar.

Next, we learn a weight vector u and calculate the global feature score of G as the dot product of u and f(G).

Global score of a graph = local graph score + global feature score.

We assume that the gold-standard graph for a sentence should achieve the highest global score, and minimize as the loss the difference between the global score of the predicted graph and that of the gold-standard graph.
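Putting the scoring pieces together, a small numpy sketch of the global score and the graph loss; the feature values, weights, and local score below are invented for illustration:

```python
import numpy as np

def global_score(local_score, feature_vector, weights):
    """Global score = local graph score + dot product of weights and features."""
    return local_score + float(np.dot(weights, feature_vector))

def graph_loss(pred_score, gold_score):
    """Training signal: predicted graph's global score minus the gold graph's."""
    return pred_score - gold_score

# Hypothetical graph: feature 0 fires once, feature 2 fires twice.
f_G = np.array([1.0, 0.0, 2.0])
u = np.array([0.5, -1.0, -0.25])   # learned feature weights
s = global_score(local_score=3.0, feature_vector=f_G, weights=u)
print(s)  # 3.0 + (0.5 - 0.5) = 3.0
```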

Slide10

Global Feature Templates

Category: Role
1. The number of entities that act as <role_i> and <role_j> arguments at the same time
2. The number of <event_type_i> events with <number> <role_j> arguments
3. The number of occurrences of <entity_type_i> and <role_j> combinations
4. The number of events that have multiple <role_i> arguments
5. The number of entities that act as a <role_i> argument of an <event_type_j> event and a <role_k> argument of an <event_type_l> event at the same time

Category: Relation
6. The number of occurrences of <entity_type_i>, <entity_type_j>, and <relation_type_k> combinations
7. The number of occurrences of <entity_type_i> and <relation_type_j> combinations
8. The number of occurrences of a <relation_type_i> relation between a <role_j> argument and a <role_k> argument of the same event
9. The number of entities that have a <relation_type_i> relation with multiple entities
10. The number of entities involved in <relation_type_i> and <relation_type_j> relations simultaneously

Category: Trigger
11. Whether a graph contains more than one <event_type_i> event
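To make template 1 concrete, a sketch of one feature function: counting entities that fill two given roles at once. The (event, role, entity) triple representation of the graph is our own simplification, not the model's internal format:

```python
from collections import defaultdict

def count_dual_role_entities(argument_edges, role_i, role_j):
    """Template 1: number of entities acting as <role_i> and <role_j>
    arguments at the same time. Edges are (event, role, entity) triples."""
    roles_by_entity = defaultdict(set)
    for _event, role, entity in argument_edges:
        roles_by_entity[entity].add(role)
    return sum(1 for roles in roles_by_entity.values()
               if role_i in roles and role_j in roles)

# Toy graph from the earlier example: "man" is both victim and target.
edges = [("dies", "victim", "man"), ("shot", "target", "man"),
         ("shot", "attacker", "gunman")]
print(count_dual_role_entities(edges, "victim", "target"))  # 1
```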

Slide11

Decoding

We use beam search to decode the information graph. Example: He also brought a check from Campbell to pay the fines and fees.

Node step: add node 1 ("Campbell", candidate types FAC and ORG), giving two beam hypotheses: candidate 1 and candidate 2 of node E1.

Slide12

Decoding

Node step: add node 2 (trigger "fine", candidate event types Fine and Sue). Each hypothesis in the beam is expanded with both candidates of node E2.

Slide13

Decoding

Edge step: add the edge between node 1 and node 2, with candidate labels null, entity, person, …. Each hypothesis in the beam is expanded with every candidate label of edge R1.

Slide14

Decoding

Sort: beam hypotheses are sorted by global score. In the example, the graph labeling "Campbell" as FAC has a higher local score, while the graph labeling it PER, as an entity argument of the Fine event triggered by "fines", has a higher global score.

Prune: only the top-scoring hypotheses, up to the beam width, are kept.
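The four decoding slides can be compressed into a short sketch: at each node step the beam is expanded with every candidate label, at each edge step with every candidate edge label, then sorted by score and pruned to the beam width. The label scores and the single global feature below are invented for illustration; they only mimic the slide's FAC-vs-PER outcome:

```python
def beam_search(steps, score, beam_width=2):
    """steps: a list of candidate-label lists (node steps, then edge steps).
    score: maps a partial label sequence to a number (local + global)."""
    beam = [()]
    for candidates in steps:
        expanded = [partial + (label,) for partial in beam for label in candidates]
        expanded.sort(key=score, reverse=True)   # sort by score
        beam = expanded[:beam_width]             # prune to the beam width
    return beam[0]

LOCAL = {"FAC": 1.2, "PER": 1.0, "Fine": 0.5, "Sue": 0.1,
         "entity": 0.3, "null": 0.0}            # made-up local label scores

def score(graph):
    s = sum(LOCAL.get(label, 0.0) for label in graph)  # local graph score
    if "PER" in graph and "entity" in graph:           # one global feature:
        s += 1.0                                       # Fine events take PER entities
    return s

# Node steps: Campbell in {FAC, PER}, trigger "fines" in {Fine, Sue};
# edge step: Campbell's role in the event, in {null, entity}.
best = beam_search([["FAC", "PER"], ["Fine", "Sue"], ["null", "entity"]], score)
print(best)  # ('PER', 'Fine', 'entity'): PER wins on global score
```

Note how FAC beats PER on local score alone, but the global feature flips the final ranking, which is exactly the behavior the slide illustrates.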

Slide15

Experiment: Overall Performance

Dataset   Task                       DyGIE++   DyGIE++*   Baseline   Ours   Ours*
ACE05-R   Entity                     88.6      -          -          88.8   -
ACE05-R   Relation                   63.4      -          -          67.5   -
ACE05-E   Entity                     89.7      90.7       90.2       90.2   90.3
ACE05-E   Trigger Identification     -         76.5       76.6       78.2   78.6
ACE05-E   Trigger Classification     69.7      73.6       73.5       74.7   75.2
ACE05-E   Argument Identification    53.0      55.4       56.4       59.2   60.7
ACE05-E   Argument Classification    48.8      52.5       53.9       56.8   58.6

Our model outperforms the state-of-the-art model on most subtasks. DyGIE++* and Ours* use a four-model ensemble optimized for trigger detection.

Slide16

Experiment: Overall Performance

Dataset    Entity   Trigger Id.   Trigger Cls.   Argument Id.   Argument Cls.   Relation
ACE05-E+   89.6     75.6          72.8           57.3           54.8            58.6
ERE        87.0     68.4          57.0           50.1           46.5            53.2

We establish new benchmark results on these datasets, where we add back:
Entity: pronouns
Relation: the order of arguments
Event: multi-token event triggers

Slide17

Experiment: Porting to Other Languages

Dataset    Training Data         Entity   Trigger Cls.   Argument Cls.   Relation
ACE05-CN   ACE05-CN              88.5     65.6           52.0            62.4
ACE05-CN   ACE05-E+ + ACE05-CN   89.8     67.7           53.2            62.9
ERE-ES     ERE-ES                81.3     56.8           40.3            48.1
ERE-ES     ERE-EN + ERE-ES       81.8     59.1           46.5            52.9

We derive a Chinese dataset from ACE05 and a Spanish dataset from ERE. OneIE works well on Chinese and Spanish data without any special design for the new languages. Adding English training data can improve the performance on Chinese and Spanish.

Slide18

Experiment: Salient Global Features

Our global features are explainable:

Feature                                                       Weight
1. A Transport event has only one Destination argument         2.61
2. An Attack event has only one Place argument                 2.31
3. A PER-SOC relation exists between two PER entities          1.51
4. A Beneficiary argument is a PER entity                      0.93
5. An entity has ORG-AFF relations with multiple entities     -3.21
6. An event has two Place arguments                           -2.47
7. A Transport event has multiple Destination arguments       -2.25
8. An entity has a GEN-AFF relation with multiple entities    -2.02

Slide19

Our code and models are available at: http://blender.cs.illinois.edu/software/oneie/

Thank You!