PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching System
Yan Li
Beijing University of Posts and Telecommunications
buptliyan@gmail.com
Outline
Introduction
Preprocessing
Entity Expansion
Pattern Bootstrapping
Post-processing
Evaluation Results
Conclusion
Introduction: the framework
Preprocessing
NLP with the Stanford CoreNLP toolkit:
POS tagger
NER
Date and time expression recognition
Dependency parser
Coreference resolution
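The preprocessing steps listed above correspond to standard Stanford CoreNLP annotators. The sketch below shows one plausible annotator line for a CoreNLP `.properties` file; the exact list and order are an assumption, not the configuration PRIS reports, and `ANNOTATORS` / `to_properties` are illustrative names.

```python
# Illustrative annotator list (an assumption, not the authors' configuration).
# Order matters: each CoreNLP annotator depends on the ones before it.
ANNOTATORS = [
    "tokenize",  # tokenization
    "ssplit",    # sentence splitting
    "pos",       # POS tagger
    "lemma",     # lemmatization (required by ner)
    "ner",       # NER; CoreNLP's ner annotator also runs SUTime for date/time expressions
    "parse",     # constituency parser; CoreNLP can convert its trees to dependencies
    "dcoref",    # deterministic coreference resolution
]


def to_properties(annotators):
    """Render the annotator list as one line of a CoreNLP .properties file."""
    return "annotators = " + ", ".join(annotators)


if __name__ == "__main__":
    print(to_properties(ANNOTATORS))
```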
Preprocessing (cont.)
Example:
Takeshi Watanabe, the first president of the ADB, died in his native Japan.
The categorization of slots
PER slots (Domain: Slots)
PER: alternate_names; spouses; children; parents; siblings; other_family
ORG: member_of; employee_of
LOC: country/state/city_of_birth/death/residence
DATE: date_of_birth/death
NUM: age
ORI: origin
REL: religion
SCHOOL: schools_attended
CAUSE: cause_of_death
TITLE: titles
CHARGE: charges
ORG slots (Domain: Slots)
PER: alternate_names; members; shareholders; founded_by; top_members/employees
ORG: parents; members; member_of; shareholders; subsidiaries
LOC: member_of; country/state/city_of_headquarters
DATE: founded; dissolved
NUM: number_of_employees/members
URL: website
REL: political/religious_affiliation
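One way to use the slot table above is as a type filter on candidate fillers: a candidate is kept only if its domain is listed for the slot. The sketch below transcribes a subset of the table and assumes candidates arrive already labeled with one of these domain tags; the tagging step, `filter_candidates`, and the example candidates are hypothetical.

```python
from collections import defaultdict

# Subset of the slot table above: filler domain -> slots, per query type.
PER_QUERY_SLOTS = {
    "PER": ["alternate_names", "spouses", "children", "parents", "siblings", "other_family"],
    "ORG": ["member_of", "employee_of"],
    "DATE": ["date_of_birth", "date_of_death"],
    "NUM": ["age"],
    "TITLE": ["titles"],
}
ORG_QUERY_SLOTS = {
    "PER": ["members", "shareholders", "founded_by", "top_members/employees"],
    "ORG": ["parents", "members", "member_of", "shareholders", "subsidiaries"],
    "LOC": ["member_of", "country/state/city_of_headquarters"],
    "DATE": ["founded", "dissolved"],
    "URL": ["website"],
}


def slot_domains(table):
    """Invert domain -> slots into slot -> set of allowed filler domains."""
    allowed = defaultdict(set)
    for domain, slots in table.items():
        for slot in slots:
            allowed[slot].add(domain)
    return dict(allowed)


def filter_candidates(table, slot, candidates):
    """Keep (text, domain) candidates whose domain is allowed for the slot."""
    allowed = slot_domains(table).get(slot, set())
    return [text for text, domain in candidates if domain in allowed]


# Hypothetical pre-tagged candidates.
CANDIDATES = [("NATO", "ORG"), ("Norway", "LOC"), ("John Doe", "PER")]
print(filter_candidates(ORG_QUERY_SLOTS, "member_of", CANDIDATES))
```

Because a slot such as `member_of` appears under both ORG and LOC for ORG queries, inverting the table into sets keeps every allowed domain rather than the last one seen.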
Entity Expansion
Coreferences and alternate names of the query entity appear in the relevant documents; collecting them improves recall.
Scheme 1 (PER & ORG): coreference resolution
Coreference chains produced by Stanford CoreNLP.
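Scheme 1 can be sketched as follows, assuming the coreference chains have already been exported from CoreNLP as lists of mention strings. The chains below are hand-written for the example sentence from the preprocessing slide, not real CoreNLP output, and `expand_query` is a hypothetical helper; matching the query by exact string equality is a simplification.

```python
def expand_query(query, chains):
    """Return every other mention that shares a coreference chain with the query."""
    expansions = set()
    for chain in chains:
        if query in chain:
            expansions.update(mention for mention in chain if mention != query)
    return expansions


# Hand-written chains for "Takeshi Watanabe, the first president of the ADB,
# died in his native Japan." (illustration only).
CHAINS = [
    ["Takeshi Watanabe", "the first president of the ADB", "his"],
    ["the ADB"],
]

print(sorted(expand_query("Takeshi Watanabe", CHAINS)))
```

The expanded mentions (here including "his") can then stand in for the query when matching patterns.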
Entity Expansion (cont.)
Scheme 2 (PER & ORG): identifying alternate names
Rule-based information extraction
Interpretive names in parentheses
Example: Starr International Co., known as SICO, ……
Scheme 3 (ORG)
Removing corporate suffixes from queries
Finding acronyms or full expressions
Example: Norwegian University of Science and Technology (NTNU)
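Schemes 2 and 3 can be approximated with a few regular expressions. This is a minimal sketch modeled only on the two slide examples; the rules, the suffix list `CORPORATE_SUFFIXES`, and the function names are hypothetical and far cruder than a production rule set. Note that a simple initials check would not work for the slide's own example, since "NTNU" is not formed from the English initials, so the parenthesis rule accepts any parenthesized string.

```python
import re

# Hypothetical suffix list (lower-cased, trailing dots removed).
CORPORATE_SUFFIXES = {"co", "corp", "inc", "ltd", "llc", "plc", "company", "corporation"}


def alternate_names(query, text):
    """Scheme 2 sketch: aliases after 'known as' or in parentheses right after the query."""
    q = re.escape(query)
    rules = [
        q + r",? (?:also )?known as ([A-Z][\w&.-]*(?: [A-Z][\w&.-]*)*)",
        q + r"\s*\(([^()]+)\)",
    ]
    return [m.group(1) for rule in rules for m in re.finditer(rule, text)]


def strip_corporate_suffix(name):
    """Scheme 3 sketch: drop trailing suffixes such as 'Co.' or 'Inc.'."""
    tokens = name.split()
    while len(tokens) > 1 and tokens[-1].rstrip(".,").lower() in CORPORATE_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def full_expression(acronym, text):
    """Scheme 3 sketch: a capitalized phrase (allowing a few function words) before '(ACRONYM)'."""
    pattern = (r"([A-Z][\w-]*(?: (?:of|and|for|the|[A-Z][\w-]*))*) \("
               + re.escape(acronym) + r"\)")
    m = re.search(pattern, text)
    return m.group(1) if m else None


print(alternate_names("Starr International Co.", "Starr International Co., known as SICO, ……"))
print(strip_corporate_suffix("Starr International Co."))
print(full_expression("NTNU", "Norwegian University of Science and Technology (NTNU)"))
```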
Pattern Bootstrapping: Workflow
Ralph Grishman and Bonan Min, “New York University KBP 2010 Slot‐Filling System”, 2010.
Pattern Bootstrapping: Seed Pairs
Source: the KBP English Monolingual Slot Filling evaluation data from the past three years
92 PER entities
106 ORG entities
1,627 entity-value pairs
Pattern Bootstrapping: Pattern Generation
Word sequence pattern: the middle context between an entity-value pair
Example: PER:countries_of_residence <PER> native <LOC>
Dependency path pattern: the shortest dependency path connecting an entity-value pair
Examples:
PER:title <PER> appos <TITLE>
PER:member_of <PER> appos president prep_of <ORG>
PER:country_of_death <PER> nsubj-1 died prep_in <LOC>
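Both pattern types can be reproduced on the example sentence from the preprocessing slide. In this sketch the collapsed dependencies are hand-written to be consistent with the three dependency patterns above (they are not actual parser output), a "-1" suffix marks an edge walked from dependent to governor, and "his" is treated as the PER mention for the word-sequence pattern, presumably via coreference; that last point is an assumption. All function names are illustrative.

```python
from collections import deque

# "Takeshi Watanabe, the first president of the ADB, died in his native Japan."
TOKENS = ["Takeshi", "Watanabe", ",", "the", "first", "president", "of",
          "the", "ADB", ",", "died", "in", "his", "native", "Japan", "."]
# Hand-written collapsed dependencies: (governor index, relation, dependent index).
EDGES = [
    (1, "nn", 0), (1, "appos", 5), (5, "det", 3), (5, "amod", 4),
    (5, "prep_of", 8), (8, "det", 7), (10, "nsubj", 1),
    (10, "prep_in", 14), (14, "poss", 12), (14, "amod", 13),
]


def build_graph(edges):
    """Undirected adjacency list; walking dependent -> governor gets a '-1' suffix."""
    graph = {}
    for gov, rel, dep in edges:
        graph.setdefault(gov, []).append((dep, rel))
        graph.setdefault(dep, []).append((gov, rel + "-1"))
    return graph


def shortest_path(graph, src, dst):
    """Breadth-first search; returns a list of (from, label, to) steps, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt, label in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, label)
                queue.append(nxt)
    if dst not in prev:
        return None
    steps, node = [], dst
    while prev[node] is not None:
        parent, label = prev[node]
        steps.append((parent, label, node))
        node = parent
    return steps[::-1]


def dependency_pattern(ent, ent_type, val, val_type, tokens=TOKENS, edges=EDGES):
    """Render the shortest path as '<ENT> rel word rel ... <VAL>'."""
    path = shortest_path(build_graph(edges), ent, val)
    if path is None:
        return None
    parts = [f"<{ent_type}>"]
    for _, label, node in path:
        parts.append(label)
        parts.append(f"<{val_type}>" if node == val else tokens[node])
    return " ".join(parts)


def word_sequence_pattern(ent, ent_type, val, val_type, tokens=TOKENS):
    """Middle context: the tokens strictly between the two mentions."""
    lo, hi = sorted((ent, val))
    left, right = (ent_type, val_type) if ent < val else (val_type, ent_type)
    return " ".join([f"<{left}>", *tokens[lo + 1:hi], f"<{right}>"])


print(dependency_pattern(1, "PER", 14, "LOC"))
print(word_sequence_pattern(12, "PER", 14, "LOC"))
```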
Pattern Bootstrapping: Pattern Evaluation
Purpose: improving precision
Pattern frequency
Trigger phrase
High-confidence patterns
New entity-value pairs
Iteration
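The evaluation loop above can be read as: score patterns by frequency, keep those that also contain a trigger phrase, apply them to harvest new entity-value pairs, and iterate. The sketch below is one interpretation under stated assumptions, not the PRIS or NYU implementation: "frequency" is taken as the number of distinct known pairs a pattern connects, the observations are assumed to come from the pattern-generation step, and the trigger list and all names are made up.

```python
from collections import defaultdict

# Toy (entity, value, pattern) observations; every name here is hypothetical.
OBSERVATIONS = [
    ("John Doe", "Ruth", "<PER> 's wife <PER>"),
    ("Bill Roe", "Ann", "<PER> 's wife <PER>"),
    ("Bill Roe", "Ann", "<PER> married <PER>"),
    ("Tom Poe", "Sue", "<PER> married <PER>"),
    ("Tom Poe", "Sue", "<PER> met <PER>"),
    ("Jim Loe", "Kim", "<PER> met <PER>"),
]
SEEDS = {("John Doe", "Ruth"), ("Bill Roe", "Ann")}
TRIGGERS = {"wife", "husband", "married"}  # hypothetical trigger list for per:spouses


def score_patterns(observations, pairs):
    """Pattern frequency: number of distinct known pairs each pattern connects."""
    support = defaultdict(set)
    for entity, value, pattern in observations:
        if (entity, value) in pairs:
            support[pattern].add((entity, value))
    return {pattern: len(found) for pattern, found in support.items()}


def has_trigger(pattern, triggers):
    return any(token in triggers for token in pattern.split())


def bootstrap(observations, seeds, triggers, min_freq=2, max_iter=10):
    pairs, accepted = set(seeds), set()
    for _ in range(max_iter):
        scores = score_patterns(observations, pairs)
        # High-confidence patterns: frequent enough and containing a trigger phrase.
        accepted = {p for p, freq in scores.items()
                    if freq >= min_freq and has_trigger(p, triggers)}
        # Apply them to harvest new entity-value pairs, then iterate.
        new_pairs = {(e, v) for e, v, p in observations if p in accepted} - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, accepted


pairs, patterns = bootstrap(OBSERVATIONS, SEEDS, TRIGGERS, min_freq=1)
print(sorted(pairs))
```

With `min_freq=1`, the first round accepts both trigger patterns and harvests ("Tom Poe", "Sue"); the second round rejects `<PER> met <PER>` because it contains no trigger word, so ("Jim Loe", "Kim") is never extracted.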
Post-processing
Purpose: improving precision
DATE
The SUTime module of CoreNLP
TIMEX2 normalization
PER: spouses, children and parents
Last name complement
Example: John Doe’s first wife, Ruth
“Ruth Doe” is better than “Ruth”.
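Both post-processing steps can be illustrated in a few lines. The date function is a plain `datetime.strptime` stand-in for SUTime, covering only a handful of hypothetical input formats, and emits TIMEX2-style values at the granularity the input gives. The last-name heuristic assumes a single-token filler shares the query's family name, which will not always hold. Function names are illustrative.

```python
from datetime import datetime

# Stand-in for SUTime: (input format, TIMEX2-style output format) pairs.
DATE_FORMATS = [
    ("%B %d, %Y", "%Y-%m-%d"),  # March 5, 1941
    ("%d %B %Y", "%Y-%m-%d"),   # 5 March 1941
    ("%B %Y", "%Y-%m"),         # March 1941
    ("%Y", "%Y"),               # 1941
]


def normalize_date(text):
    """Return a normalized value for the first matching format, else None."""
    for in_fmt, out_fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), in_fmt).strftime(out_fmt)
        except ValueError:
            continue
    return None


def complement_last_name(query_name, filler):
    """Append the query's last name to a single-token family-member filler."""
    query_tokens = query_name.split()
    if len(filler.split()) == 1 and len(query_tokens) > 1:
        return f"{filler} {query_tokens[-1]}"
    return filler


print(normalize_date("March 5, 1941"))
print(complement_last_name("John Doe", "Ruth"))
```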
Post-processing (cont.)
Identifying countries, states/provinces and cities for LOC slots
A Wikipedia list containing all countries and states or provinces
Adding modifiers to fillers of per:title
Adjectival modifier: financial minister
Noun compound modifier: police chief
Prepositional modifier: chief of military operations
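A sketch of both steps, under assumptions: the two sets are tiny hypothetical excerpts standing in for the Wikipedia list, any LOC filler not found in them is treated as a city (an assumption the slide does not state), and title modifiers arrive as (relation, text) pairs using the Stanford dependency labels `amod`, `nn` and collapsed `prep_of`.

```python
# Hypothetical excerpts standing in for the Wikipedia list.
COUNTRIES = {"Japan", "Norway", "United States"}
STATES_OR_PROVINCES = {"California", "Ontario", "Hokkaido"}


def location_level(name):
    """Classify a LOC filler; anything not in the lists defaults to 'city' (assumption)."""
    if name in COUNTRIES:
        return "country"
    if name in STATES_OR_PROVINCES:
        return "state/province"
    return "city"


def expand_title(head, modifiers):
    """Attach modifiers to a title head: amod/nn before the head, prep_of after it."""
    before = [text for rel, text in modifiers if rel in ("amod", "nn")]
    after = ["of " + text for rel, text in modifiers if rel == "prep_of"]
    return " ".join(before + [head] + after)


print(location_level("Japan"))
print(expand_title("chief", [("prep_of", "military operations")]))
```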
Evaluation Results
PRIS
Summary Statistics | LDC | Top-1 | Top-2 | Median
Precision | 0.9278607 | 0.6757322 | 0.48955223 | 0.11392405
Recall | 0.7252106 | 0.41866493 | 0.21257292 | 0.0874919
F1 | 0.8141142 | 0.5170068 | 0.2964302 | 0.0989736
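The F1 row in the summary table above is the harmonic mean of the precision and recall rows, F1 = 2PR / (P + R). The short check below recomputes it from the reported values; `SUMMARY` and `f1` are illustrative names.

```python
# Precision and recall copied from the summary table above.
SUMMARY = {
    "LDC": (0.9278607, 0.7252106),
    "Top-1": (0.6757322, 0.41866493),
    "Top-2": (0.48955223, 0.21257292),
    "Median": (0.11392405, 0.0874919),
}


def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r) in SUMMARY.items():
    print(f"{name}: F1 = {f1(p, r):.4f}")
```

Each recomputed value agrees with the reported F1 row to the precision shown in the table.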
Slot (PER) | Non-NIL correct | Redundant | Inexact | Wrong | Missing
Alternate names | 6 | 0 | 0 | 0 | 23
Date of birth | 16 | 4 | 0 | 1 | 1
Date of death | 17 | 1 | 0 | 4 | 2
Age | 22 | 0 | 0 | 2 | 2
Country of birth | 1 | 0 | 0 | 0 | 1
State or province of birth | 8 | 0 | 2 | 3 | 2
City of birth | 13 | 1 | 0 | 5 | 2
Country of death | 1 | 0 | 0 | 2 | 0
State or province of death | 13 | 0 | 2 | 1 | 2
City of death | 17 | 0 | 0 | 4 | 1
Country of residence | 10 | 2 | 2 | 7 | 3
State or province of residence | 22 | 1 | 4 | 5 | 13
City of residence | 35 | 1 | 0 | 14 | 8
Origin | 16 | 2 | 0 | 17 | 0
Cause of death | 18 | 0 | 0 | 1 | 13
Schools attended | 19 | 7 | 0 | 1 | 14
Titles | 85 | 13 | 8 | 24 | 4
Member of | 26 | 2 | 4 | 17 | 10
Employee of | 7 | 0 | 2 | 5 | 20
Religion | 4 | 0 | 0 | 1 | 3
Spouses | 16 | 5 | 1 | 3 | 10
Children | 73 | 0 | 3 | 10 | 6
Parents | 21 | 4 | 0 | 1 | 4
Siblings | 20 | 0 | 1 | 8 | 3
Other family | 2 | 0 | 0 | 0 | 7
Charges | 5 | 0 | 0 | 4 | 2
Slot (ORG) | Non-NIL correct | Redundant | Inexact | Wrong | Missing
Alternate names | 46 | 4 | 5 | 25 | 5
Political/religious affiliations | 7 | 1 | 0 | 6 | 3
Top members/employees | 59 | 1 | 2 | 20 | 8
Number of employees/members | 3 | 0 | 0 | 0 | 8
Members | 0 | 0 | 0 | 0 | 4
Member of | 0 | 0 | 0 | 0 | 7
Subsidiaries | 7 | 0 | 0 | 3 | 10
Parents | 4 | 1 | 0 | 4 | 4
Founded by | 5 | 0 | 0 | 3 | 5
Founded | 5 | 0 | 0 | 1 | 3
Dissolved | 1 | 0 | 0 | 0 | 2
Country of headquarters | 3 | 0 | 0 | 1 | 20
State or province of headquarters | 1 | 1 | 0 | 7 | 11
City of headquarters | 2 | 0 | 0 | 3 | 10
Shareholders | 3 | 0 | 1 | 8 | 0
Website | 7 | 0 | 0 | 1 | 8
Conclusion
In the KBP 2012 slot filling task, we designed an enhanced pattern-matching system consisting of preprocessing, entity expansion, pattern bootstrapping and post-processing.
Precision and recall are relatively good for some specific slots.
The remaining slots urgently need improvement.
Tips
Adequate preparation
A harmonious team
An active and disciplined environment
Be passionate, patient and hardworking
……
Thank you!