Slide1
Distant Supervision for Knowledge Base Population
Mihai Surdeanu, David McClosky, John Bauer, Julie Tibshirani, Angel Chang, Valentin Spitkovsky, Christopher Manning
Slide2
Definition and Approach
We took part in TAC KBP 2010 this year (both tasks)
Slot filling task: learning a pre-defined set of relations and attributes for target entities based on documents in a collection
“Warren Buffett began studying at the Wharton School of Finance at the University of Pennsylvania, but transferred to the University of Nebraska where he graduated.”
(per:schools_attended, Warren Buffett, University of Pennsylvania)
(per:schools_attended, Warren Buffett, University of Nebraska)
Distant supervision approach: generate training data automatically from Wikipedia infoboxes
Slide3
Infobox KB
Map infobox fields to KBP slots (one-to-many mapping)
IR: find relevant sentences (query: entity name + slot value)
Extract +/- slot candidates
Train multiclass classifier
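The training steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the field names in `FIELD_TO_SLOTS`, the helper names `kb_tuples` and `label_candidates`, and the substring matching are all assumptions made for the example. Negatives use the UNRELATED label that appears in the results table, and the example sentence is shortened from the one on the earlier slide.

```python
# Hypothetical one-to-many mapping from infobox fields to KBP slots.
# The field names are illustrative assumptions, not taken from the slides.
FIELD_TO_SLOTS = {
    "alma_mater": ["per:schools_attended"],
    "birth_place": ["per:city_of_birth", "per:country_of_birth"],
}

NEGATIVE_LABEL = "UNRELATED"


def kb_tuples(entity, infobox):
    """Turn one entity's infobox into (slot, entity, value) tuples."""
    tuples = set()
    for field, values in infobox.items():
        for slot in FIELD_TO_SLOTS.get(field, []):
            for value in values:
                tuples.add((slot, entity, value))
    return tuples


def label_candidates(sentence, entity, candidates, tuples):
    """Label each candidate value found in a sentence that mentions the entity.

    A candidate backed by a KB tuple becomes a positive example for that slot;
    any other candidate becomes a negative (UNRELATED) example.
    """
    examples = []
    if entity not in sentence:  # naive substring match, for illustration only
        return examples
    for value in candidates:
        if value not in sentence:
            continue
        slots = sorted(s for (s, e, v) in tuples if e == entity and v == value)
        if slots:
            examples.extend((slot, entity, value) for slot in slots)
        else:
            examples.append((NEGATIVE_LABEL, entity, value))
    return examples


# Deliberately incomplete, hypothetical infobox: only one school is listed.
infobox = {"alma_mater": ["University of Nebraska"]}
tuples = kb_tuples("Warren Buffett", infobox)
sentence = ("Warren Buffett began studying at the University of Pennsylvania, "
            "but transferred to the University of Nebraska where he graduated.")
candidates = ["University of Pennsylvania", "University of Nebraska"]
for example in label_candidates(sentence, "Warren Buffett", candidates, tuples):
    print(example)
```

Because the hypothetical infobox omits one school, a true mention is labeled UNRELATED. An incomplete KB is one way automatically generated labels can become noisy.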
Map KBP slots to fine-grained NE labels
KBP query: entity name
IR: find relevant sentences (query: entity name + trigger words)
Extract slot candidates
Classify candidates
Inference (greedy, local)
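The slide names the last evaluation step only as "Inference (greedy, local)" and gives no details. One plausible reading, offered purely as an assumption, is that each entity is handled on its own: classifier outputs are visited in order of decreasing score, and a value is accepted unless its slot is single-valued and already filled. The `SINGLE_VALUED` set, the score threshold, and the function name below are hypothetical.

```python
from collections import defaultdict

# Assumption: which slots accept only one value. Illustrative, not from the slides.
SINGLE_VALUED = {"per:date_of_birth", "per:city_of_birth", "per:country_of_birth"}


def greedy_local_inference(predictions, threshold=0.5):
    """Pick slot values for ONE entity from classified candidates.

    predictions: list of (slot, value, score), one entry per candidate mention.
    Returns a dict mapping each slot to its accepted values.
    """
    # Keep the best score seen for each distinct (slot, value) pair.
    best = {}
    for slot, value, score in predictions:
        if slot == "UNRELATED" or score < threshold:
            continue
        key = (slot, value)
        best[key] = max(best.get(key, 0.0), score)

    # Greedy pass: highest-scoring pairs first, no joint reasoning across slots.
    output = defaultdict(list)
    for (slot, value), _score in sorted(best.items(), key=lambda kv: -kv[1]):
        if slot in SINGLE_VALUED and output[slot]:
            continue  # single-valued slot is already filled
        output[slot].append(value)
    return dict(output)


# Made-up scores, for illustration only.
predictions = [
    ("per:date_of_birth", "August 30, 1930", 0.9),
    ("per:date_of_birth", "1930", 0.6),
    ("per:schools_attended", "University of Nebraska", 0.8),
    ("per:schools_attended", "University of Pennsylvania", 0.7),
    ("UNRELATED", "Omaha", 0.95),
]
print(greedy_local_inference(predictions))
```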
Pipeline stages: Training (Infobox KB to Train multiclass classifier); Evaluation (KBP query to Inference)
Output: extracted slots
Slide4
Results
Label                        Correct  Predict  Actual     P     R    F1
UNRELATED                     268085   289135  295590  92.7  90.7  91.7
org:city_of_headquarters        5835     9040    7514  64.5  77.7  70.5
org:country_of_headquarters     2851     4638    3725  61.5  76.5  68.2
org:founded                     3896     8199    6662  47.5  58.5  52.4
org:parents                     1158     2292    2525  50.5  45.9  48.1
org:top_members/employees       1282     3067    3596  41.8  35.7  38.5
per:city_of_birth               1799     3920    3252  45.9  55.3  50.2
per:country_of_birth            1984     4122    3204  48.1  61.9  54.2
per:date_of_birth               3938     5427    4362  72.6  90.3  80.5
per:member_of                   1771     3018    2887  58.7  61.3    60
per:title                       1714     3364    3054    51  56.1  53.4
Total                          37169    68822   62367    54  59.6  56.7
Training on 2/3 of infoboxes, evaluating on 1/3
Evaluating only on sentences that contain at least one valid slot
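The slide does not define the columns, but the figures are consistent with P = Correct / Predict, R = Correct / Actual, and F1 as the harmonic mean of the two. A small sketch that recomputes three rows under that assumption (the function name `prf` is just for illustration):

```python
def prf(correct, predicted, actual):
    """Precision, recall and F1 from counts, guarding against empty denominators."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (Correct, Predict, Actual) counts copied from the results table.
rows = {
    "UNRELATED": (268085, 289135, 295590),
    "org:city_of_headquarters": (5835, 9040, 7514),
    "per:date_of_birth": (3938, 5427, 4362),
}
for label, counts in rows.items():
    p, r, f1 = prf(*counts)
    print(f"{label}: P={100 * p:.1f} R={100 * r:.1f} F1={100 * f1:.1f}")
```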
Slot rows: top 10 most common slots
Total row: total for all slots
Slide5
Challenges
Improve quality of data generated through distant supervision
Improve IR recall
Use relation-specific trigger words (or n-grams or dependency paths etc.) to boost sentences likely to contain answers to the top of the ranking
How to acquire these automatically?
Better classifiers for noisy text (e.g., web snippets)
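The trigger-word idea could look something like the sketch below. It is only an illustration under stated assumptions: the trigger list is hand-written and hypothetical (the slide leaves open how to acquire triggers automatically), and the additive scoring and the name `rank_sentences` are invented for the example.

```python
import re

# Hypothetical, hand-written trigger words per slot (not from the slides).
TRIGGERS = {
    "per:schools_attended": {"graduated", "studied", "studying", "attended", "degree"},
}


def rank_sentences(sentences, entity, slot, boost=1.0):
    """Rank sentences that mention the entity, boosting those with trigger words."""
    triggers = TRIGGERS.get(slot, set())
    scored = []
    for sentence in sentences:
        if entity.lower() not in sentence.lower():
            continue  # keep only sentences that mention the entity
        tokens = re.findall(r"\w+", sentence.lower())
        hits = sum(1 for token in tokens if token in triggers)
        scored.append((1.0 + boost * hits, sentence))
    # Stable sort: ties keep their original retrieval order.
    scored.sort(key=lambda pair: -pair[0])
    return [sentence for _score, sentence in scored]


# Made-up sentences, for illustration only.
sentences = [
    "Warren Buffett lives in Omaha.",
    "Warren Buffett graduated from the University of Nebraska.",
    "Omaha is in Nebraska.",
]
for sentence in rank_sentences(sentences, "Warren Buffett", "per:schools_attended"):
    print(sentence)
```

The same scoring hook could take n-gram or dependency-path matches instead of single words.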