Slide1
Intelius-NYU Cold Start System
Ang Sun, Xin Wang, Sen Xu, Yigit Kiran, Shakthi Poornima, Andrew Borthwick (Intelius Inc.)
Ralph Grishman (New York University)
Slide2
Outline
Cold Start Slot Filling System
Entity Linking for Person and Organization
Entity Linking for Geo-Political Entity (GPE)
Experiments
Slide3
Outline
Cold Start Slot Filling System
Entity Linking for Person and Organization
Entity Linking for Geo-Political Entity (GPE)
Experiments
Slide4
Cold Start Slot Filling System
The NYU 2011 Regular Slot Filling System
Slide5
Cold Start Slot Filling System
Adapt the NYU system to Cold Start
Within-document coreference
  extract entities for a single document
  extract the longest name mention as the canonical mention
    canonical mention: Maurice Sercarz
    mention: Sercarz
Slot filling for GPEs
  infer slot fills from the extractions of person and organization entities
Slide6
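The canonical-mention rule from the previous slide (take the longest name mention in a coreference chain) can be sketched as follows. This is a minimal illustration, not the NYU system's code: the function name `canonical_mention` is hypothetical, and measuring length in characters rather than tokens is an assumption.

```python
from typing import List


def canonical_mention(chain: List[str]) -> str:
    """Return the longest name mention in a within-document coreference chain.

    Assumption: "longest" means most characters; on ties the earliest
    mention in the chain wins (this is how max() behaves).
    """
    if not chain:
        raise ValueError("coreference chain is empty")
    return max(chain, key=len)


# Example from the slide: "Sercarz" and "Maurice Sercarz" corefer.
chain = ["Sercarz", "Maurice Sercarz", "Sercarz"]
print(canonical_mention(chain))
```

Every other mention in the chain would then be reported under the canonical string, so slot fills extracted for "Sercarz" attach to "Maurice Sercarz".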
Cold Start Slot Filling System
Adapt the NYU system to Cold Start
Contextual information extraction
Slide7
Outline
Cold Start Slot Filling System
Entity Linking for Person and Organization
Entity Linking for Geo-Political Entity (GPE)
Experiments
Slide8
Intelius Entity Linking Pipeline
[Diagram labels: Records; Blocking (Top Level Blocking, Sub-blocking); Clustering (Transitive Closure, Graph Partition); Machine Learning based Link Scoring; Coalesce; Person Profiles]
Goal: conflate billions of entities
MapReduce based
  Sequential file access
  Optimized for batch processing billions of records sequentially
  Optimization and compromises are crucial to success
Slide9
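The clustering stage named on the previous slide includes a transitive closure over scored links. A minimal single-machine sketch using union-find is shown below. The slides describe a MapReduce-based system and give neither a score range nor a cut-off, so the `threshold` value, the function name, and the in-memory design are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def transitive_closure(
    scored_links: Iterable[Tuple[Hashable, Hashable, float]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the slides
) -> List[Set[Hashable]]:
    """Group records into clusters: any two records connected by a chain of
    links scoring at or above the threshold end up in the same cluster."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in scored_links:
        root_a, root_b = find(a), find(b)  # also registers unlinked records
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for record in list(parent):
        clusters[find(record)].add(record)
    return list(clusters.values())


links = [("r1", "r2", 0.9), ("r2", "r3", 0.8), ("r4", "r5", 0.2)]
print(transitive_closure(links))
```

Because closure is transitive, a single wrong high-scoring link merges two whole clusters, which is presumably why the pipeline follows it with a graph partition step.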
Blocking
Bring together records likely to belong to the same entity
Blocking Keys
  Hash functions
  Hand crafted and domain specific
  Equivalence classes of names and titles
  Contextual PER, ORG and GPE keywords (TFIDF)
  Dynamically selected
Slide10
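The blocking idea from the previous slide can be sketched as grouping records under a hand-crafted key so that only records sharing a key are compared. The key below (last name plus a first-name equivalence class) and the tiny `NAME_CLASSES` table are hypothetical examples, not the keys used in the Intelius system.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical equivalence classes of first names (illustration only).
NAME_CLASSES = {"bob": "robert", "rob": "robert", "bill": "william"}

Record = Dict[str, object]


def blocking_key(record: Record) -> str:
    """Hand-crafted key: lowercased last name plus first-name class."""
    first = str(record["first"]).strip().lower()
    last = str(record["last"]).strip().lower()
    return f"{last}|{NAME_CLASSES.get(first, first)}"


def build_blocks(records: List[Record]) -> Dict[str, List[Record]]:
    """Bring together records that share a blocking key."""
    blocks: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)
    return blocks


def candidate_pairs(blocks: Dict[str, List[Record]]) -> Iterator[Tuple[Record, Record]]:
    """Only pairs inside the same block are passed on to link scoring."""
    for members in blocks.values():
        yield from combinations(members, 2)


records = [
    {"id": 1, "first": "Bob", "last": "Smith"},
    {"id": 2, "first": "Robert", "last": "Smith"},
    {"id": 3, "first": "Alice", "last": "Jones"},
]
for a, b in candidate_pairs(build_blocks(records)):
    print(a["id"], b["id"])
```

Comparing only within blocks avoids scoring all record pairs, which would be infeasible at the scale of billions of records mentioned earlier.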
Link Scoring
ADTree-based supervised model
Training examples:
  Sample selection: random and selective (through active learning)
  Labeling process: three phases
    Amazon Mechanical Turk labeling
    Internal data rater inspection
    Researchers
  Multiple rounds of relabeling and inspection are needed if the quality of labels from Turkers is low
  Size: 50,000 pairs for PER and 4,000 pairs for ORG
Slide11
Features
PER Feature Types (116 features):
  General Demographic:
    Name frequency
    Birthday
    Location
    Population
    Combinations
  Comparing KBP specific slots:
    Jobs
    Educations
  TFIDF and N-gram: for contextual text information
ORG Feature Types (60 features):
  Location based
  Comparing KBP specific slots
  TFIDF and N-gram: for contextual text information
Slide12
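The previous slide lists TFIDF features over contextual text without saying how they are computed. One common way to turn TF-IDF into a pairwise feature is cosine similarity between the weighted term vectors of two records; the sketch below assumes whitespace tokenization and a smoothed IDF, neither of which is stated in the slides.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (raw term count times smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


contexts = [
    "acme software engineer seattle",
    "software engineer at acme in seattle",
    "river bank fishing guide",
]
vecs = tfidf_vectors(contexts)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

A score like this would be one numeric input among the 116 PER or 60 ORG features, not a linking decision on its own.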
ORG ADTree Model (Partial)
Slide13
Outline
Cold Start Slot Filling System
Entity Linking for Person and Organization
Entity Linking for Geo-Political Entity (GPE)
Experiments
Slide14
GPE Disambiguation
GPE (Toponyms) can be ambiguous
  China: Country or Town in Maine, US
  Georgia: Country or State in the US
  Springfield: exists in more than 10 US States
  Berlin: Capital of Germany, State in Germany, also common city name in the US
Over 5,000 ambiguous toponyms from geonames.org
Use contextual GPE to disambiguate
  Candidates with least cumulative spatial distance (Buscaldi and Rosso, 2008)
  Voting schema with a hierarchical gazetteer
Slide15
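The "least cumulative spatial distance" criterion from the previous slide can be illustrated generically: for each candidate reading of an ambiguous toponym, sum its distance to the contextual GPEs and keep the candidate with the smallest sum. This is a sketch of the general idea only and does not claim to reproduce the cited method. The coordinates are rough values chosen for illustration, and all names in the code are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_candidate(candidates: Dict[str, Coord], context: List[Coord]) -> str:
    """Pick the candidate with the least cumulative distance to the context."""
    return min(
        candidates,
        key=lambda name: sum(haversine_km(candidates[name], c) for c in context),
    )


# Rough, illustrative coordinates (not taken from the slides or a gazetteer).
candidates = {
    "China (country)": (35.0, 105.0),
    "China, Maine, US": (44.5, -69.5),
}
context = [(44.3, -69.8), (43.7, -70.3)]  # roughly Augusta and Portland, Maine
print(closest_candidate(candidates, context))
```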
Hierarchical Gazetteer
Country
State/Province
City/Town
Gazetteer Sample
Key       Value
China     Country_POP_1,330,044,000; City_InState_Maine_InCountry_US
Seattle   City_InState_Washington_InCountry_US
Georgia   Country_POP_4,630,000; State_POP_8,975,842_InCountry_US
…         …
Slide16
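The gazetteer sample on the previous slide stores every reading of a toponym in one semicolon-separated value string. A small parser for that format could look like the sketch below. The grammar is inferred from the three sample rows only (level, optional `_POP_`, optional `_InState_`, optional `_InCountry_`), so the regular expression and the field names are assumptions.

```python
import re
from typing import Dict, List, Optional

# Pattern inferred from the sample rows; the real format may have more fields.
ENTRY = re.compile(
    r"^(?P<level>Country|State|City)"
    r"(?:_POP_(?P<pop>[\d,]+))?"
    r"(?:_InState_(?P<state>[^_]+))?"
    r"(?:_InCountry_(?P<country>[^_]+))?$"
)


def parse_value(value: str) -> List[Dict[str, Optional[object]]]:
    """Split a gazetteer value into one dict per candidate reading."""
    readings = []
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognized gazetteer entry: {part!r}")
        pop = match["pop"]
        readings.append({
            "level": match["level"],
            "population": int(pop.replace(",", "")) if pop else None,
            "state": match["state"],
            "country": match["country"],
        })
    return readings


for reading in parse_value("Country_POP_4,630,000; State_POP_8,975,842_InCountry_US"):
    print(reading)
```

Each parsed reading is one candidate for disambiguation; "Georgia" yields a country reading and a US state reading.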
Voting Schema
Topo_j's vote for candidate Topo_i:
+3: if Topo_i and Topo_j are sibling cities, e.g.: Austin, TX and Houston, TX
+5: if Topo_i and Topo_j are sibling states, e.g.: Georgia and Alabama
+10: if Topo_i is offspring of Topo_j, e.g.: Austin, TX and Texas
+5: if Topo_i is parent of Topo_j, e.g.: Washington and Seattle, WA
Slide17
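The voting schema from the previous slide can be sketched as a scoring function over gazetteer readings. The weights (+3, +5, +10, +5) come from the slide; everything else is an assumption made for illustration: the `Place` record, treating "offspring" as any descendant rather than a direct child, treating context toponyms as already resolved, and breaking ties in favor of the first candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Place:
    name: str
    level: str                     # "country", "state", or "city"
    state: Optional[str] = None    # containing state, for cities
    country: Optional[str] = None  # containing country, for states and cities


def is_offspring(child: Place, parent: Place) -> bool:
    """True if `child` lies inside `parent` in the gazetteer hierarchy."""
    if parent.level == "country":
        return child.level != "country" and child.country == parent.name
    if parent.level == "state":
        return (child.level == "city" and child.state == parent.name
                and child.country == parent.country)
    return False


def vote(cand: Place, ctx: Place) -> int:
    """Vote of context toponym `ctx` (Topo_j) for candidate `cand` (Topo_i)."""
    if (cand.level == ctx.level == "city"
            and cand.state == ctx.state and cand.country == ctx.country):
        return 3   # sibling cities
    if cand.level == ctx.level == "state" and cand.country == ctx.country:
        return 5   # sibling states
    if is_offspring(cand, ctx):
        return 10  # candidate is offspring of the context toponym
    if is_offspring(ctx, cand):
        return 5   # candidate is parent of the context toponym
    return 0


def disambiguate(candidates: List[Place], context: List[Place]) -> Place:
    """Return the candidate reading with the most votes from the context."""
    return max(candidates, key=lambda c: sum(vote(c, t) for t in context))


georgia_country = Place("Georgia", "country")
georgia_state = Place("Georgia", "state", country="US")
alabama = Place("Alabama", "state", country="US")
print(disambiguate([georgia_country, georgia_state], [alabama]))
```

With "Alabama" in the context, the US state reading of "Georgia" collects 5 votes and the country reading collects none, matching the sibling-states example on the slide.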
Outline
Cold Start Slot Filling System
Entity Linking for Person and Organization
Entity Linking for Geo-Political Entity (GPE)
Experiments
Slide18
671 million Intelius People Profiles
74+ million Topix news/blog articles
167+ million People Entities
26.5 million Conflated
Link News Profiles to Intelius Profiles
Turker/Data Rater Evaluate: 8.06% were incorrectly conflated
[Diagram labels, pipeline shown twice: Records; Blocking (Top Level Blocking, Sub-blocking); Clustering (Transitive Closure, Graph Partition); Machine Learning based Link Scoring; Coalesce; Person Profiles]
Slide19
Thanks!
Slide20
?