
Intelius - PowerPoint Presentation

Uploaded On 2016-03-07


NYU Cold Start System. Ang Sun, Xin Wang, Sen Xu, Yigit Kiran, Shakthi Poornima, Andrew Borthwick (Intelius Inc.); Ralph Grishman (New York University). ID: 245757



Presentation Transcript

Slide1

Intelius-NYU Cold Start System

Ang Sun, Xin Wang, Sen Xu, Yigit Kiran, Shakthi Poornima, Andrew Borthwick (Intelius Inc.)
Ralph Grishman (New York University)

Slide2

Outline

  Cold Start Slot Filling System
  Entity Linking for Person and Organization
  Entity Linking for Geo-Political Entity (GPE)
  Experiments

Slide3

Outline

Slide4

Cold Start Slot Filling System

The NYU 2011 Regular Slot Filling System

Slide5

Cold Start Slot Filling System

Adapt the NYU system to Cold Start
  Within-document coreference
    extract entities for a single document
    extract the longest name mention as the canonical mention
      canonical mention: Maurice Sercarz
      mention: Sercarz
  Slot filling for GPEs
    infer slot fills from the extractions of person and organization entities

Slide6
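The canonical-mention rule from the within-document coreference bullet (pick the longest name mention in each chain) can be sketched roughly as follows. This is a minimal illustration, not the NYU system's code: the function name `canonical_mentions`, the list-of-lists chain format, and the second example chain are all assumptions made for the example.

```python
def canonical_mentions(coref_chains):
    """Map every mention in a chain to that chain's longest name mention.

    coref_chains is assumed to be a list of lists of mention strings,
    one inner list per entity found in a single document.
    """
    mapping = {}
    for chain in coref_chains:
        # Longest by character count; on ties max() keeps the first one seen.
        canonical = max(chain, key=len)
        for mention in chain:
            mapping[mention] = canonical
    return mapping


# The first chain is the slide's own example; the second is invented.
chains = [["Sercarz", "Maurice Sercarz"], ["NYU", "New York University"]]
result = canonical_mentions(chains)
```

Here `result["Sercarz"]` resolves to the full name, which is the form that would then be used for the entity in later stages.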

Cold Start Slot Filling System

Adapt the NYU system to Cold Start
  Contextual information extraction

Slide7

Outline

Slide8

Intelius Entity Linking Pipeline

Blocking

Top Level Blocking

Sub-blocking

Clustering

Transitive Closure

Graph Partition

Machine Learning based Link Scoring

Coalesce

Records

Person Profiles
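The Transitive Closure stage in the pipeline above can be illustrated with a small union-find sketch: any two records joined by a chain of accepted links end up in the same cluster. This is an in-memory toy under stated assumptions, not the MapReduce implementation the slide refers to; the function name and the integer record ids are hypothetical.

```python
def transitive_closure(n_records, links):
    """Group record ids 0..n_records-1 into clusters connected by links."""
    parent = list(range(n_records))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for record in range(n_records):
        clusters.setdefault(find(record), []).append(record)
    return sorted(clusters.values())


# Links 0-1 and 1-2 imply 0-2, so records 0, 1, 2 form one cluster.
clusters = transitive_closure(5, [(0, 1), (1, 2), (3, 4)])
```

A pure transitive closure can chain unrelated records together through a single bad link, which is presumably why the pipeline also lists a Graph Partition step.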

Goal: Conflate billions of entities

Map Reduce Based

Sequential file access

Optimized for batch processing billions of records sequentially

Optimization and compromises crucial to success

Slide9

Blocking

Bring together records likely to belong to the same entity

Blocking Keys
  Hash functions
  Hand crafted and domain specific
  Equivalence classes of names and titles
  Contextual PER, ORG and GPE Keywords (TFIDF)
  Dynamically selected

Slide10
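As a rough illustration of the blocking-key idea from the Blocking slide, the sketch below hashes a normalized name so that records likely to describe the same person fall into the same block. Everything specific here is an assumption for the example: the `NAME_CLASSES` table, the record fields, and the choice of a truncated SHA-256 digest are not taken from the Intelius system.

```python
import hashlib
from collections import defaultdict

# Hypothetical equivalence classes of first names (illustrative only).
NAME_CLASSES = {"bob": "robert", "rob": "robert", "bill": "william"}


def blocking_key(record):
    """Hash a normalized (first name class, last name) pair into a short key."""
    first = record["first"].lower()
    first = NAME_CLASSES.get(first, first)
    raw = f"{first}|{record['last'].lower()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:8]


def block(records):
    """Group record ids by blocking key; only same-block pairs get scored."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    return dict(blocks)


records = [
    {"id": 1, "first": "Bob", "last": "Smith"},
    {"id": 2, "first": "Robert", "last": "Smith"},
    {"id": 3, "first": "Alice", "last": "Jones"},
]
blocks = block(records)
```

Records 1 and 2 share a key because "Bob" and "Robert" map to the same name class, so only that pair would be passed on to link scoring.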

Link Scoring

ADTree-based supervised model
Training examples:
  Sample selection: randomly and selectively (through active learning)
  Labeling process:
    Three phases:
      Amazon Mechanical Turk Labeling
      Internal Data Rater Inspection
      Researchers
    Multiple rounds of relabeling and inspection are needed if the quality of labels from Turkers is low
  Size: 50,000 pairs for PER and 4,000 pairs for ORG

Slide11
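For readers unfamiliar with ADTrees (alternating decision trees), the link-scoring model can be pictured as follows: a record pair's score is the sum of the prediction values on every path whose tests the pair satisfies, and the sign of the sum gives the link decision. The toy tree below is entirely hypothetical; its feature names, thresholds, and weights are invented and are not the trained Intelius model.

```python
def adtree_score(node, features):
    """Sum prediction values over all paths consistent with the features."""
    total = node["value"]
    for splitter in node.get("splitters", []):
        branch = "yes" if splitter["test"](features) else "no"
        total += adtree_score(splitter[branch], features)
    return total


# Hypothetical toy tree: all tests and weights are made up for illustration.
tree = {
    "value": -0.5,
    "splitters": [
        {
            "test": lambda f: f["name_similarity"] > 0.9,
            "yes": {
                "value": 1.2,
                "splitters": [
                    {
                        "test": lambda f: f["same_city"],
                        "yes": {"value": 0.8},
                        "no": {"value": -0.3},
                    }
                ],
            },
            "no": {"value": -1.0},
        },
        {
            "test": lambda f: f["tfidf_cosine"] > 0.4,
            "yes": {"value": 0.6},
            "no": {"value": -0.2},
        },
    ],
}

pair = {"name_similarity": 0.95, "same_city": True, "tfidf_cosine": 0.1}
score = adtree_score(tree, pair)  # -0.5 + (1.2 + 0.8) + (-0.2)
is_link = score > 0
```

Unlike an ordinary decision tree, several splitters can hang off the same prediction node, so evidence from independent features accumulates additively.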

Features

PER Feature Types (116 features):
  General Demographic:
    Name frequency
    Birthday
    Location
    Population
    Combinations
  Comparing KBP specific slots:
    Jobs
    Educations
  TFIDF and N-gram: for contextual text information

ORG Feature Types (60 features):
  Location based
  Comparing KBP specific slots
  TFIDF and N-gram: for contextual text information

Slide12

ORG ADTree Model (Partial)

Slide13

Outline

Slide14

GPE Disambiguation

GPE (Toponyms) can be ambiguous
  China: Country or Town in Maine, US
  Georgia: Country or State in the US
  Springfield: exists in more than 10 US States
  Berlin: Capital of Germany, State in Germany, also common city name in the US
Over 5,000 ambiguous toponyms from geonames.org
Use contextual GPE to disambiguate
  Candidates with least cumulative spatial distance (Buscaldi and Rosso, 2008)
  Voting schema with a hierarchical gazetteer

Slide15
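The "least cumulative spatial distance" idea from the GPE Disambiguation slide can be sketched as: for each candidate sense of an ambiguous toponym, add up its distances to the other toponyms in the context and keep the candidate with the smallest total. The code below is only one plausible reading, not necessarily the exact formulation in the cited work; the great-circle (haversine) distance, the function names, and the rounded coordinates are assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_candidate(candidates, context_points):
    """Return the candidate with the least summed distance to the context."""
    totals = {
        name: sum(haversine_km(point, ctx) for ctx in context_points)
        for name, point in candidates.items()
    }
    return min(totals, key=totals.get), totals


# Approximate, hand-rounded coordinates used purely for illustration.
candidates = {
    "Georgia (country)": (42.3, 43.4),
    "Georgia (US state)": (32.9, -83.4),
}
context = [(33.75, -84.39), (32.8, -86.8)]  # roughly Atlanta and Alabama
best, totals = pick_candidate(candidates, context)
```

With US toponyms in the context, the US state wins by a wide margin because the country is thousands of kilometres from both context points.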

Hierarchical Gazetteer

Country
State/Province
City/Town

Gazetteer Sample

Key       Value
China     Country_POP_1,330,044,000; City_InState_Maine_InCountry_US
Seattle   City_InState_Washington_InCountry_US
Georgia   Country_POP_4,630,000; State_POP_8,975,842_InCountry_US
…

Slide16
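The gazetteer sample values pack each sense of a toponym into an underscore-delimited string. A minimal parsing sketch, assuming the format is exactly what the three sample rows show (a level token followed by `POP`, `InState`, or `InCountry` key/value pairs, with senses separated by semicolons), might look like this. The function name and output field names are hypothetical, and names containing underscores would need extra handling.

```python
def parse_entry(value):
    """Split a gazetteer value into one dict per sense of the toponym."""
    senses = []
    for sense in value.split(";"):
        sense = sense.strip()
        if not sense:
            continue
        tokens = sense.split("_")
        record = {"level": tokens[0]}
        # Remaining tokens are assumed to come in key/value pairs.
        for key, val in zip(tokens[1::2], tokens[2::2]):
            if key == "POP":
                record["population"] = int(val.replace(",", ""))
            elif key == "InState":
                record["state"] = val
            elif key == "InCountry":
                record["country"] = val
        senses.append(record)
    return senses


# The three rows from the slide's gazetteer sample.
gazetteer = {
    "China": "Country_POP_1,330,044,000; City_InState_Maine_InCountry_US",
    "Seattle": "City_InState_Washington_InCountry_US",
    "Georgia": "Country_POP_4,630,000; State_POP_8,975,842_InCountry_US",
}
parsed = {name: parse_entry(value) for name, value in gazetteer.items()}
```

After parsing, an ambiguous key such as "China" yields two senses (a country and a city in Maine), which is the candidate list a disambiguation step would choose from.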

Voting Schema

Topo_j's Vote for Candidate Topo_i
  +3: if Topo_i and Topo_j are sibling cities, e.g.: Austin, TX and Houston, TX
  +5: if Topo_i and Topo_j are sibling States, e.g.: Georgia and Alabama
  +10: if Topo_i is offspring of Topo_j, e.g.: Austin, TX and Texas
  +5: if Topo_i is parent of Topo_j, e.g.: Washington and Seattle, WA

Slide17
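The four voting weights from the Voting Schema slide can be written down as a small function. The sketch below is an assumption-laden illustration: the dict representation of a sense, the `is_child` helper, and the choice to sum votes and take the top-scoring candidate are not stated on the slide, which gives only the weights.

```python
def is_child(child, parent):
    """True if child sits directly or indirectly under parent in the hierarchy."""
    if parent["level"] == "State":
        return (child["level"] == "City"
                and child.get("state") == parent["name"]
                and child.get("country") == parent.get("country"))
    if parent["level"] == "Country":
        return (child["level"] in ("State", "City")
                and child.get("country") == parent["name"])
    return False


def vote(candidate, context):
    """Vote that a context toponym casts for a candidate sense (slide weights)."""
    same_country = candidate.get("country") == context.get("country")
    if candidate["level"] == context["level"] == "City":
        if same_country and candidate.get("state") == context.get("state"):
            return 3  # sibling cities
    if candidate["level"] == context["level"] == "State" and same_country:
        return 5  # sibling states
    if is_child(candidate, context):
        return 10  # candidate is offspring of the context toponym
    if is_child(context, candidate):
        return 5  # candidate is parent of the context toponym
    return 0


# Two senses of "Georgia", voted on by two unambiguous context toponyms.
candidates = {
    "Georgia (Country)": {"name": "Georgia", "level": "Country"},
    "Georgia (State)": {"name": "Georgia", "level": "State", "country": "US"},
}
context = [
    {"name": "Alabama", "level": "State", "country": "US"},
    {"name": "Atlanta", "level": "City", "state": "Georgia", "country": "US"},
]
scores = {label: sum(vote(sense, ctx) for ctx in context)
          for label, sense in candidates.items()}
best = max(scores, key=scores.get)
```

In this example the state sense collects 5 votes from Alabama (sibling states) and 5 from Atlanta (parent of a context city), while the country sense collects none.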

Outline

Slide18

671 million Intelius People Profiles
74+ million Topix News/blog articles
167+ million People Entities
26.5 million Conflated

Pipeline diagram (shown twice on the slide):
  Blocking
    Top Level Blocking
    Sub-blocking
  Clustering
    Transitive Closure
    Graph Partition
  Machine Learning based Link Scoring
  Coalesce
  Records
  Person Profiles

Link News Profiles to Intelius Profiles
Turker/Data Rater Evaluate: 8.06% were incorrectly conflated

Slide19

Thanks!

Slide20

?
