Uploaded On 2018-09-24

Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. EMNLP 2014. Mandar Joshi (IBM Research), Uma Sawant (IIT Bombay, Yahoo Labs), Soumen Chakrabarti (IIT Bombay). Contact: mandarj90@in.ibm.com, uma@cse.iitb.ac.in

Presentation Transcript

Slide1

Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries
EMNLP 2014

Mandar Joshi, IBM Research, mandarj90@in.ibm.com
Uma Sawant, IIT Bombay and Yahoo Labs, uma@cse.iitb.ac.in
Soumen Chakrabarti, IIT Bombay, soumen@cse.iitb.ac.in

Slide2

Entity-seeking Telegraphic Queries

Short
Unstructured
Expect answers as entities

Slide3

Challenges

No reliable syntax clues
  Free word order
  No or rare capitalization
  Quoted phrases are rare
Ambiguous
  Multiple interpretations
  Example query: aamir khan films
    Aamir Khan: the Indian actor or the British boxer
    Films: appeared in, directed by, or about
Difficult to exploit redundancy

Slide4

Why do we need the corpus?

KGs are high precision but incomplete
  Work in progress
  Triples can't represent all information
  Structured-unstructured gap
Corpus provides recall
Example query: fastest odi century batsman
Corpus snippet: "… Anderson hits fastest ODI century in mismatch ... was the first time two batsmen have hit hundreds in under 50 balls in the same ODI."

Slide5

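Annotated Web with Knowledge Graph

Annotated document: "… 1998 Redford directed movie The Horse Whisperer, based on the 1995 novel…"
[Figure: the mention "Redford" is a mentionOf the entity Robert_Redford, which is an instanceOf type /film/director. The mention "The Horse Whisperer" is a mentionOf the entity The_Horse_Whisperer, which is an instanceOf type /film/film. The relation /film/director/film connects the two entities.]

Slide6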

Interpretation via segmentation

Slide7

Signals from the Query

Queries seek answer entities (e2)
They contain query entities (e1), target types (t2), relations (r), and selectors (s)

| query | e1 | r | t2 | s |
|---|---|---|---|---|
| dave navarro first band | dave navarro | band | band | first |
| dave navarro first band | dave navarro | - | band | first |
| spider automobile company | spider | automobile company | automobile company | - |
| spider automobile company | automobile | company | company | spider |

Slide8

Segmentation and Interpretation

Interpretation = Segmentation + Annotation
Segmentation of query tokens into 3 partitions
  Query entity (E1)
  Relation and Type (T2/R)
  Selectors (S)
Each partition may map to multiple annotations
  Multiple interpretations possible
  We keep all of them

Slide9
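The split of query tokens into E1, T2/R, and S partitions described above can be illustrated with a small enumeration. This is only a sketch under assumptions the slides do not state: the E1 and T2/R partitions are each taken to be one contiguous (possibly empty) token span, and every leftover token is treated as a selector. The names `Segmentation`, `spans`, and `enumerate_segmentations` are hypothetical.

```python
from itertools import combinations
from typing import Iterator, NamedTuple, Tuple


class Segmentation(NamedTuple):
    e1: Tuple[str, ...]   # query-entity tokens
    t2r: Tuple[str, ...]  # type/relation hint tokens
    s: Tuple[str, ...]    # selector tokens (whatever is left over)


def spans(n: int) -> Iterator[Tuple[int, int]]:
    """Yield the empty span (0, 0) and every non-empty contiguous span [i, j)."""
    yield (0, 0)
    yield from combinations(range(n + 1), 2)


def enumerate_segmentations(query: str) -> Iterator[Segmentation]:
    """Enumerate all splits under the contiguity assumption stated above."""
    tokens = query.split()
    n = len(tokens)
    for a, b in spans(n):        # candidate E1 span
        for c, d in spans(n):    # candidate T2/R span
            if a < d and c < b:  # the two spans overlap, skip
                continue
            s = tuple(t for k, t in enumerate(tokens)
                      if not (a <= k < b or c <= k < d))
            yield Segmentation(tuple(tokens[a:b]), tuple(tokens[c:d]), s)
```

Calling `enumerate_segmentations("dave navarro first band")` yields, among others, the split with e1 = (dave, navarro), t2r = (band,), and s = (first,). Keeping every split, rather than committing to one early, mirrors the slide's point that all interpretations are retained.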

Segmentation and Interpretation

[Figure: the query "dave navarro first band" split into the segments "dave navarro" and "first band". The E1 partition maps to the candidate entities 1. Dave_Navarro (Artist) and 2. Dave_Navarro (Episode). The T2/R partition maps to either (r: member of, t2: musical group) or (r: null, t2: musical group).]

Slide10

Model 1: Graphical Model

[Figure: graphical model. Variables: segmentation, query entity, connecting relation, target type, selectors, candidate entity. Potentials: type language model, relation language model, entity language model, KG-assisted relation evidence potential, entity type compatibility, corpus-assisted entity-relation evidence potential.]
Provides mentions of entity, relation, and type

Slide11

Relation Language Model

Objective: map relation mentions to Freebase relation(s)
Use annotated ClueWeb09 + Freebase triples
For the triples of each Freebase relation
  Locate endpoints in corpus
  Extract dependency path phrase between entities
  Maintain co-occurrence counts of <phrase, rel>
Score using LM

Slide12
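The counting step of the relation language model above can be sketched as follows. The slides only say that <phrase, rel> co-occurrence counts are kept and scored with a language model. The additive smoothing used here, the class name `RelationLM`, and the parameter `alpha` are assumptions for illustration, and the input pairs are expected to come from the (not shown) dependency-path extraction.

```python
from collections import Counter
from typing import Iterable, Tuple


class RelationLM:
    """Toy estimate of P(phrase | relation) from <phrase, rel> co-occurrence counts."""

    def __init__(self, pairs: Iterable[Tuple[str, str]], alpha: float = 0.1):
        pairs = list(pairs)
        self.pair_counts = Counter(pairs)
        self.rel_counts = Counter(rel for _, rel in pairs)
        self.vocab = {phrase for phrase, _ in pairs}
        self.alpha = alpha  # additive smoothing constant (assumed, not from the slides)

    def score(self, phrase: str, rel: str) -> float:
        """Additively smoothed P(phrase | rel)."""
        num = self.pair_counts[(phrase, rel)] + self.alpha
        den = self.rel_counts[rel] + self.alpha * len(self.vocab)
        return num / den

    def best_relation(self, phrase: str) -> str:
        """Relation under which the phrase is most likely."""
        return max(self.rel_counts, key=lambda rel: self.score(phrase, rel))
```

With toy counts in which "directed" co-occurs mostly with /film/director/film, `best_relation("directed")` returns that relation. A real system would also need a prior over relations and a way to handle phrases never seen in the corpus.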

Type Language Model

Smoothed Dirichlet language model
Create a micro-document for each type using
  Description link /common/topic/alias
  Words in type name
  Incoming relation links
Micro-document for /location/citytown: city, location, headquarters, capital_city, etc.
Favors specific types over generic ones
  "city" should map to /location/citytown and not /location/location

Slide13
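For the type language model above, Dirichlet smoothing has the standard form P(w | t) = (c(w, d_t) + mu * P(w | C)) / (|d_t| + mu), where d_t is the micro-document of type t and C is the collection of all micro-documents. The sketch below applies it to a type hint. The function name, the value of `mu`, and the contents of the /location/location micro-document in the usage note are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def type_log_likelihood(hint: List[str],
                        micro_docs: Dict[str, List[str]],
                        type_id: str,
                        mu: float = 10.0) -> float:
    """Log-likelihood of the hint words under a Dirichlet-smoothed type model."""
    doc = Counter(micro_docs[type_id])
    doc_len = sum(doc.values())
    # Background model: all micro-documents pooled together.
    collection = Counter(w for words in micro_docs.values() for w in words)
    coll_len = sum(collection.values())
    logp = 0.0
    for w in hint:
        p_coll = collection[w] / coll_len
        p = (doc[w] + mu * p_coll) / (doc_len + mu)
        # A word unseen in every micro-document gets probability zero.
        logp += math.log(p) if p > 0 else float("-inf")
    return logp
```

For example, with `micro_docs = {"/location/citytown": ["city", "location", "headquarters", "capital_city"], "/location/location": ["location", "place", "contains", "region"]}`, the hint `["city"]` scores higher under /location/citytown than under /location/location.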

Snippet Potential

Estimates the support for an interpretation in the corpus
Snippets scored using RankSVM
Partial list of features
  NumSnippets with distance(e2, e1) < k (k = 5, 10)
  NumSnippets with distance(e2, r) < k (k = 3, 6)
  NumSnippets with relation r = ⊥
  NumSnippets with relation phrases as prepositions
  NumSnippets covering fraction of query IDF > k (k = 0.2, 0.4, 0.6, 0.8)

Slide14
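The proximity features listed for the snippet potential can be sketched as simple counts. The snippet representation here is an assumption: each snippet is a token list in which entity mentions have already been replaced by entity IDs, and distance is measured in token positions. The function and feature names are hypothetical, and the RankSVM that consumes the features is not shown.

```python
from typing import Dict, List, Optional, Sequence


def min_distance(snippet: Sequence[str], a: str, b: str) -> Optional[int]:
    """Smallest token distance between any occurrence of a and of b, or None."""
    pos_a = [i for i, tok in enumerate(snippet) if tok == a]
    pos_b = [i for i, tok in enumerate(snippet) if tok == b]
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)


def proximity_features(snippets: List[Sequence[str]], e1: str, e2: str,
                       ks: Sequence[int] = (5, 10)) -> Dict[str, int]:
    """Count snippets in which e1 and e2 occur within fewer than k tokens."""
    distances = [min_distance(s, e1, e2) for s in snippets]
    return {
        f"num_snippets_dist_e2_e1_lt_{k}":
            sum(1 for d in distances if d is not None and d < k)
        for k in ks
    }
```

The same pattern extends to distance(e2, r) by passing a relation-phrase token in place of `e1`, assuming relation phrases are also collapsed to single tokens.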

Inference

Candidate answer entities (e2)
  Top k from corpus based on snippet frequency
  By KG links that are in interpretations set
Each candidate e2 is scored as [equation not captured in this transcript]

Slide15
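The two sources of candidate answer entities named above, corpus frequency and KG links, can be combined as in the sketch below. It covers candidate generation only and makes no attempt to reproduce the scoring function. The data shapes (per-snippet entity lists, KG triples as (subject, relation, object) tuples, interpretations as (e1, r) pairs) and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def candidate_entities(snippet_entities: List[List[str]],
                       kg_triples: Iterable[Tuple[str, str, str]],
                       interpretations: Set[Tuple[str, str]],
                       k: int = 10) -> Set[str]:
    """Union of corpus-frequent entities and KG neighbours of interpreted (e1, r) pairs."""
    # Snippet frequency: number of snippets that mention each entity.
    freq = Counter(e for ents in snippet_entities for e in set(ents))
    from_corpus = {e for e, _ in freq.most_common(k)}
    # KG candidates: objects reachable from some e1 via some r in the interpretation set.
    from_kg = {obj for subj, rel, obj in kg_triples if (subj, rel) in interpretations}
    return from_corpus | from_kg
```

Taking the union lets the KG contribute precise answers when a matching triple exists, while the corpus supplies candidates the KG is missing.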

Model 2: Latent Variable Discriminative Training

q, e2 are observed
e1, t2, r and z are latent
Non-convex formulation
Constraints are formulated using the best scoring interpretation

Slide16

Experiments

Slide17

Test Bed

Freebase entity, type and relation knowledge graph
  ~29 million entities
  14K types, 2K selected relation types
Annotated corpus
  ClueWeb09B Web corpus having 50 million pages
  Google annotations (FACC1), ~13 annotations per page
Text and Entity Index

Slide18

Test Bed

Query sets
  TREC-INEX: 700 entity search queries
  WQT: subset of ~800 queries from the WebQuestions natural language query set, converted to telegraphic form

| TREC-INEX | WQT |
|---|---|
| Has type and/or relation hints | Has mostly relation hints |
| Answers from KG and corpus collected by TAs | Answers from KG only collected by turkers |
| More suitable for corpus (+ KG) | More suitable for KG |

Slide19

Synergy between KG and Corpus

Corpus and knowledge graph help each other to deliver better performance

Slide20

Query template comparison

The entity-relation-type-selector template yields better accuracy than the type-selector template

| Dataset | Formulation | MAP | MRR | n@10 |
|---|---|---|---|---|
| TREC-INEX | No interpretation | .205 | .215 | .292 |
| TREC-INEX | Type + Selector | .292 | .306 | .356 |
| TREC-INEX | Unoptimized | .409 | .419 | .502 |
| TREC-INEX | LVDT | .419 | .436 | .541 |
| WQT | No interpretation | .080 | .095 | .131 |
| WQT | Type + Selector | .116 | .152 | .201 |
| WQT | Unoptimized | .377 | .401 | .474 |
| WQT | LVDT | .295 | .323 | .406 |

Slide21
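MAP and MRR in the comparison above are means, over all queries, of per-query average precision and reciprocal rank. A minimal sketch of the per-query quantities, using their standard definitions, is given below. The function names are illustrative, and the evaluation code actually used for the reported numbers is not part of this transcript.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant entity appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant entity, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0
```

For the ranking `["a", "b", "c", "d"]` with relevant set `{"b", "d"}`, average precision is (1/2 + 2/4) / 2 = 0.5 and the reciprocal rank is 1/2.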

Comparison with semantic parsers

| Dataset | Formulation | MAP | MRR | n@10 |
|---|---|---|---|---|
| TREC-INEX | SEMPRE (Free917) | .154 | .159 | .186 |
| TREC-INEX | SEMPRE (WQ) | .197 | .208 | .247 |
| TREC-INEX | Unoptimized | .409 | .419 | .502 |
| TREC-INEX | LVDT | .419 | .436 | .541 |
| WQT | SEMPRE (Free917) | .229 | .255 | .285 |
| WQT | SEMPRE (WQ) | .374 | .406 | .449 |
| WQT | Jacana | .239 | .256 | .329 |
| WQT | Unoptimized | .377 | .401 | .474 |
| WQT | LVDT | .295 | .323 | .406 |

We do better than both Jacana and SEMPRE
Gains over SEMPRE are slimmer for WQT.

Slide22

NDCG comparison, TREC-INEX dataset

Slide23

NDCG comparison, WQT dataset

Slide24

Summary

Query interpretation is rewarding, but non-trivial
Semantic parsers do not work well with syntax-less telegraphic queries; segmentation-based models work better
The entity-relation-type-selector template is better than the type-selector template
Knowledge graph and corpus provide complementary benefits

Slide25

Future Work

Extend to NL queries
  More natural type and relation hints
  More filler words; need better word selection for corpus match
Handle relation joins
  Requires more complex relation model
  Large interpretation space

Slide26

Thank you!

Slide27

Generating interpretations

Combine type and relation hint
Identify entities first
3 partitions: E1, T2/R, and S

Slide28

KG vs. Corpus

KG
  Better quality content than corpus
  Generally incomplete
Corpus
  Has better recall but more noise
  Dependence on annotation algorithm
Complementary benefits!