Entity Categorization Over Large Document Collections

Uploaded On 2020-08-26


Presenter: Shu-Ya Li. Authors: Venkatesh Ganti, Arnd Christian König, Rares Vernica. KDD 2008.



Presentation Transcript

Slide1

Entity Categorization Over Large Document Collections

Presenter: Shu-Ya Li

Authors: Venkatesh Ganti, Arnd Christian König, Rares Vernica

KDD, 2008

Slide2

Outline

- Motivation
- Objective
- Methodology
- Experiments and Results
- Conclusion
- Comments

Slide3

Motivation

Going from unstructured data to structured data:

- Extracting entities (people, movies) from documents and identifying their categories (painter, writer, actor).
- Most prior approaches (unary relation extraction) only analyzed the local document context within which entities occur.

Prior approaches

Entity: Donald Knuth

Context: "works in research …"

Result: is-a-researcher (Donald_Knuth)

But…

A single context such as "[Entity] present results" or "[Entity] publish" is ambiguous: is-a-researcher (Entity)? Companies and newspapers occur in such contexts too.

Slide4

Objectives

In this paper, we improve the accuracy of entity categorization by

- considering an entity’s context across multiple documents
- exploiting existing large lists of related entities

Example contexts from different documents:

“… [Entity] published …”

“… [Entity]’s paper …”

“… [Entity] gave a talk …”

Extracted features: ([Entity], ‘published’), ([Entity], ‘paper’), ([Entity], ‘talk’)

Multi-Feature Relation Extractor

Output: ([Entity], is-a-researcher)
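The idea of combining features from several contexts of the same entity can be sketched as follows. This is only an illustration, not the authors' extractor: the mini-corpus, the cue words, and the threshold rule `min_distinct_cues` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical (entity, context) mentions, each imagined to come from a different document.
mentions = [
    ("Donald_Knuth", "published"),
    ("Donald_Knuth", "paper"),
    ("Donald_Knuth", "gave a talk"),
    ("Acme_Corp", "published"),
]

# Hypothetical cue words for the is-a-researcher category.
RESEARCHER_CUES = ("published", "paper", "talk")


def aggregate_features(mentions):
    """Collect, per entity, how often each cue word occurs across all its contexts."""
    features = defaultdict(Counter)
    for entity, context in mentions:
        tokens = context.split()
        for cue in RESEARCHER_CUES:
            if cue in tokens:
                features[entity][cue] += 1
    return features


def is_a_researcher(feature_counts, min_distinct_cues=2):
    """Toy rule (an assumption): require several distinct cues, not a single context."""
    return len(feature_counts) >= min_distinct_cues


features = aggregate_features(mentions)
for entity, counts in features.items():
    print(entity, dict(counts), is_a_researcher(counts))
```

With a single context, "published" alone cannot separate a researcher from a company; the aggregated feature set can.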

Slide5

Methodology

Example of a categorized entity: (Yao_Ming, is-a-athlete)

Ex: Extraction of the is-a-movie relation

Context: "… Julia Roberts starred in Pretty Woman in 1988 …"

Entity: Pretty Woman

Actor name: Julia Roberts

Actor-List: Alan Alba, Richard Gere, Julia Roberts

Feature: Co-occurrence between entity and actor name in context.

Result: (Pretty Woman, is-a-movie)
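A minimal sketch of the co-occurrence feature, assuming plain substring matching and the three names shown on the slide as the actor list. The function name `cooccurring_list_members` and the feature label are hypothetical and chosen for illustration only.

```python
# Actor list with the names shown on the slide (used here as a toy list).
ACTOR_LIST = {"Alan Alba", "Richard Gere", "Julia Roberts"}


def cooccurring_list_members(context, candidate_entity, members=ACTOR_LIST):
    """Return the list members that appear in the same context as the candidate entity."""
    if candidate_entity not in context:
        return set()
    return {m for m in members if m in context}


context = "Julia Roberts starred in Pretty Woman in 1988"
hits = cooccurring_list_members(context, "Pretty Woman")

# Hypothetical binary feature: does the candidate co-occur with any known actor?
feature = ("Pretty Woman", "co-occurs-with-actor", len(hits) > 0)
print(hits, feature)
```

A classifier for is-a-movie could then use this feature alongside the context features; how the paper weights it is not stated on the slide.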

Slide6

Methodology - Processing large Document Collections

Figure: processing pipeline. Recoverable labels from the diagram:

Inputs: Document Corpus D; List corpus L

Rule-based Extraction: produces Entity-Candidate Context Pairs

Context Feature Extraction: n-gram Extraction, produces Entity-Feature Pairs

List-Member Extraction: Synopsis of L, List-Member Detection, Verification (delete false positives), produces Entity-List Pairs (Co-Occurrence)

Aggregation

Classification: Classifiers C

Annotations:

- Document corpus: 3.2 million documents from Wiki
- Synopsis of L: retaining the most important list members
- a known set of directors (as ε)
- a list of actors

Example:

Actors list (wiki): Amy Adams, Elizabeth Reaser, Julia Roberts, Tara Reid, Judy Reyes

Entities: E1: Pretty Woman, E2: Mystic Pizza, E3: Doubt, E4: Duplicity, E5: Enchanted

Slide7

Methodology - Processing large Document Collections

Figure: the same processing pipeline as on the previous slide.

Scanning D once

Example: "Julia Roberts starred in Pretty Woman in 1988 …"

n-grams: {Julia, Roberts, starred, Pretty, Woman, Julia Roberts, Pretty Woman, …}

Problems with writing out every n-gram:

1. A large amount of data is written.
2. Most n-grams are not expected to be members of a list.

Our Approach: Bloom Filter

Candidate n-grams: {starred, Pretty, Woman, Pretty Woman, …}

Entity-candidate pairs: (Julia Roberts, starred), (Julia Roberts, Pretty), (Julia Roberts, Woman), (Julia Roberts, Pretty Woman)

Verification
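The filter-then-verify step can be sketched with a small Bloom filter. This is an illustrative sketch, not the paper's implementation: the filter size, the SHA-256 based hashing, the bigram limit, and the use of a list of movie titles as L are all assumptions.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def ngrams(tokens, max_n=2):
    """Yield all n-grams of the token list up to length max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


# Hypothetical list L: movie titles used as list members.
MOVIE_LIST = {"Pretty Woman", "Mystic Pizza", "Doubt", "Duplicity", "Enchanted"}

# Synopsis of L: a compact structure that fits in memory while scanning D once.
synopsis = BloomFilter()
for title in MOVIE_LIST:
    synopsis.add(title)

entity = "Julia Roberts"
context_tokens = "starred in Pretty Woman in 1988".split()

# Detection: keep only n-grams the synopsis says might be list members.
candidates = [(entity, g) for g in ngrams(context_tokens) if g in synopsis]

# Verification: check candidates against the real list to delete false positives.
verified = [(e, g) for e, g in candidates if g in MOVIE_LIST]
print(verified)
```

Only n-grams that pass the synopsis are written out, which addresses the data-volume problem; the later exact check removes whatever false positives the filter lets through.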

Slide8

Experiments


Slide9


Slide10

Conclusion

- Studied the effect of aggregate context in relation extraction.
- Proposed efficient processing techniques for large text corpora.
- Both aggregate and co-occurrence features provide a significant increase in extraction accuracy compared to single-context classifiers.

Slide11

Comments

Advantage: The first half of this paper is clear.

Drawback: But the first half of this paper isn’t clear.

Application: Entity categorization