Presentation Transcript


Classifying Entities into an Incomplete Ontology

Bhavana Dalvi, William W. Cohen, Jamie Callan

School of Computer Science, Carnegie Mellon University

Motivation

Existing techniques:

- Semi-supervised hierarchical classification: Carlson WSDM'10
- Extending knowledge bases by finding new relations or attributes of existing concepts: Mohamed et al. EMNLP'11
- Unsupervised ontology discovery: Adams et al. NIPS'10, Blei et al. JACM'10, Reisinger et al. ACL'09

Evolving Web-scale datasets:

- Billions of entities and hundreds of thousands of concepts
- Difficult to create a complete ontology
- Hierarchical classification of entities into incomplete ontologies is needed

Contributions

Hierarchical Exploratory EM:

- Adds new instances to the existing classes
- Discovers new classes and adds them at appropriate places in the ontology

Class constraints:

- Inclusion: every entity that is a “Mammal” is also an “Animal”
- Mutual exclusion: if an entity is an “Electronic Device” then it is not a “Mammal”
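As a concrete illustration (not from the slides), the two constraint types can be represented and checked as below; the class names and the `is_consistent` helper are purely illustrative.

```python
# Hypothetical sketch of inclusion / mutual-exclusion constraints over class labels.
inclusion = {"Mammal": "Animal"}                       # child -> parent ("is-a")
mutual_exclusion = [{"Mammal", "Electronic Device"}]   # sets of mutually exclusive classes

def is_consistent(labels):
    """Return True if a set of class labels satisfies both constraint types."""
    # Inclusion: every labeled child class requires its parent label as well.
    for child, parent in inclusion.items():
        if child in labels and parent not in labels:
            return False
    # Mutual exclusion: at most one class from each exclusive set may be assigned.
    for group in mutual_exclusion:
        if len(labels & group) > 1:
            return False
    return True

print(is_consistent({"Mammal", "Animal"}))             # True
print(is_consistent({"Mammal", "Electronic Device"}))  # False
```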


Problem Definition

Input:

- Large set of data-points
- Some known classes
- Class constraints between classes
- Small number of seeds per known class: n

Output:

- Labels for all data-points
- New classes discovered from the data: k
- Updated class constraints

Review: Exploratory EM [Dalvi et al. ECML 2013]

Initialize model with few seeds per class.

Iterate till convergence (data likelihood and # classes):

- E step: predict labels for unlabeled points
  - If P(Cj | Xi) is nearly uniform for a data-point Xi, j = 1 to k, create a new class Ck+1 and assign Xi to it
- M step: recompute model parameters using seeds + predicted labels for unlabeled points
  - Number of classes might increase in each iteration
- Check if model selection criterion is satisfied; if not, revert to the model from iteration t-1

(Instantiations: classification/clustering: KMeans, NBayes, VMF …; near-uniformity check: Max/Min ratio, JS divergence; model selection: AIC, BIC, AICc …)
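A compact sketch of this loop, assuming hypothetical helpers `fit_model`, `posteriors`, and `aicc_score` stand in for the clustering model, the posterior computation, and the model-selection score; this illustrates the procedure above and is not the authors' implementation.

```python
import numpy as np

def exploratory_em(X, seeds, k, fit_model, posteriors, aicc_score,
                   max_iter=50, minmax_threshold=2.0):
    """Illustrative flat Exploratory EM: label points, open a new class when a
    point's posterior over existing classes is nearly uniform, refit, and keep
    the enlarged model only if the model-selection score improves."""
    model = fit_model(X, seeds, k, None)        # initialize from a few seeds per class
    best_score = aicc_score(model, X)           # lower is better for AICc
    for _ in range(max_iter):
        labels, new_k = {}, k
        for i, x in enumerate(X):
            p = posteriors(model, x)            # NumPy array of P(C_j | x_i), j = 1..k
            if p.max() / max(p.min(), 1e-12) < minmax_threshold:
                new_k += 1                      # nearly uniform: create class C_{k+1}
                labels[i] = new_k - 1           # ... and assign x_i to it
            else:
                labels[i] = int(np.argmax(p))   # otherwise take the most likely class
        candidate = fit_model(X, seeds, new_k, labels)   # M step: seeds + predictions
        score = aicc_score(candidate, X)
        if score >= best_score:                 # model selection fails: revert to old model
            break
        model, best_score, k = candidate, score, new_k
    return model
```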

Hierarchical Exploratory EM

Initialize model with few seeds per class.

Iterate till convergence (data likelihood and # classes):

- E step: predict labels for unlabeled points
  - Assign a consistent bit vector of labels to each unlabeled data-point
  - If P(Cj | Xi) is nearly uniform for a data-point Xi, create a new class Ck+1 and assign Xi to it
  - Update class constraints accordingly
- M step: recompute model parameters using seeds + predicted labels for unlabeled points
  - Number of classes might increase in each iteration
  - Since the E step follows class constraints, this step need not be modified
- Check if model selection criterion is satisfied; if not, revert to the model from iteration t-1
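When a new class is created, the class constraints have to be extended as well. A minimal sketch of that bookkeeping, using illustrative dictionaries for the ontology (the function and names are not from the paper); it mirrors the Coke/C8 example shown on the slides that follow.

```python
def add_new_class(new_class, parent, children_of, parent_of):
    """Attach a newly created class under `parent` and extend the constraints:
    inclusion (the new class is a subclass of `parent`) and mutual exclusion
    (the new class excludes its siblings at the same level)."""
    parent_of[new_class] = parent                        # inclusion constraint
    siblings = children_of.setdefault(parent, [])
    mutex_pairs = [(new_class, s) for s in siblings]     # mutual exclusion with siblings
    siblings.append(new_class)
    return mutex_pairs

# Usage with the example ontology from the next slides: creating C8 under Food.
children_of = {"Root": ["Food", "Location"], "Food": ["Vegetable", "Condiment"]}
parent_of = {"Food": "Root", "Location": "Root",
             "Vegetable": "Food", "Condiment": "Food"}
print(add_new_class("C8", "Food", children_of, parent_of))
# -> [('C8', 'Vegetable'), ('C8', 'Condiment')]
```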


Divide-And-Conquer Exploratory EM

[Figure: example ontology. Level 1: Root. Level 2: Food, Location. Level 3: Country and State under Location; Vegetable and Condiment under Food (e.g. Spinach, Potato, Pepper are Vegetables). Edges between levels carry inclusion constraints; classes at the same level are mutually exclusive.]

Assumptions:

- Classes are arranged in a tree-structured hierarchy.
- Classes at any level of the hierarchy are mutually exclusive.

Divide-And-Conquer Exploratory EM: worked examples

[Figure animation: the entity "California" is classified top-down through the example ontology: Root (1.0), then Location (0.9) vs. Food (0.1), then State (0.8) vs. Country (0.2), yielding a consistent bit vector of class labels.]

[Figure animation: the entity "Coke" is classified as Food (0.9) vs. Location (0.1), but its posteriors over Food's children are nearly uniform (0.55 vs. 0.45 for Vegetable and Condiment), so a new class C8 is created under Food and Coke is assigned to it. This adds to the class constraints: C8 is included in Food and is mutually exclusive with Vegetable and Condiment.]

[Figure animation: the entity "Cat" already has nearly uniform posteriors at level 2 (0.45 / 0.55 over Food and Location), so a new class C9 is created under Root and Cat is assigned to it. This again adds to the class constraints.]
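A sketch of the top-down assignment illustrated above: at each node, score the entity against the node's children, descend into the best child, and open a new child class when the children's posteriors look uniform. The `child_posteriors` helper and the threshold are assumptions for illustration, not the authors' code.

```python
import itertools

def classify_top_down(x, root, children_of, child_posteriors,
                      new_class_ids, minmax_threshold=2.0):
    """Return the set of classes assigned to entity x, i.e. one consistent
    path of labels through the tree, possibly ending in a newly created class."""
    assigned, node = set(), root
    while children_of.get(node):
        probs = child_posteriors(x, node)        # {child: P(child | x, node)}
        hi, lo = max(probs.values()), min(probs.values())
        if hi / max(lo, 1e-12) < minmax_threshold:
            # Nearly uniform over the existing children (e.g. Coke over
            # Vegetable/Condiment): create a new class here and stop.
            new_class = f"C{next(new_class_ids)}"
            children_of[node].append(new_class)
            assigned.add(new_class)
            break
        best = max(probs, key=probs.get)         # e.g. Location (0.9) for California
        assigned.add(best)
        node = best                              # divide and conquer: descend one level
    return assigned

# Usage sketch (child_posteriors would come from the per-node classifiers):
# labels = classify_top_down("California", "Root", children_of,
#                            child_posteriors, itertools.count(8))
```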


What are we trying to optimize?

Objective function: Maximize { Log Data Likelihood - Model Penalty },
over m = # clusters and the parameters of {C1 … Cm},
subject to the class constraints Zm.
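In code, this comparison amounts to keeping a candidate model only if its penalized log-likelihood improves; `log_likelihood` and `penalty` are assumed helpers (e.g. an AICc-style term, see the model-selection slide), not part of the original slides.

```python
def objective(model, X, log_likelihood, penalty):
    """Penalized objective to maximize: log data likelihood minus a model-size
    penalty (constraint handling happens in the E step, so it is not shown here)."""
    return log_likelihood(model, X) - penalty(model)

def select_model(current, candidate, X, log_likelihood, penalty):
    """Keep the candidate (possibly with more clusters) only if it improves
    the objective; otherwise revert to the current model."""
    better = objective(candidate, X, log_likelihood, penalty) > \
             objective(current, X, log_likelihood, penalty)
    return candidate if better else current
```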

Datasets

[Figures: Ontology 1 and Ontology 2, the class hierarchies for the two datasets.]

Dataset   #Classes   #Levels   #NELL entities   #Contexts
DS-1      11         3         2.5K             3.4M
DS-2      39         4         12.9K            6.7M

Data: ClueWeb09 corpus + subsets of NELL.

Results

Macro-averaged seed class F1, comparing flat classification (FLAT) with divide-and-conquer (DAC), each with SemisupEM and ExploratoryEM:

Dataset   #Train/Test Points   Level   #Seed/#Ideal Classes   FLAT SemisupEM   FLAT ExploratoryEM   DAC SemisupEM   DAC ExploratoryEM
DS-1      335/2.2K             2       2/3                    43.2             78.7 *               69.5            77.2 *
DS-1                           3       4/7                    34.4             42.6 *               31.3            44.4 *
DS-2      1.5K/11.4K           2       3.9/4                  64.3             53.4                 65.4            68.9 *
DS-2                           3       9.4/24                 31.3             33.7 *               34.9            41.7 *
DS-2                           4       2.4/10                 27.5             38.9 *               43.2            42.4

Conclusions

Hierarchical Exploratory EM works with an incomplete class hierarchy and few seed instances to extend the existing knowledge base.

Encouraging preliminary results:

- Hierarchical classification improves over flat classification
- Exploratory learning improves over semi-supervised learning

Future work:

- Incorporate arbitrary class constraints
- Evaluate the newly added clusters

Thank You

Questions?

Extra Slides

Class Creation Criterion

Given the posterior distribution P(Cj | Xi), j = 1 to k, for a data-point Xi, two near-uniformity measures are used:

- MinMax ratio: the ratio between the largest and smallest posterior probabilities; values close to 1 indicate a nearly uniform posterior.
- Jensen-Shannon divergence: JS-Div between P(C | Xi) and the uniform distribution over the k classes; small values indicate a nearly uniform posterior.

A new class is created for Xi when the posterior is judged nearly uniform.
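A sketch of the two measures and an illustrative creation rule; the thresholds are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

def minmax_ratio(posterior):
    """Ratio of the largest to the smallest posterior probability;
    values close to 1 indicate a nearly uniform posterior."""
    p = np.asarray(posterior, dtype=float)
    return p.max() / max(p.min(), 1e-12)

def js_divergence(posterior):
    """Jensen-Shannon divergence between the posterior and the uniform
    distribution over the same classes; small values indicate near-uniformity."""
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()
    u = np.full_like(p, 1.0 / len(p))
    m = 0.5 * (p + u)
    kl = lambda a, b: np.sum(a * np.log(np.where(a > 0, a / b, 1.0)))
    return 0.5 * kl(p, m) + 0.5 * kl(u, m)

def should_create_class(posterior, ratio_thresh=2.0, js_thresh=0.05):
    """Illustrative rule: open a new class when the posterior looks uniform."""
    return minmax_ratio(posterior) < ratio_thresh or js_divergence(posterior) < js_thresh

print(should_create_class([0.55, 0.45]))   # True  (e.g. Coke over Vegetable/Condiment)
print(should_create_class([0.9, 0.1]))     # False (e.g. California over Location/Food)
```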


Model Selection

Extended Akaike Information Criterion:

AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1)

where g is the model being evaluated, L(g) is the log-likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data-points.
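The formula translates directly into code; a small sketch, where `log_lik` is the model's log-likelihood on the data and the numbers in the usage lines are made up for illustration.

```python
def aicc(log_lik, v, n):
    """Corrected/extended Akaike Information Criterion:
    AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1),
    where L(g) is the log-likelihood, v the number of free parameters,
    and n the number of data-points (lower is better)."""
    if n - v - 1 <= 0:
        raise ValueError("AICc requires n > v + 1")
    return -2.0 * log_lik + 2.0 * v + 2.0 * v * (v + 1) / (n - v - 1)

# Example: comparing two hypothetical models on the same data (n = 1000 points).
print(aicc(log_lik=-5200.0, v=40, n=1000))   # the model with the smaller AICc wins
print(aicc(log_lik=-5150.0, v=80, n=1000))
```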