Classifying Entities into an Incomplete Ontology
Bhavana Dalvi, William W. Cohen, Jamie Callan
School of Computer Science, Carnegie Mellon University
Motivation
Existing techniques:
- Semi-supervised hierarchical classification: Carlson et al. WSDM'10
- Extending knowledge bases (finding new relations or attributes of existing concepts): Mohamed et al. EMNLP'11
- Unsupervised ontology discovery: Adams et al. NIPS'10, Blei et al. JACM'10, Reisinger et al. ACL'09
Evolving Web-scale datasets:
- Billions of entities and hundreds of thousands of concepts
- Difficult to create a complete ontology
Hence, hierarchical classification of entities into incomplete ontologies is needed.
Contributions
Hierarchical Exploratory EM:
- Adds new instances to the existing classes.
- Discovers new classes and adds them at appropriate places in the ontology.
Class constraints:
- Inclusion: every entity that is a "Mammal" is also an "Animal".
- Mutual exclusion: if an entity is an "Electronic Device" then it is not a "Mammal".
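The two constraint types can be sketched as a small consistency check over an entity's label set; the class names and the `is_consistent` helper below are illustrative, not from the talk:

```python
# Inclusion: (child, parent) pairs -- membership in the child class
# implies membership in the parent class.
INCLUSION = [("Mammal", "Animal")]

# Mutual exclusion: sets of classes an entity may hold at most one of.
MUTEX = [{"Electronic Device", "Mammal"}]

def is_consistent(labels):
    """Return True iff the label set satisfies both constraint types."""
    for child, parent in INCLUSION:
        if child in labels and parent not in labels:
            return False
    for group in MUTEX:
        if len(labels & group) > 1:
            return False
    return True
```

A label set such as {"Mammal"} alone would be rejected, since inclusion requires "Animal" to be present as well.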
Problem Definition
Input:
- A large set of data-points
- Some known classes
- Class constraints between the known classes
- A small number of seeds per known class
Output:
- Labels for all data-points
- New classes discovered from the data
- Updated class constraints
Review: Exploratory EM [Dalvi et al. ECML 2013]
Initialize the model with a few seeds per class, then iterate till convergence (of the data likelihood and the number of classes):
- E step: predict labels for unlabeled points (classification/clustering: KMeans, Naive Bayes, von Mises-Fisher, ...). If P(Cj | Xi), j = 1 to k, is nearly uniform for a data-point Xi (measured by the max/min ratio or JS divergence), create a new class Ck+1 and assign Xi to it.
- M step: recompute model parameters using the seeds plus the predicted labels for the unlabeled points. The number of classes might increase in each iteration.
- Check whether the model selection criterion (AIC, BIC, AICc, ...) is satisfied; if not, revert to the model from iteration t-1.
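The exploratory E step can be sketched as follows. This is a minimal illustration, assuming a max/min-ratio test with an arbitrary threshold of 2; the function names and threshold are not from the talk:

```python
def near_uniform(posteriors, max_min_ratio=2.0):
    """A posterior over classes is 'nearly uniform' when the ratio of
    its largest to smallest probability falls below a threshold."""
    return max(posteriors) / min(posteriors) < max_min_ratio

def e_step_with_exploration(posteriors_per_point, k):
    """Assign each point to its argmax class, or to a newly created
    class when its posterior is nearly uniform. Returns (labels, k)."""
    labels = []
    for post in posteriors_per_point:
        if near_uniform(post):
            labels.append(k)  # index of a brand-new class C_{k+1}
            k += 1
        else:
            labels.append(max(range(len(post)), key=post.__getitem__))
    return labels, k
```

For example, a point with posterior [0.9, 0.1] is assigned to class 0, while one with [0.52, 0.48] triggers creation of a new class.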
Hierarchical Exploratory EM
Initialize the model with a few seeds per class, then iterate till convergence (of the data likelihood and the number of classes):
- E step: predict labels for unlabeled points. Assign a consistent bit vector of labels to each unlabeled data-point. If the class posterior is nearly uniform for a data-point, create a new class, assign the point to it, and update the class constraints accordingly.
- M step: recompute model parameters using the seeds plus the predicted labels for the unlabeled points. The number of classes might increase in each iteration. Since the E step already enforces the class constraints, this step need not be modified.
- Check whether the model selection criterion is satisfied; if not, revert to the model from iteration t-1.
Divide-And-Conquer Exploratory EM
[Figure: a three-level class hierarchy. Level 1: Root. Level 2: Food, Location. Level 3: Vegetable and Condiment under Food (e.g. Spinach, Potato, Pepper, ...), Country and State under Location. Parent-child edges are inclusion constraints; classes at the same level are mutually exclusive.]
Assumptions:
- Classes are arranged in a tree-structured hierarchy.
- Classes at any level of the hierarchy are mutually exclusive.
Divide-And-Conquer Exploratory EM
[Figure: worked example. The entity "California" is classified top-down through the tree: its posterior mass concentrates on Location at level 2 and on State at level 3, so it receives a label bit vector that is consistent with the class constraints.]
Divide-And-Conquer Exploratory EM
[Figure: worked example. For the entity "Coke", the level-2 posterior favors Food, but the level-3 posterior over Vegetable and Condiment is nearly uniform (0.55 / 0.45). A new class C8 is therefore created under Food, "Coke" is assigned to it, and C8 is added to the class constraints.]
Divide-And-Conquer Exploratory EM
[Figure: worked example. For the entity "Cat", the level-2 posterior over Food and Location is already nearly uniform (0.45 / 0.55). A new class C9 is therefore created at level 2, "Cat" is assigned to it, and C9 is added to the class constraints.]
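The divide-and-conquer examples can be sketched as a top-down pass through the tree. In this illustration, `predict` stands in for a per-node classifier, the near-uniformity threshold is an assumed value, and the new-class naming is hypothetical:

```python
# Example hierarchy from the talk's figure.
TREE = {"Root": ["Food", "Location"],
        "Food": ["Vegetable", "Condiment"],
        "Location": ["Country", "State"]}

def classify_top_down(tree, predict, entity, max_min_ratio=2.0):
    """Descend from the root, following the argmax child at each node.
    If the children's posteriors are nearly uniform, create a new
    sibling class at that node and stop there."""
    path, node = [], "Root"
    while tree.get(node):
        children = tree[node]
        post = predict(entity, children)
        if max(post) / min(post) < max_min_ratio:
            new_class = f"New-under-{node}"   # e.g. C8 under Food
            tree[node] = children + [new_class]
            path.append(new_class)
            return path
        node = children[max(range(len(post)), key=post.__getitem__)]
        path.append(node)
    return path
```

With posteriors like those in the figures, "California" would follow the path Location then State, while "Coke" would descend into Food and then trigger a new class there.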
What are we trying to optimize?
Objective function: maximize { log data likelihood - model penalty } over the number of clusters m and the parameters of classes C1, ..., Cm, subject to the class constraints Zm.
Datasets
[Figures: Ontology 1 and Ontology 2.]

Dataset | #Classes | #Levels | #NELL entities | #Contexts
DS-1    | 11       | 3       | 2.5K           | 3.4M
DS-2    | 39       | 4       | 12.9K          | 6.7M

Data source: Clueweb09 corpus + subsets of NELL.
Results
Macro-averaged seed-class F1 (SemisupEM vs. ExploratoryEM, under FLAT and Divide-And-Conquer classification):

Dataset | #Train/Test | Level | #Seed/#Ideal Classes | FLAT SemisupEM | FLAT ExploratoryEM | DAC SemisupEM | DAC ExploratoryEM
DS-1    | 335/2.2K    | 2     | 2/3                  | 43.2           | 78.7 *             | 69.5          | 77.2 *
DS-1    | 335/2.2K    | 3     | 4/7                  | 34.4           | 42.6 *             | 31.3          | 44.4 *
DS-2    | 1.5K/11.4K  | 2     | 3.9/4                | 64.3           | 53.4               | 65.4          | 68.9 *
DS-2    | 1.5K/11.4K  | 3     | 9.4/24               | 31.3           | 33.7 *             | 34.9          | 41.7 *
DS-2    | 1.5K/11.4K  | 4     | 2.4/10               | 27.5           | 38.9 *             | 43.2          | 42.4
Conclusions
Hierarchical Exploratory EM works with an incomplete class hierarchy and few seed instances to extend an existing knowledge base.
Encouraging preliminary results:
- Hierarchical classification outperforms flat classification.
- Exploratory learning outperforms semi-supervised learning.
Future work:
- Incorporate arbitrary class constraints.
- Evaluate the newly added clusters.
Thank You
Questions?

Extra Slides
Class Creation Criterion
Given the class posterior P(Cj | Xi), j = 1 to k, a new class is created for data-point Xi when the posterior is nearly uniform, as measured by either:
- MinMax ratio: max_j P(Cj | Xi) / min_j P(Cj | Xi) falling below a threshold, or
- Jensen-Shannon divergence: JS-Div(P(C | Xi), uniform) falling below a threshold.
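Both near-uniformity measures translate directly to code. This is a minimal sketch; the base-2 logarithm is an implementation choice, not specified on the slide:

```python
import math

def min_max_ratio(post):
    """Max/min of the posterior; values near 1 indicate near-uniformity."""
    return max(post) / min(post)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions.
    With base-2 logs the value lies in [0, 1]; a small divergence from
    the uniform distribution also signals a nearly uniform posterior."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A posterior of [0.9, 0.1] has a MinMax ratio of 9 (clearly peaked), whereas [0.55, 0.45] has a ratio near 1 and a small JS divergence from uniform.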
Model Selection
Extended Akaike Information Criterion:
AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1)
where g is the model being evaluated, L(g) is the log-likelihood of the data given g, v is the number of free parameters of the model, and n is the number of data-points.
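The AICc formula translates directly to code (a minimal sketch; lower values indicate a better model):

```python
def aicc(log_likelihood, v, n):
    """Extended Akaike Information Criterion:
    AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1)."""
    return -2.0 * log_likelihood + 2 * v + 2.0 * v * (v + 1) / (n - v - 1)
```

During Hierarchical Exploratory EM, a model whose E step created new classes would be kept only if it does not worsen this criterion; otherwise the algorithm reverts to the previous iteration's model.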