Uploaded On 2017-08-21




Hierarchical Semi-supervised Classification with Incomplete Class Hierarchies

Bhavana Dalvi¶*, Aditya Mishra†, and William W. Cohen*

¶ Allen Institute for Artificial Intelligence, * School of Computer Science, Carnegie Mellon University, † Department of Computer Science & Software Engineering, Seattle University

Motivation

Datasets

Method: OptDAC Exploratory EM

Experimental Results

Acknowledgements: This work is supported in part by a Google PhD fellowship in Information Extraction, and NSF grant No. IIS1250956-NSFCOHEN.

Motivation

In an entity classification task, topic or concept hierarchies are often incomplete. This can lead to semantic drift of known classes or topics.

Our previous work on Exploratory Learning (Dalvi et al., ECML 2013) extends the semi-supervised EM algorithm by dynamically adding new classes when appropriate. In this paper, we present exploratory learning techniques for hierarchical semi-supervised learning tasks.

We focus on an entity classification task where each entity is represented by either text context or table co-occurrence features. Given a few seed examples per Knowledge Base (KB) category, the task is to classify unlabeled entities into KB categories. KB categories are arranged in an ontology, with subset and disjointness constraints defined between the classes. Further, the class hierarchy can be incomplete. Our proposed method (OptDAC) can learn new examples of existing classes, as well as extend the class hierarchy, in a single unified framework.

Conclusions

In this paper, we propose the Hierarchical Exploratory EM approach, which takes an incomplete class ontology as input, along with a few seed examples of each class, to populate new instances of seeded classes and extend the ontology with newly discovered classes. Our proposed hierarchical exploratory EM method, named OptDAC-ExploreEM, performs better than flat classification and hierarchical semi-supervised EM methods at all levels of the hierarchy, especially further down the hierarchy. Experiments show that OptDAC-ExploreEM outperforms its semi-supervised variant on average by 13% in terms of seed class F1 scores. It also outperforms both previously proposed exploratory learning approaches, FLAT-ExploreEM and DAC-ExploreEM, in terms of seed class F1 on average by 10% and 7% respectively. In the future, we would like to apply our method to datasets with non-tree structured class hierarchies.

Comparison: macro-averaged seeded-class F1. OptDAC reduces semantic drift of seeded classes.

Optimal Label Assignment given Class Constraints

Input: class constraints: Subset, Mutex (disjoint)

Output: a consistent bit vector of class labels that maximizes {likelihood of assignment – constraint violation penalty}
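The label-assignment step above can be illustrated with a small sketch. It is not the poster's mixed integer program: it simply enumerates every bit vector for a handful of classes and keeps the one with the highest score. The scoring function (a sum of per-class log-likelihoods minus a fixed penalty per violated constraint), the penalty value, and the class names are all assumptions made for illustration.

```python
from itertools import product
from math import log

def violations(bits, subset, mutex):
    """Count violated constraints for a 0/1 assignment (dict: class -> bit)."""
    count = 0
    for child, parent in subset:   # child = 1 requires parent = 1
        if bits[child] and not bits[parent]:
            count += 1
    for a, b in mutex:             # a and b must not both be 1
        if bits[a] and bits[b]:
            count += 1
    return count

def best_assignment(probs, subset, mutex, penalty=10.0):
    """Exhaustive search over bit vectors; only feasible for a few classes.

    probs maps each class to an assumed probability strictly between 0 and 1.
    The penalty weight is a hypothetical choice, not a value from the poster.
    """
    classes = sorted(probs)
    best, best_score = None, float("-inf")
    for values in product([0, 1], repeat=len(classes)):
        bits = dict(zip(classes, values))
        loglik = sum(log(probs[c]) if bits[c] else log(1.0 - probs[c])
                     for c in classes)
        score = loglik - penalty * violations(bits, subset, mutex)
        if score > best_score:
            best, best_score = bits, score
    return best

# Hypothetical toy ontology and class probabilities for one entity.
probs = {"Location": 0.9, "City": 0.8, "Country": 0.3, "Food": 0.05}
subset = [("City", "Location"), ("Country", "Location")]
mutex = [("City", "Country"), ("Location", "Food")]
result = best_assignment(probs, subset, mutex)
```

Exhaustive search grows as 2^K in the number of classes, which is one reason a real implementation would hand the same objective to an integer-programming solver instead.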

 

Dataset statistics

Dataset        #Entities   #Features   #(Entity, label) pairs
Text-Small     2.5K        3.4M        7.2K
Text-Medium    12.9K       6.7M        42.2K
Table-Small    4.3K        0.96M       12.2K
Table-Medium   33.4K       2.2M        126.K

Ontology statistics

Statistic                  Small      Medium
#Classes                   11         39
#Levels in the hierarchy   3          4
#Classes per level         1, 3, 7    1, 4, 24, 10
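To make the ontology statistics concrete, here is a minimal sketch of how a tree-shaped class hierarchy could be turned into subset and mutex constraints, and how the "classes per level" counts could be computed. Treating sibling classes as mutually exclusive is an assumption made for illustration, and the class names are hypothetical; the poster only states that subset and disjointness constraints exist.

```python
from itertools import combinations

def group_children(parent):
    """Invert a child -> parent map into parent -> list of children."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    return children

def constraints_from_tree(parent):
    """Derive (child, parent) subset pairs and sibling mutex pairs."""
    subset = sorted(parent.items())
    mutex = []
    for siblings in group_children(parent).values():
        # Assumption: classes sharing a parent are treated as disjoint.
        mutex.extend(combinations(sorted(siblings), 2))
    return subset, sorted(mutex)

def classes_per_level(parent, root):
    """Count the classes at each depth, starting from the root."""
    children = group_children(parent)
    counts, frontier = [], [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [c for node in frontier for c in children.get(node, [])]
    return counts

# Hypothetical three-level toy ontology.
parent = {"Location": "Root", "Food": "Root",
          "City": "Location", "Country": "Location"}
subset, mutex = constraints_from_tree(parent)
levels = classes_per_level(parent, "Root")
```

For this toy tree the per-level counts sum to the total number of classes, the same relationship that holds between the "#Classes" and "#Classes per level" rows above.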

Inputs: X: dataset; N: |X|; K: number of classes; class constraints (subclass or disjointness constraints).
Outputs: labels for X; parameters for the k seed and m newly added classes; the set of constraints between the k+m classes.

1. Initialize the model with a few seeds per class.
2. Iterate till convergence (till the data likelihood AND the number of classes converge):
   E step (iteration t): assign a bit vector of categories to each entity.
      For i = 1 : N
         Find P(C_j | x_i) for all classes C_j.
         Optimal-Label-Assignment. {If a new class is created, then class constraints are updated accordingly.}
         UpdateConstraints.
   M step: re-compute model parameters based on current label assignments.
   Do model selection.
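The loop above can be sketched in a heavily simplified, flat form. The sketch below is an assumption-laden illustration, not the OptDAC method: it uses one-dimensional points, hard assignments, a toy similarity model, and a hypothetical max/min ratio test for "near uniform", and it omits the class hierarchy, the constraint handling, and model selection.

```python
from math import exp

def posteriors(x, means):
    """Toy class probabilities: closer class means get more weight."""
    weights = [exp(-(x - m) ** 2) for m in means]
    total = sum(weights)
    return [w / total for w in weights]

def near_uniform(p, ratio=2.0):
    """Hypothetical test: no class stands out if max/min is below `ratio`."""
    return min(p) > 0 and max(p) / min(p) < ratio

def exploratory_em(points, seeds, max_iter=20):
    """seeds holds one list of seed values per seed class."""
    means = [sum(s) / len(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # E step: label each point, opening a new class when the
        # probabilities over the existing classes are near uniform.
        new_labels = []
        for x in points:
            p = posteriors(x, means)
            if len(means) > 1 and near_uniform(p):
                means.append(x)  # new class initialised at this point
                new_labels.append(len(means) - 1)
            else:
                new_labels.append(p.index(max(p)))
        # M step: re-compute each class mean from its members and seeds.
        for k in range(len(means)):
            members = [x for x, y in zip(points, new_labels) if y == k]
            if k < len(seeds):
                members += seeds[k]
            if members:
                means[k] = sum(members) / len(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, means

labels, means = exploratory_em([0.1, 3.9, 2.0, 2.1], [[0.0], [4.0]])
```

In this run the point 2.0 sits exactly between the two seed classes, so it opens a third class, and 2.1 then joins it.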

 

[Equations not recoverable from the slide. Labeled parts of the label-assignment program: score of label assignment, subset constraint penalty, mutex constraint penalty, subset constraint, mutex constraint.]

Evaluation of extended class hierarchies

OptDAC with varying amount of training data

Runtime of Flat vs. OptDAC method on different datasets. The first numeric column is the average runtime in seconds; the remaining columns are average runtimes in multiples of FLAT Semi-supervised EM.

Dataset        FLAT Semi-supervised EM (sec.)   FLAT Exploratory EM   OptDAC Semi-supervised EM   OptDAC Exploratory EM
Text-Small     53.5                             8                     7                           17
Table-Small    50.7                             3                     10                          21
Text-Medium    524.7                            5                     11                          25
Table-Medium   5932.4                           4                     7                           10

[Figure panels: Text-Small, Table-Small]

This dataset is made publicly available at http://rtw.ml.cmu.edu/wk/WebSets/hierarchical_ExploratoryLearning_WSDM2016/index.html

When Are New Classes Created?

[Figure: diagram with nodes numbered 1 to 11 and a candidate new class C_new.]

Near uniform? Test: the best assignment using the mixed integer program should pick C_new.

[Figure: axis Level = 2, 3, 4; panels: Small Ontology, Medium Ontology.]

An example text pattern feature for entity “Pittsburgh” is (“lives in ARG”, 1000), indicating that the entity “Pittsburgh” appeared in position ARG of the text context “lives in ARG” 1000 times in sentences from the ClueWeb09 dataset.

An example table context feature for entity “Pittsburgh” is (“clueweb09-en0011-94-04::2:1”, 1), indicating that the entity “Pittsburgh” appeared once in HTML table 2, column 1 of the ClueWeb09 document with id “clueweb09-en0011-94-04”.
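As a small illustration of these two feature types, the sketch below stores an entity's features as sparse dictionaries and splits a table feature identifier into its parts. The `<document id>::<table>:<column>` layout is inferred from the single example above, and the function and variable names are hypothetical.

```python
def parse_table_feature(feature_id):
    """Split '<document id>::<table>:<column>' into its three parts."""
    doc_id, position = feature_id.split("::")
    table, column = position.split(":")
    return doc_id, int(table), int(column)

# Sparse feature vectors for one entity: feature name -> count.
pittsburgh = {
    "text": {"lives in ARG": 1000},
    "table": {"clueweb09-en0011-94-04::2:1": 1},
}

table_feature = next(iter(pittsburgh["table"]))
doc_id, table, column = parse_table_feature(table_feature)
```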

A marked result denotes a statistically significant improvement (0.05 significance level) w.r.t. FLAT-ExploreEM.
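The comparisons on this poster are reported as macro-averaged seeded-class F1. As a rough sketch of that metric, assuming each class is evaluated as a set of entities and every seeded class counts equally, it could be computed as follows. The entity and class names are hypothetical, and this is not the authors' evaluation code.

```python
def f1(predicted, gold):
    """F1 between the predicted and gold entity sets of one class."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

def macro_f1(predicted, gold):
    """Unweighted mean of per-class F1 over the classes listed in gold."""
    return sum(f1(predicted.get(c, set()), gold[c]) for c in gold) / len(gold)

# Hypothetical gold labels and predictions for two seeded classes.
gold = {"City": {"Pittsburgh", "Seattle"}, "Food": {"pizza"}}
predicted = {"City": {"Pittsburgh"}, "Food": {"pizza", "Seattle"}}
score = macro_f1(predicted, gold)
```

Because the average is unweighted, a small class that drifts (here "Food" absorbing "Seattle") lowers the score as much as a large one would.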