Presentation Transcript

Slide1

Large Scale Multi-Label Classification via MetaLabeler

Lei Tang, Arizona State University
Suju Rajan and Vijay K. Narayanan, Yahoo! Data Mining & Research

Slide2

Large Scale Multi-Label Classification

Huge number of instances and categories
Common for online content:
Web page classification
Query categorization
Video annotation/organization
Social bookmark/tag recommendation

Slide3

Challenges

Multi-Class: thousands of categories
Multi-Label: each instance has more than one label
Large Scale: huge number of instances and categories
Our query categorization problem: 1.5M queries, 7K categories
Yahoo! Directory: 792K docs, 246K categories (Liu et al. 2005)
Most existing multi-label methods do not scale: structural SVM, mixture model, collective inference, maximum-entropy model, etc.
The simplest, One-vs-Rest SVM, is still widely used

Slide4

One-vs-Rest SVM

Training data (instance: labels):
x1: C1, C3
x2: C1, C2, C4
x3: C2
x4: C2, C4

One binary problem per category (positive vs. rest):
SVM1 (C1): x1 +, x2 +, x3 -, x4 -
SVM2 (C2): x1 -, x2 +, x3 +, x4 +
SVM3 (C3): x1 +, x2 -, x3 -, x4 -
SVM4 (C4): x1 -, x2 +, x3 -, x4 +

Predict: apply SVM1-SVM4 to a new instance; each SVM decides membership in its own category (C1-C4).

Slide5

One-vs-Rest SVM

Pros:
Simple, fast, scalable
Each label trained independently, easy to parallelize

Cons:
Highly skewed class distribution (few positives, many negatives)
Biased prediction scores
But the output is still a reasonably good ranking (Rifkin and Klautau 2004)

E.g. 4 categories C1, C2, C3, C4; true labels for x1: C1, C3
Prediction scores: {s1, s3} > {s2, s4}
Predict the number of labels?
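As an aside, a minimal one-vs-rest sketch in Python with scikit-learn (an illustrative assumption; the slides do not prescribe a particular SVM package), reproducing the toy example above and producing the per-category scores the ranking is based on:

# Illustrative sketch only: toy feature vectors, scikit-learn assumed.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

X = np.array([[1., 0., 1., 0.],   # x1 (made-up features)
              [1., 1., 0., 1.],   # x2
              [0., 1., 0., 0.],   # x3
              [0., 1., 1., 1.]])  # x4
labels = [["C1", "C3"], ["C1", "C2", "C4"], ["C2"], ["C2", "C4"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                      # one +/- column per category
ovr = OneVsRestClassifier(LinearSVC()).fit(X, Y)   # one binary SVM per category

scores = ovr.decision_function(X)                  # prediction scores per instance
ranking = np.argsort(-scores, axis=1)              # categories ranked by score
print(mlb.classes_[ranking[0]])                    # ranking for x1; how many to keep is the open question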

Slide6

MetaLabeler Algorithm

Obtain a ranking of class membership for each instance
Any generic ranking algorithm can be applied; here, One-vs-Rest SVM
Build a Meta Model to predict the number of top classes
Construct the meta labels
Construct the meta features
Build the meta model

Slide7

Meta Model – Training

Q1 = affordable cocktail dress
Labels: Formal Wear, Women Clothing

Q2 = cotton children jeans
Labels: Children Clothing

Q3 = leather fashion in 1990s
Labels: Fashion, Women Clothing, Leather Clothing

Meta data (query: #labels):
Q1: 2
Q2: 1
Q3: 3

One-vs-Rest SVM is trained over the original labels (Clothing, Women Clothing, Formal Wear, Fashion, Children Clothing, Leather Clothing); a regression model trained on the meta data serves as the Meta-Model.

How to handle predictions like 2.5 labels?
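One way to picture the meta-model step (a hedged sketch, not necessarily the authors' exact setup): the meta label of each training query is its number of true labels, and a regressor is fit on meta features; a fractional prediction such as 2.5 can then be rounded to the nearest integer, which is one plausible answer to the question above.

# Illustrative meta-model training sketch; the feature values are made up.
import numpy as np
from sklearn.svm import LinearSVR

X_meta = np.array([[0.8, 0.6, -0.1, -0.4],   # meta features of Q1 (placeholders)
                   [0.9, -0.2, -0.3, -0.5],  # Q2
                   [0.7, 0.5, 0.4, -0.2]])   # Q3
y_meta = np.array([2, 1, 3])                 # meta labels = number of true labels

meta_model = LinearSVR().fit(X_meta, y_meta)

pred = meta_model.predict(X_meta[:1])        # may be fractional, e.g. 2.5
k = max(1, int(round(pred[0])))              # round, and keep at least one label
print(k)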

Slide8

Meta Feature Construction

Content-Based: use the raw data (the raw data contains all the information)
Score-Based: use the prediction scores (bias in the scores might be learned)
Rank-Based: use the sorted prediction scores

Example prediction scores: C1 = 0.9, C2 = -0.2, C3 = 0.7, C4 = -0.6
Score-based meta feature: (0.9, -0.2, 0.7, -0.6)
Rank-based meta feature: (0.9, 0.7, -0.2, -0.6)
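The three variants fit in a few lines (a numpy sketch using the scores from this slide; the raw feature vector is a made-up placeholder):

# Building the three kinds of meta features for one instance (illustrative).
import numpy as np

x_raw = np.array([1.0, 0.0, 2.0, 1.0])     # content-based: the raw feature vector itself
scores = np.array([0.9, -0.2, 0.7, -0.6])  # SVM scores for C1..C4 (from the slide)

meta_content = x_raw                        # content-based
meta_score = scores                         # score-based (may inherit the SVMs' bias)
meta_rank = np.sort(scores)[::-1]           # rank-based: scores sorted in descending order
print(meta_rank)                            # [ 0.9  0.7 -0.2 -0.6]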

Slide9

MetaLabeler Prediction

Given one instance:
Obtain the ranking of all labels;
Use the meta model to predict the number of labels;
Pick the top-ranking labels (sketched below).

MetaLabeler is easy to implement: use existing SVM packages/software directly.
It can be combined with a hierarchical structure easily: simply build a Meta Model at each internal node.
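Putting the pieces together, prediction for one instance amounts to a few lines (a sketch reusing the hypothetical ovr, meta_model, and mlb objects from the earlier snippets, with rank-based meta features assumed):

import numpy as np

def metalabeler_predict(x, ovr, meta_model, mlb):
    # Step 1: score and rank all labels with the one-vs-rest SVMs.
    scores = ovr.decision_function(x.reshape(1, -1))[0]
    # Step 2: predict the number of labels from the (rank-based) meta feature.
    meta_feature = np.sort(scores)[::-1].reshape(1, -1)
    k = max(1, int(round(meta_model.predict(meta_feature)[0])))
    # Step 3: keep the top-k ranked labels.
    top = np.argsort(-scores)[:k]
    return mlb.classes_[top]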

Slide10

Baseline Methods

Existing thresholding methods (Yang 2001):
Rank-based Cut (RCut): output a fixed number of top-ranking labels for each instance
Proportion-based Cut (PCut): for each label, choose a proportion of test instances as positive; not applicable for online prediction
Score-based Cut (SCut, a.k.a. threshold tuning): for each label, determine a threshold based on cross-validation; tends to overfit and is not very stable

MetaLabeler is a "local" RCut method: it customizes the number of labels for each instance (see the sketch below).
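For contrast, RCut and SCut are simple to write down (a rough numpy sketch; scores is an instances-by-labels score matrix, and the SCut thresholds are assumed to have been tuned elsewhere by cross-validation):

import numpy as np

def rcut(scores, t):
    # RCut: keep the same top-t labels for every instance.
    order = np.argsort(-scores, axis=1)
    pred = np.zeros_like(scores, dtype=bool)
    rows = np.arange(scores.shape[0])[:, None]
    pred[rows, order[:, :t]] = True
    return pred

def scut(scores, thresholds):
    # SCut: one tuned threshold per label, applied to that label's score column.
    return scores >= thresholds           # thresholds has shape (n_labels,)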

Slide11

Publicly Available Benchmark Data

Yahoo! Web Page Classification:
11 data sets, each constructed from a top-level category
2nd-level topics are the categories
16-32k instances, 6-15k features, 14-23 categories
1.2-1.6 labels per instance, maximum 17 labels
Each label has at least 100 instances

RCV1: a large-scale text corpus
101 categories, 3.2 labels per instance
For evaluation purposes, 3000 instances for training and 3000 for testing
Highly skewed distribution (some labels have only 3-4 instances)

Slide12

MetaLabeler with Different Meta Features

Which type of meta feature is more predictive?
Content-based MetaLabeler outperforms the other meta features.

Slide13

Performance Comparison

MetaLabeler tends to outperform other methods

Slide14

Bias with MetaLabeler

The distribution of the number of labels is imbalanced:
Most instances have a small number of labels; a small portion of instances have many more labels
The imbalanced distribution leads to bias in MetaLabeler:
It prefers to predict fewer labels, and only predicts many labels with strong confidence

Slide15

Scalability Study

Threshold tuning requires cross-validation, otherwise it overfits
MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs

Slide16

Scalability Study (cont.)

Threshold tuning: cost grows linearly with the number of categories in the data
E.g. 6000 categories -> 6000 thresholds to be tuned
MetaLabeler: cost is upper-bounded by the maximum number of labels on any one instance
E.g. 6000 categories, but one instance has at most 15 labels
Just need to learn 15 additional binary SVMs; the Meta Model is "independent" of the number of categories

Slide17

Application to Large Scale Query Categorization

Query categorization problem:
1.5 million unique queries: 1M for training, 0.5M for testing
120k features
An 8-level taxonomy of 6433 categories
Multiple labels, e.g. "0% interest credit card no transfer fee":
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
1.23 labels on average, at most 26 labels

Slide18

Flat Model

Flat Model: does not leverage the hierarchical structure
Threshold tuning on the training data alone takes 40 hours to finish, while MetaLabeler takes 2 hours.

Slide19

Hierarchical Model - Training

[Diagram: a taxonomy rooted at "Root"; the original training data is transformed into new training data at each node]

At each internal node:
Step 1: Generate training data
Step 2: Roll up labels
Step 3: Create an "Other" category
Step 4: Train One-vs-Rest SVMs
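A rough sketch of the label roll-up and the "Other" bucket under one plausible reading of the diagram (illustrative only; the taxonomy is modeled as a child-to-parent dict, and the function and variable names are made up):

def rollup_label(label, parent, node):
    # Walk up the taxonomy from a deep label and return the child of `node`
    # that the label falls under, or None if the label is not below `node`.
    prev = None
    while label is not None and label != node:
        prev, label = label, parent.get(label)
    return prev if label == node else None

def node_training_labels(instance_labels, parent, node, children):
    # Roll each of the instance's labels up to one of `node`'s children;
    # labels that do not descend into any child go to the "Other" category.
    rolled = {rollup_label(l, parent, node) for l in instance_labels}
    rolled = {l for l in rolled if l in children}
    return rolled or {"Other"}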

Slide20

Hierarchical Model - Prediction

[Diagram: query q is classified top-down through the taxonomy, starting at the root]

Predict using the SVMs trained at the root level, then recurse into each predicted child node
Stop if reaching a leaf node or the "Other" category
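The top-down descent can be written as a short recursion (a sketch; predict_labels is a hypothetical per-node routine wrapping that node's one-vs-rest SVMs and MetaLabeler, and children maps each node to its child nodes):

def hierarchical_predict(node, q, predict_labels, children, results=None):
    # Recurse into every child predicted at this node; stop at leaves or "Other".
    if results is None:
        results = set()
    for label in predict_labels(node, q):
        if label == "Other" or not children.get(label):
            results.add(node if label == "Other" else label)
        else:
            hierarchical_predict(label, q, predict_labels, children, results)
    return results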

Slide21

Hierarchical Model + MetaLabeler

Precision decreases by 1-2%, but recall improves by 10% at deeper levels.

Slide22

Features in MetaLabeler

Feature: Overstock.com
Related Categories:
Mass Merchants/…/discount department stores
Apparel & Jewelry
Electronics & Appliances
Home & Garden
Books-Movies-Music-Tickets

Feature: Blizard
Related Categories:
Toys & Hobbies/…/Video Game
Computing/…/Computer Game Software
Entertainment & Social Event/…/Fast Food Restaurant
Reference/News/Weather Information

Feature: Threading
Related Categories:
Books-Movies-Music-Tickets/…/Computing Books
Computing/…/Programming
Health and Beauty/…/Unwanted Hair
Toys and Hobbies/…/Sewing

Slide23

Conclusions & Future Work

MetaLabeler is promising for large-scale multi-label classification
Core idea: learn a meta model to predict the number of labels
Simple, efficient and scalable
Use existing SVM software directly
Easy for practical deployment

Future work:
How to optimize MetaLabeler for a desired performance level, e.g. > 95% precision?
Application to social-networking-related tasks

Slide24

Questions?

Slide25

References

Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005), 36-43.

Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 5 (Dec. 2004), 101-141.

Yang, Y. 2001. A study of thresholding strategies for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM, New York, NY, 137-145.

Slide26

Hierarchical vs. Flat Model

Flat model: build a one-vs-rest SVM for all the labels; no taxonomy information during training.
The hierarchical model has about 5% higher recall at deeper levels.