
Presentation Transcript

Slide1

One Simple Thing To Immediately Make Extreme Classification Easy

Find Out What Rachel McAdams and Harrison Ford have to say about it

Slide2

One Simple Thing To Immediately Make Extreme Classification Easy

Find Out What Rachel McAdams and Harrison Ford have to say about it

Joint work with

Nikos Karampatziakis

Slide3
Slide4

Facebook arguably has the data to solve this, but …

how to do it?

Slide5

Facebook arguably has the data to solve this, but …

how to do it?

There are billions of possible labels.

Slide6

Facebook arguably has the data to solve this, but …

how to do it?

There are billions of possible labels.

Computational challenges.

Statistical challenges.

Slide7
Slide8

Can we quickly identify plausible labels?

Slide9

Can we quickly identify plausible labels?

without sacrificing quality?

Slide10

Can we quickly identify plausible labels?

without sacrificing quality?

or even, improving quality?

Slide11

Strategy:

Given an example:

1. Compute small set of plausible labels.

2. Invoke expensive classifier over plausible labels only.

Slide12

[Slides 12–13: formulas involving the labels, not captured in the transcript]

Slide14

Strategy:

Given an example:

1. Compute small set of plausible labels.

2. Invoke expensive classifier over plausible labels only.

Slide15

Strategy:

Given an example:

1. Compute small set of plausible labels.

2. Invoke expensive classifier over plausible labels only.

Not a new idea.

Slide16
Slide17
Slide18
Slide19

Speeding up inference is nice but …

Speeding up learning is critical.

Slide20

Strategy:

Given an example:

1. Compute small set of plausible labels.

2. Invoke expensive classifier over plausible labels only.

Slide21

Strategy:

Given an example:

1. Compute small set of plausible labels.

2. Invoke expensive classifier over plausible labels only.

Idea: cheaply make a classifier that identifies plausible labels.
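As a concrete picture of this two-stage strategy, here is a minimal Python sketch of the inference path, assuming a trained filter callable and per-label linear scorers. The names (plausible_labels, label_scorers) are illustrative, not from the talk or the xlst code.

```python
import numpy as np

def predict(x, plausible_labels, label_scorers):
    """Two-stage prediction: cheaply shortlist labels, then run the expensive model
    only on the shortlist."""
    candidates = plausible_labels(x)                # 1. small set of plausible labels
    scores = {c: float(label_scorers[c] @ x)        # 2. expensive per-label classifier,
              for c in candidates}                  #    evaluated on the candidates only
    return max(scores, key=scores.get) if scores else None
```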
Slide22

Slide23

Pretend we’re doing multiclass for a minute …

Build a tree.

At each node, try to send each class’s examples exclusively left or right,

while sending roughly the same number of examples left or right in aggregate.

Slide24

[Slides 24–26: formulas not captured in the transcript]

Slide27

Achieve this via an eigenvalue problem

Slide28

Achieve this via an eigenvalue problem

``Push all class-conditional means away from zero''

Slide29

Achieve this via an eigenvalue problem

``Push all class-conditional means away from zero’’

``while having average value of zero''

Slide30

Achieve this via an eigenvalue problem.

``Push all class-conditional means away from zero''

``while having average value of zero''

[Slides 30–33: formulas not captured in the transcript]

Works for multilabel!
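The formulas on these slides did not survive extraction. One plausible reconstruction of the per-node objective implied by the two quoted constraints, writing \bar{x}_c for the class-conditional mean, n_c for the class count, and \bar{x} for the overall mean (my notation, not necessarily the talk's):

\[
\max_{w \,:\, \|w\|_2 = 1,\ w^\top \bar{x} = 0} \;\; \sum_{c} n_c \left( w^\top \bar{x}_c \right)^2
\]

Under this reading the maximizer is the top eigenvector of \sum_c n_c \bar{x}_c \bar{x}_c^\top restricted to the subspace orthogonal to \bar{x}, so each routing node costs a single eigenvalue computation.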

Slide34

Problem:

In high dimensions, most vectors are nearly orthogonal,

so routing margins tend to be small.

So we use randomized routing during training.

Slide35

[Slide 35: randomized-routing formulas, not captured in the transcript]

Slide36

Training the ``plausible label'' filter:

Build a tree.

At each internal node, solve the eigenvalue problem.

Route examples and recurse to the desired depth.

At leaf nodes, the most frequent classes are ``plausible''.

Slide37

Training the ``plausible label'' classifier:

Build a tree.

At each internal node, solve the eigenvalue problem.

Route examples and recurse to the desired depth.

At leaf nodes, the most frequent classes are ``plausible''.

How deep should the tree be?

How many classes to include at each leaf?
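A minimal Python sketch of this training recipe, under several assumptions of mine: dense features, single-label (multiclass) targets, the eigenvalue step written as an explicit d-by-d eigendecomposition for clarity (the talk emphasizes a super-scalable solver instead), and sigmoid-probability routing as my reading of the ``randomized routing'' slide. Names such as build_filter_tree and routing_direction are illustrative, not from the xlst code.

```python
import numpy as np

def routing_direction(X, y, n_labels):
    """Per-node eigenvalue problem: push the (projected) class-conditional means away
    from zero while the routing value averages to zero over all examples."""
    mu = X.mean(axis=0)
    # Projector onto the subspace orthogonal to the overall mean ("average value of zero").
    P = np.eye(X.shape[1]) - np.outer(mu, mu) / max(mu @ mu, 1e-12)
    M = np.zeros((X.shape[1], X.shape[1]))
    for c in range(n_labels):
        Xc = X[y == c]
        if len(Xc):
            mc = P @ Xc.mean(axis=0)
            M += len(Xc) * np.outer(mc, mc)   # weight classes by their example counts
    return np.linalg.eigh(M)[1][:, -1]        # top eigenvector = routing direction w

def build_filter_tree(X, y, n_labels, depth, leaf_size=50, rng=np.random.default_rng(0)):
    """Recursively split the data; each leaf keeps its most frequent labels as 'plausible'."""
    if depth == 0 or len(X) < 2:
        counts = np.bincount(y, minlength=n_labels)
        return {"plausible": np.argsort(counts)[::-1][:leaf_size]}
    w = routing_direction(X, y, n_labels)
    # Randomized routing during training: examples near the (small) margin go either way.
    go_right = rng.random(len(X)) < 1.0 / (1.0 + np.exp(-(X @ w)))
    return {"w": w,
            "left":  build_filter_tree(X[~go_right], y[~go_right], n_labels, depth - 1, leaf_size, rng),
            "right": build_filter_tree(X[go_right],  y[go_right],  n_labels, depth - 1, leaf_size, rng)}
```

Materializing the d-by-d matrix is only for clarity; a scalable version would presumably use an iterative or randomized eigensolver on the (sparse) data directly.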

Slide38

Reminder:

Once we have the plausible label filter, we train an underlying classifier.

(Logistic regression)
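To make concrete how the filter can speed up learning as well as inference, here is a hypothetical sketch in which each example only updates the one-vs-rest logistic regressors of its plausible labels plus its true label. This reduction scheme is my illustration; the talk only states that the underlying classifier is logistic regression.

```python
import numpy as np

def train_underlying(X, y, plausible_labels, n_labels, epochs=5, lr=0.1):
    """One-vs-rest logistic regression where each example only updates the weights of
    its plausible labels (plus its true label), instead of all n_labels weights."""
    W = np.zeros((n_labels, X.shape[1]))
    for _ in range(epochs):
        for x, yi in zip(X, y):
            for c in set(plausible_labels(x)) | {yi}:
                p = 1.0 / (1.0 + np.exp(-(W[c] @ x)))   # predicted probability of label c
                W[c] += lr * ((c == yi) - p) * x         # SGD step on the logistic loss
    return W
```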
Slide39

Slide40

Twitter

Predict hashtags from tweets

Labels = hashtags

Features = words (unigrams + bigrams)

Build tree only.
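For the feature setup described here, one conventional way to build unigram-plus-bigram count features is scikit-learn's CountVectorizer. This is purely illustrative; the talk does not say which tooling was used, and separating hashtags (labels) from the tweet text is my assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

tweets = ["looking for a java developer in boston #jobs #java",
          "this track is incredible #nowplaying"]

# Unigram + bigram token counts over the tweet text; the hashtags themselves
# would be pulled out and used as the labels.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(tweets)     # sparse (n_tweets, n_features) count matrix
print(X.shape)
```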

Slide41

#jobs #it #nowplaying #manager #dev #engineering #ff #java #marketing #php #job #net #project #developer #hiring #programmer #engineer #consultant #customer #flash

Slide42

#ascendant #mediumcoeli #nowplaying #leo #cancer #sagittarius #scorpio #virgo #libra #gemini #ff #capricorn #jobs #taurus #aquarius #aries #pisces #fb #news #tweetmyjobs

Slide43

#nowplaying #ff #jobs #retweetthisif #bieberbemine #happybirthdayjustin #babyonitunes #biebcrbemine #justinbiebcr #fb #tweetmyjobs #damnhowtrue #followfriday #biebcrgasm #1 #grindmebieber #quote #news #retweetthis #followmejp

Slide44

Twitter

Leaf nodes look promising, but …

Popular tags everywhere.

Slide45

LSHTC

Predict Wikipedia tags from documents (token counts)

Kaggle competition.

Slide46

LSHTC

Slide47

Overall:

Statistical performance is good.

Computational performance is good.

Slide48

Limitations:

Only works when a linear classifier is good.

Linear routing node: using a linear predictor of [ ] given [ ] (symbols not captured in the transcript).

Not deep!

Slide49

Next Steps:

Online learning version.

Statistical questions.

Deep routing nodes.

Slide50

Summary:

A wrapper approach for accelerating extreme learning.

Leverages a (super-scalable) eigenvalue strategy.

Good for text.

Slide51

http://arxiv.org/abs/1511.03260

https://github.com/pmineiro/xlst