One Simple Thing To Immediately Make Extreme Classification Easy
Find Out What Rachel McAdams and Harrison Ford have to say about it
Joint work with Nikos Karampatziakis

Facebook arguably has the data to solve this, but … how to do it?
There are billions of possible labels.
Computational challenges.
Statistical challenges.

Can we quickly identify plausible labels without sacrificing quality, or even while improving quality?

Strategy: given an example,
1. Compute a small set of plausible labels.
2. Invoke an expensive classifier over the plausible labels only.
Not a new idea.
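
The two-step strategy above can be sketched in a few lines of Python. This is only an illustration: `predict_two_stage`, `toy_filter`, `toy_score`, and the weights are hypothetical names and values invented here, not part of the talk or its code release.

```python
from typing import Callable, Hashable, Iterable, Sequence

Label = Hashable


def predict_two_stage(
    example: Sequence[float],
    plausible_labels: Callable[[Sequence[float]], Iterable[Label]],
    score: Callable[[Sequence[float], Label], float],
) -> Label:
    """Step 1: get a small candidate set. Step 2: score only those candidates."""
    candidates = list(plausible_labels(example))
    if not candidates:
        raise ValueError("the filter returned no plausible labels")
    return max(candidates, key=lambda label: score(example, label))


# Toy components, assumed purely for illustration.
def toy_filter(example):
    # Pretend filter: a positive first feature suggests the animal labels.
    return ["cat", "dog"] if example[0] > 0 else ["car", "bus"]


TOY_WEIGHTS = {
    "cat": (1.0, 2.0),
    "dog": (1.0, -2.0),
    "car": (-1.0, 2.0),
    "bus": (-1.0, -2.0),
}


def toy_score(example, label):
    # Stand-in for the expensive classifier: one linear score per label.
    return sum(w * x for w, x in zip(TOY_WEIGHTS[label], example))


prediction = predict_two_stage((0.5, -1.0), toy_filter, toy_score)
```

The expensive scorer is called once per candidate, so its cost depends on the size of the candidate set rather than on the total number of labels.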

Speeding up inference is nice, but speeding up learning is critical.

Idea: cheaply make a classifier that identifies plausible labels.

Pretend we’re doing multiclass for a minute …
Build a tree
At each node, try to send each class’s examples exclusively left or right,
while sending roughly the same number of examples left and right in aggregate.
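
To make the two goals at a node concrete, here is a small sketch that measures them for a given split. The function name `split_quality` and the exact definitions of "purity" and "balance" are assumptions chosen for illustration; the talk does not define these quantities this way.

```python
from collections import Counter, defaultdict


def split_quality(labels, goes_left):
    """Return (purity, balance) for a binary split of labelled examples.

    purity:  fraction of examples that land on their class's majority side
             (1.0 means every class goes exclusively left or right).
    balance: fraction of examples sent left (0.5 is a perfectly even split).
    """
    if not labels or len(labels) != len(goes_left):
        raise ValueError("labels and goes_left must be non-empty and aligned")
    per_class = defaultdict(Counter)
    for label, left in zip(labels, goes_left):
        per_class[label][bool(left)] += 1
    majority = sum(max(sides.values()) for sides in per_class.values())
    n = len(labels)
    return majority / n, sum(bool(left) for left in goes_left) / n


# Four classes, two sent entirely left and two entirely right.
pure = split_quality(list("aabbccdd"), [True] * 4 + [False] * 4)
# Every class is cut in half: balanced, but not pure.
mixed = split_quality(list("aabb"), [True, False, True, False])
```

A good routing node in the sense of the slide scores close to 1.0 on purity and close to 0.5 on balance.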

Achieve this via an eigenvalue problem:
"Push all class-conditional means away from zero"
"while having average value of zero"
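
One way to write the two quoted conditions as an optimization problem is sketched below. The notation (features $x$, class $y$, routing direction $w$) and the unit-variance normalization are assumptions added here so the problem is well posed; the arXiv paper linked at the end of the deck gives the authors' exact formulation.

```latex
\max_{w} \; \mathbb{E}_{y}\!\left[ \left( \mathbb{E}[\, w^\top x \mid y \,] \right)^{2} \right]
\quad \text{subject to} \quad
\mathbb{E}[\, w^\top x \,] = 0, \qquad
\mathbb{E}\!\left[ (w^\top x)^{2} \right] = 1 .
```

Writing $\mu_y = \mathbb{E}[x \mid y]$, the objective is $w^\top \mathbb{E}_y[\mu_y \mu_y^\top]\, w$ and the normalization is $w^\top \mathbb{E}[x x^\top]\, w = 1$. If the features are centered, the zero-average constraint holds for every $w$, and what remains is the generalized eigenvalue problem $\mathbb{E}_y[\mu_y \mu_y^\top]\, w = \lambda\, \mathbb{E}[x x^\top]\, w$, whose top eigenvector is the maximizer.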

Works for multilabel!

Problem
In high dimensions, most vectors are nearly orthogonal, so routing margins tend to be small.
So we use randomized routing during training.
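
The slide does not say how the randomization is done. A minimal sketch, assuming the probability of going left is a logistic function of the routing margin with a hypothetical `temperature` parameter, could look like this:

```python
import math
import random


def route_left_probability(margin: float, temperature: float = 1.0) -> float:
    """Logistic probability of routing left; a negative margin favours left."""
    z = margin / temperature
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def route(margin: float, rng: random.Random,
          temperature: float = 1.0, training: bool = True) -> bool:
    """Return True to go left.

    During training the side is sampled, so examples with a small margin
    reach both children; at inference time the sign of the margin decides.
    """
    if training:
        return rng.random() < route_left_probability(margin, temperature)
    return margin < 0.0


rng = random.Random(0)
# An example with zero margin ends up on each side about half the time.
left_share = sum(route(0.0, rng) for _ in range(1000)) / 1000
```

With this choice, examples far from the decision boundary are routed almost deterministically, while examples near it are spread over both subtrees during training.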

Training the "plausible label" filter:
Build a tree
At each internal node, solve the eigenvalue problem
Route examples and recurse to the desired depth
At leaf nodes, the most frequent classes are "plausible"
How deep should the tree be?
How many classes to include at each leaf?
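
The training recipe above can be sketched in pure Python. This is a simplified illustration, not the authors' implementation: it finds the routing direction by power iteration on the scatter of centered class means, skips any normalization by the feature covariance, and routes deterministically. All names (`routing_direction`, `build_tree`, `plausible_labels`) and the two knobs `depth` and `classes_per_leaf` are hypothetical.

```python
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def routing_direction(X, y, iters=50, seed=0):
    """Top eigenvector of sum_c n_c m_c m_c^T, where m_c is the centered
    mean of class c. Returns (w, mean); route on the sign of w . (x - mean)."""
    n, d = len(X), len(X[0])
    mean = [sum(x[j] for x in X) / n for j in range(d)]
    counts, sums = Counter(y), {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * d)
        for j in range(d):
            acc[j] += x[j] - mean[j]
    class_means = {c: [s / counts[c] for s in sums[c]] for c in sums}
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # Multiply w by the scatter matrix without forming it explicitly.
        new_w = [0.0] * d
        for c, m in class_means.items():
            coeff = counts[c] * dot(m, w)
            for j in range(d):
                new_w[j] += coeff * m[j]
        norm = dot(new_w, new_w) ** 0.5
        if norm == 0.0:  # class means coincide; keep the current direction
            break
        w = [v / norm for v in new_w]
    return w, mean


def make_leaf(y, classes_per_leaf):
    # The most frequent classes reaching this leaf are its "plausible" labels.
    return {"leaf": [c for c, _ in Counter(y).most_common(classes_per_leaf)]}


def build_tree(X, y, depth, classes_per_leaf):
    if depth == 0 or len(set(y)) <= 1:
        return make_leaf(y, classes_per_leaf)
    w, mean = routing_direction(X, y)
    bias = dot(w, mean)
    goes_left = [dot(w, x) - bias < 0.0 for x in X]
    if all(goes_left) or not any(goes_left):
        return make_leaf(y, classes_per_leaf)  # degenerate split
    left = [i for i, g in enumerate(goes_left) if g]
    right = [i for i, g in enumerate(goes_left) if not g]
    return {
        "w": w,
        "bias": bias,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth - 1, classes_per_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth - 1, classes_per_leaf),
    }


def plausible_labels(tree, x):
    while "leaf" not in tree:
        side = "left" if dot(tree["w"], x) - tree["bias"] < 0.0 else "right"
        tree = tree[side]
    return tree["leaf"]


# Tiny example: two classes separated along the first coordinate.
X = [(-3.0, 0.5), (-3.0, -0.5), (3.0, 0.5), (3.0, -0.5)]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y, depth=1, classes_per_leaf=1)
```

The two open questions on the slide correspond to the `depth` and `classes_per_leaf` arguments: deeper trees and shorter leaf lists make the filter cheaper but more likely to drop the correct label.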

Reminder
Once we have the plausible label filter, we train an underlying classifier (logistic regression).
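
A minimal sketch of how a logistic regression update might be restricted to the plausible labels follows. The slides only name the classifier; normalizing the softmax over the candidate set and adding the true label to that set during training are assumptions made here, and `sgd_step` is a hypothetical helper.

```python
import math


def restricted_softmax(scores):
    """Softmax over a {label: score} dict that holds candidate labels only."""
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sgd_step(weights, x, true_label, candidates, lr=0.1):
    """One multinomial logistic regression update over candidate labels.

    weights maps label -> list of floats and is filled in lazily, so labels
    that never appear in a candidate set never get a weight vector.
    """
    labels = set(candidates) | {true_label}  # assumption: always include truth
    for label in labels:
        weights.setdefault(label, [0.0] * len(x))
    scores = {
        label: sum(w * xi for w, xi in zip(weights[label], x))
        for label in labels
    }
    probs = restricted_softmax(scores)
    for label in labels:
        grad = probs[label] - (1.0 if label == true_label else 0.0)
        weights[label] = [w - lr * grad * xi for w, xi in zip(weights[label], x)]
    return probs


weights = {}
probs = sgd_step(weights, [1.0, 0.0], "a", ["a", "b"])
```

Each update touches only the weight vectors of the candidate labels, which is what makes learning, and not just inference, cheaper in this sketch.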
Twitter
Predict hashtags from tweets
Labels = hashtags
Features = words (unigrams + bigrams)
Build tree only
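
As an illustration of the setup on this slide (labels are hashtags, features are unigrams plus bigrams), a tweet could be converted as follows. The tokenizer, the lowercasing, and the choice to drop hashtags from the features are assumptions; `featurize` is a hypothetical helper, not taken from the released code.

```python
import re
from collections import Counter

TOKEN = re.compile(r"#?\w+")


def featurize(tweet):
    """Split a tweet into (features, labels).

    labels:   the sorted set of hashtags in the tweet.
    features: counts of unigrams and bigrams over the remaining words.
              Bigrams are formed after hashtags are removed, so words on
              either side of a hashtag become adjacent.
    """
    tokens = TOKEN.findall(tweet.lower())
    labels = sorted({t for t in tokens if t.startswith("#")})
    words = [t for t in tokens if not t.startswith("#")]
    features = Counter(words)
    features.update(" ".join(pair) for pair in zip(words, words[1:]))
    return features, labels


features, labels = featurize("Hiring a Java developer #jobs #java")
```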

#jobs #it #nowplaying #manager #dev #engineering #ff #java #marketing #php #job #net #project #developer #hiring #programmer #engineer #consultant #customer #flash

#ascendant #mediumcoeli #nowplaying #leo #cancer #sagittarius #scorpio #virgo #libra #gemini #ff #capricorn #jobs #taurus #aquarius #aries #pisces #fb #news #tweetmyjobs

#nowplaying #ff #jobs #retweetthisif #bieberbemine #happybirthdayjustin #babyonitunes #biebcrbemine #justinbiebcr #fb #tweetmyjobs #damnhowtrue #followfriday #biebcrgasm #1 #grindmebieber #quote #news #retweetthis #followmejp

Twitter
Leaf nodes look promising, but popular tags appear everywhere.

LSHTC
Predict Wikipedia tags from documents (token counts)
Kaggle competition

Overall
Statistical performance is good
Computational performance is good

Limitations
Only works when a linear classifier is good
Linear routing node
Using a linear predictor of the labels given the features
Not deep!

Next Steps
Online learning version
Statistical questions
Deep routing nodes

Summary
Wrapper approach for accelerating extreme learning
Leverages a (super-scalable) eigenvalue strategy
Good for text

http://arxiv.org/abs/1511.03260
https://github.com/pmineiro/xlst