
Online Multiple Kernel Classification - PowerPoint Presentation

Uploaded On 2016-05-19




Presentation Transcript

Slide1

Online Multiple Kernel Classification

Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang

Machine Learning (2013)

Presented by Audrey Cheong

Electrical & Computer Engineering

MATH 6397: Data Mining

Slide2

Background - Online

Online learning

Learns one instance at a time and predicts labels for future instances

Learner is given an instance

Learner predicts the label of the instance

Learner is given the correct label

Learner refines its prediction mechanism
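The four steps above make up one round of the online protocol. A minimal sketch of that loop follows, assuming a learner object with `predict` and `update` methods; these names, the toy `MajorityLearner`, and the `run_online` helper are illustrative and do not come from the slides.

```python
class MajorityLearner:
    """Toy learner (hypothetical): predicts the label it has seen most often."""

    def __init__(self):
        self.balance = 0  # (+1 labels seen) minus (-1 labels seen)

    def predict(self, x):
        return 1 if self.balance >= 0 else -1

    def update(self, x, y):
        self.balance += y


def run_online(learner, stream):
    """Run the online protocol over (instance, label) pairs; return the mistake count."""
    mistakes = 0
    for x, y in stream:
        y_hat = learner.predict(x)  # learner is given an instance and predicts its label
        if y_hat != y:              # learner is given the correct label
            mistakes += 1
        learner.update(x, y)        # learner refines its prediction mechanism
    return mistakes


stream = [((0.0,), 1), ((1.0,), 1), ((2.0,), -1)]
print(run_online(MajorityLearner(), stream))
```

The mistake count returned by the loop is the quantity that online learning algorithms are usually compared on.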

Slide3

Background – Multiple Kernel

Composed of two online learning algorithms:

Perceptron algorithm (Rosenblatt 1958)

Type of linear classifier

Learns a classifier for a given kernel

Hedge algorithm (Freund and Schapire 1997)

Combines classifiers by linear weights

[Diagram: Classifier 1, Classifier 2 and Classifier 3, each a Perceptron, are combined by Hedge]

Slide4

Perceptron algorithm

Input vector :

Output vector :

Weights :

Threshold :

Arithmetic test :

Minimize :
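The slide presents the Perceptron through weights and a threshold. For multiple kernel learning the kernelized form is the relevant one, so the sketch below stores misclassified examples as support vectors instead of an explicit weight vector. It is an illustration under stated assumptions: the Gaussian kernel, the width `sigma=0.5`, the class name `KernelPerceptron` and the toy XOR data are all chosen here and do not come from the slides.

```python
import math


def gaussian_kernel(sigma):
    """Return k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    def k(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


class KernelPerceptron:
    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []  # (x, y) pairs on which the classifier erred

    def score(self, x):
        return sum(y_s * self.kernel(x_s, x) for x_s, y_s in self.support)

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def update(self, x, y):
        """Store (x, y) as a support vector only if it is misclassified."""
        mistake = y * self.score(x) <= 0
        if mistake:
            self.support.append((x, y))
        return mistake


# XOR-style data: not linearly separable, but separable with a Gaussian kernel.
data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
clf = KernelPerceptron(gaussian_kernel(0.5))
for _ in range(5):
    for x, y in data:
        clf.update(x, y)
print([clf.predict(x) for x, _ in data], len(clf.support))
```

The number of stored support vectors grows only when a mistake is made, which is why the deck later reports support-vector counts alongside mistake rates.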

 

 

Slide5

Hedge algorithm

Distribute weight $w_i$ among $m$ classifiers

Setting new weights: $w_i(t+1) = w_i(t)\,\beta^{z_i(t)}$ for discount weight $\beta \in (0, 1)$

$z_i(t) = 1$ if the prediction is incorrect and $z_i(t) = 0$ if correct
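A minimal sketch of the Hedge weight update, assuming the standard multiplicative form (Freund and Schapire 1997): a classifier's weight is multiplied by a discount factor between 0 and 1 each time it errs and is left unchanged otherwise. The function names and the value `beta=0.5` are illustrative assumptions.

```python
def hedge_update(weights, mistakes, beta=0.5):
    """Multiply the weight of every classifier that erred by beta (0 < beta < 1)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [w * (beta if erred else 1.0) for w, erred in zip(weights, mistakes)]


def normalize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


weights = hedge_update([1.0, 1.0, 1.0], [True, False, True], beta=0.5)
print(weights, normalize(weights))
```

Because the discount is applied multiplicatively, a classifier that keeps making mistakes loses influence exponentially fast relative to one that does not.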

Slide6

Notations

t: trial

: mixture of kernel classifiers

: indicates if training instance is misclassified by the kernel classifier at trial t

: indicator function

: prediction from combination of m kernel classifiers

: classifier function

Slide7

Proposed framework

We define the optimal margin classification error for the kernel with respect to a collection of training examples as

where

Slide8

Algorithms

Deterministic approach: all kernels are used

Stochastic approach: a subset of kernels is used

[Grid: update strategy (deterministic or stochastic) against combination strategy (deterministic or stochastic)]

Slide9

OMKC(D,D)

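[Diagram: the training sample is passed to each kernel classifier, each classifier makes a prediction, and the predictions are merged into a combined prediction; each of the three classifiers carries a "Reduce if" annotation]

Deterministic update

Deterministic combination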
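A minimal sketch of OMKC(D,D) as the slide describes it: every kernel classifier is updated on its own mistakes (deterministic update) and every classifier votes in the combined prediction (deterministic combination). It assumes kernel Perceptrons as base classifiers and a multiplicative Hedge discount for the weights. The class name `OMKCDD`, the value `beta=0.8`, the two toy kernels and the XOR data are illustrative assumptions.

```python
import math


def linear_kernel(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def gaussian_kernel(sigma):
    def k(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


class OMKCDD:
    """Sketch: deterministic update, deterministic combination."""

    def __init__(self, kernels, beta=0.8):
        self.kernels = kernels
        self.beta = beta                          # discount weight (assumed value)
        self.weights = [1.0] * len(kernels)       # one Hedge weight per kernel
        self.supports = [[] for _ in kernels]     # per-kernel support vectors

    def _score(self, i, x):
        k = self.kernels[i]
        return sum(y_s * k(x_s, x) for x_s, y_s in self.supports[i])

    def predict(self, x):
        total = sum(self.weights)
        vote = sum(
            (w / total) * (1 if self._score(i, x) > 0 else -1)
            for i, w in enumerate(self.weights)
        )
        return 1 if vote > 0 else -1

    def step(self, x, y):
        """Predict on x, then update every kernel classifier that misclassified it."""
        y_hat = self.predict(x)
        for i in range(len(self.kernels)):
            if y * self._score(i, x) <= 0:
                self.weights[i] *= self.beta      # reduce the weight of the erring classifier
                self.supports[i].append((x, y))   # Perceptron update for that kernel
        return y_hat


data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
model = OMKCDD([linear_kernel, gaussian_kernel(0.5)])
for _ in range(10):
    for x, y in data:
        model.step(x, y)
print(model.weights, [model.predict(x) for x, _ in data])
```

On this toy data the linear-kernel classifier keeps erring and its weight decays, so the combined prediction ends up following the Gaussian-kernel classifier.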


 

 Slide10

OMKC(S,S)

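[Diagram: the training sample is passed to the kernel classifiers, the classifiers make predictions, and the predictions are merged into a combined prediction; one classifier carries a "Reduce if" annotation]

Stochastic update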


 

Stochastic combination

 Slide11

Experimental setup

binary datasets

Slide12

Experimental setup

15 diverse datasets obtained from LIBSVM and UCI machine learning repository

Predefine 16 kernel functions

3 polynomial kernels

13 Gaussian kernels

Fix discount weight
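A pool of 16 kernels like the one described above could be built as follows. The kernel parameters on the slide did not survive, so the polynomial degrees (1, 2, 3) and the Gaussian widths (powers of two from 2^-6 to 2^6) are assumptions chosen only to give 3 and 13 kernels respectively; the function names are likewise illustrative.

```python
import math


def polynomial_kernel(degree):
    """Return k(a, b) = (a . b) ** degree."""
    def k(a, b):
        return sum(ai * bi for ai, bi in zip(a, b)) ** degree
    return k


def gaussian_kernel(sigma):
    """Return k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    def k(a, b):
        sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k


def build_kernel_pool(degrees=(1, 2, 3), log2_sigmas=range(-6, 7)):
    """Assumed parameter grid: 3 polynomial degrees and 13 Gaussian widths."""
    pool = [polynomial_kernel(p) for p in degrees]
    pool += [gaussian_kernel(2.0 ** e) for e in log2_sigmas]
    return pool


pool = build_kernel_pool()
print(len(pool))
```

Each entry of the pool is a plain two-argument function, so it can be handed directly to any kernel classifier that expects a callable kernel.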

Results are averaged over 20 runs

Slide13

Evaluation of the deterministic OMKC algorithm

Comparison of the deterministic OMKC algorithm with three Perceptron-based algorithms and one online MKL algorithm

Perceptron: the well-known Perceptron baseline algorithm with a linear kernel (Rosenblatt 1958; Freund and Schapire 1999)

Perceptron(u): another Perceptron baseline algorithm with an unbiased/uniform combination of all the kernels

Perceptron(*): an online validation procedure that searches for the best kernel among the pool of kernels (using the first 10% of the training examples) and then applies the Perceptron algorithm with the best kernel

OM-2: a state-of-the-art online learning algorithm for multiple kernel learning (Jie et al. 2010; Orabona et al. 2010)

Slide14

Evaluation of the deterministic OMKC algorithm

Slide15

Average mistake rate (20 runs)

Slide16

Number of support vectors (20 runs)

Slide17

Kernel weights

Slide18

Effect of

Slide19

Time Efficiency

Decreases as size increases

Slide20

Conclusion

All the OMKC algorithms usually perform better than

the regular Perceptron algorithm with an unbiased linear combination of multiple kernels

the Perceptron algorithm with the best kernel found by validation

the state-of-the-art online MKL algorithm

The deterministic combination strategy usually performs better

Stochastic updating strategy improves computational efficiency without decreasing the accuracy significantly

Slide21

Questions?

How many kernel classifiers were used in the stochastic combination?

How was the number of support vectors determined?

Should the support vectors be given in terms of the number of support vectors per kernel classifier?

Did support vectors overlap between kernel classifiers?

Slide22

References

Hoi, S. C. H., Jin, R., Zhao, P., & Yang, T. (2012). Online Multiple Kernel Classification. Machine Learning, 90(2), 289–316. doi:10.1007/s10994-012-5319-2

Slide23

Algorithm 1

All kernels are used

: Represent the classifier at trial t

: combination of m kernel classifiers

[Algorithm listing with annotations: Normalize the weights, Update, Combination]

Slide24

Algorithm 1 → 2

: Represent the classifier at trial t

: combination of m kernel classifiers

Stochastic combination

Deterministic update

[Algorithm listing with annotations: Update, Combination]

Slide25

Algorithm 2 → 3

[Algorithm listing with annotations: Update, Combination]

Slide26

Algorithm 2 → 3

Deterministic combination

Stochastic update

Guarantees that each kernel is selected with at least some minimum probability

Tradeoff between exploration and exploitation (Auer et al. 2003)
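One way to obtain such a guaranteed minimum selection probability is to mix the normalized kernel weights with the uniform distribution, in the spirit of the exploration term used by Auer et al. The sketch below assumes that mixture form; the smoothing parameter `delta`, the function names and the example weights are illustrative, because the slide's own formula did not survive extraction.

```python
import random


def selection_probabilities(weights, delta):
    """Mix normalized weights with the uniform distribution.

    Every kernel gets probability at least delta / m, where m is the number
    of kernels (assumed form; delta is a hypothetical smoothing parameter).
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    m = len(weights)
    total = sum(weights)
    return [(1.0 - delta) * w / total + delta / m for w in weights]


def sample_kernels(weights, delta, rng):
    """Draw one independent Bernoulli trial per kernel; True means 'update this kernel'."""
    probs = selection_probabilities(weights, delta)
    return [rng.random() < p for p in probs]


weights = [8.0, 1.0, 1.0, 0.0]
print(selection_probabilities(weights, delta=0.2))
print(sample_kernels(weights, delta=0.2, rng=random.Random(0)))
```

A larger `delta` spends more updates on kernels that currently look poor (exploration), while a smaller `delta` concentrates updates on the kernels with high weight (exploitation).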

 

[Algorithm listing with annotations: Update, Combination]

Slide27

Algorithm 4

Stochastic update

Stochastic combination

[Algorithm listing with annotations: Update, Combination]