Slide 1
Online Multiple Kernel Classification
Steven C.H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang
Machine Learning (2013)
Presented by Audrey Cheong
Electrical & Computer Engineering
MATH 6397: Data Mining

Slide 2
Background – Online Learning
Online learning
Learns one instance at a time and predicts labels for future instances
1. Learner is given an instance
2. Learner predicts the label of the instance
3. Learner is given the correct label
4. Learner refines its prediction mechanism
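The four-step protocol above can be sketched as a simple loop. The `predict`/`update` interface and the toy `LastLabel` learner below are illustrative assumptions, not part of the paper:

```python
def online_learning(learner, stream):
    """Run the online protocol: observe, predict, receive label, refine."""
    mistakes = 0
    for x, y in stream:                 # 1. learner is given an instance
        y_hat = learner.predict(x)      # 2. learner predicts its label
        if y_hat != y:                  # 3. learner is given the correct label
            mistakes += 1
        learner.update(x, y)            # 4. learner refines its mechanism
    return mistakes

class LastLabel:
    """Toy learner that always predicts the most recently seen label."""
    def __init__(self):
        self.last = 1
    def predict(self, x):
        return self.last
    def update(self, x, y):
        self.last = y
```

Counting mistakes is the standard performance measure in this setting, and it is the quantity the OMKC experiments report later.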
Slide 3
Background – Multiple Kernel
Composed of two online learning algorithms:
Perceptron algorithm (Rosenblatt 1958): a type of linear classifier; learns a classifier for a given kernel
Hedge algorithm (Freund and Schapire 1997): combines the kernel classifiers by linear weights

(Figure: Perceptron kernel classifiers 1..m combined into one prediction by Hedge weights)

Slide 4
Perceptron algorithm
Input vector: x ∈ R^d
Output label: y ∈ {−1, +1}
Weights: w
Threshold: θ
Arithmetic test: predict ŷ = sign(w·x − θ)
Minimize: the number of prediction mistakes (on a mistake, update w ← w + yx)
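A minimal kernelized Perceptron sketch of the rule above (the threshold θ is omitted for brevity; the Gaussian kernel and tuple features are illustrative assumptions):

```python
import math

def rbf(a, b, sigma=1.0):
    """Gaussian kernel between two equal-length feature tuples."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

class KernelPerceptron:
    """Perceptron in a kernel space: every mistake becomes a support vector."""
    def __init__(self, kernel):
        self.kernel = kernel
        self.support = []  # (instance, label) pairs added on mistakes

    def score(self, x):
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        # Perceptron rule: store the example only when it is misclassified
        if y * self.score(x) <= 0:
            self.support.append((x, y))
```

Storing only mistaken examples is what makes the number of support vectors a natural cost measure, which the experiments report later.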
Slide 5
Hedge algorithm
Distributes weight w_i among the m classifiers.
Setting new weights: w_i ← w_i · β, where β ∈ (0, 1) is the discount weight, if the prediction is incorrect; w_i is left unchanged if correct.
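The update rule above can be sketched as follows (normalization included so the weights stay a distribution; the default value of `beta` is an arbitrary choice for illustration):

```python
def hedge_update(weights, mistakes, beta=0.5):
    """One Hedge step: discount the weight of every classifier that erred,
    then renormalize.

    weights  : current weight per classifier
    mistakes : 1 if that classifier predicted incorrectly, else 0
    beta     : discount weight in (0, 1)
    """
    discounted = [w * (beta if z else 1.0) for w, z in zip(weights, mistakes)]
    total = sum(discounted)
    return [w / total for w in discounted]
```

After a few rounds, classifiers that err often see their influence shrink geometrically relative to the better ones.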
Slide 6
Notations
t: the trial (time-step) index
The combined classifier: a weighted mixture of the kernel classifiers
z_i(t): indicates if the training instance is misclassified by the i-th kernel classifier at trial t
I(·): the indicator function
ŷ_t: the prediction from the combination of the m kernel classifiers
f_i: the i-th kernel classifier function

Slide 7
Proposed framework
We define the optimal margin classification error for the kernel κ_i, with respect to a collection of training examples, as the minimum regularized hinge loss achievable by any classifier in the reproducing kernel Hilbert space induced by κ_i.

Slide 8
Algorithms
Deterministic approach: all kernels are used
Stochastic approach: a subset of the kernels is used
The update step and the combination step can each be deterministic or stochastic, giving four variants.

Slide 9
OMKC(D,D)
Training sample (x_t, y_t)
Kernel classifiers f_1, …, f_m each make a prediction.
Combined prediction: weighted vote over all m classifiers.
Reduce the weight w_i if classifier i misclassifies the sample.
Deterministic update: every kernel classifier is updated.
Deterministic combination: every kernel classifier votes.
Slide 10
OMKC(S,S)
Training sample (x_t, y_t)
Kernel classifiers f_1, …, f_m
Combined prediction is formed from a sampled subset of the classifiers only.
Reduce the weight w_i if a sampled classifier i misclassifies the sample.
Stochastic update: only a sampled subset of kernel classifiers is updated.
Stochastic combination: only a sampled subset of kernel classifiers votes.
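One trial of a stochastic-update / stochastic-combination scheme might look like the sketch below. The sampling probabilities here are simply the normalized Hedge weights, which is a simplifying assumption rather than the paper's exact sampling rule:

```python
import random

def omkc_ss_trial(x, y, classifiers, weights, beta=0.8, rng=random):
    """One trial: sample who votes, predict, then sample who gets updated.

    classifiers : objects with .predict(x) -> +/-1 and .update(x, y)
    weights     : current Hedge weight per classifier (modified in place)
    """
    total = sum(weights)
    probs = [w / total for w in weights]

    # Stochastic combination: only sampled classifiers vote.
    voters = [i for i, p in enumerate(probs) if rng.random() < p]
    vote = sum(weights[i] * classifiers[i].predict(x) for i in voters)
    y_hat = 1 if vote >= 0 else -1

    # Stochastic update: only sampled classifiers are updated and discounted.
    for i, p in enumerate(probs):
        if rng.random() < p:
            if classifiers[i].predict(x) != y:
                weights[i] *= beta
            classifiers[i].update(x, y)
    return y_hat
```

Since only the sampled classifiers evaluate their (potentially expensive) kernel expansions, the per-trial cost drops roughly in proportion to the expected subset size.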
Slide 11
Experimental setup
(Table: the binary datasets used in the experiments)

Slide 12
Experimental setup
15 diverse datasets obtained from LIBSVM and the UCI machine learning repository
Predefined pool of 16 kernel functions:
3 polynomial kernels (i.e. k(x, y) = (xᵀy)^p with varying degree p)
13 Gaussian kernels (i.e. k(x, y) = exp(−‖x − y‖² / (2σ²)) with varying width σ)
Fixed discount weight β
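A pool of this shape can be built as below. The specific degree and width grids are illustrative assumptions, not the exact values used in the paper:

```python
import math

def make_poly(p):
    """Polynomial kernel k(x, y) = (x . y)^p."""
    return lambda a, b: sum(x * y for x, y in zip(a, b)) ** p

def make_gauss(sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return lambda a, b: math.exp(
        -sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * sigma ** 2))

# 3 polynomial + 13 Gaussian kernels = a pool of 16, as on this slide.
kernel_pool = [make_poly(p) for p in (1, 2, 3)] + \
              [make_gauss(2.0 ** e) for e in range(-6, 7)]
```

Precomputing such a pool up front is what lets the online algorithm treat kernel choice as a weighting problem rather than a model-selection step.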
Results are averaged over 20 runs

Slide 13
Evaluation of the deterministic OMKC algorithm
Comparison of the deterministic OMKC algorithm with three Perceptron-based baselines and one online MKL algorithm:
Perceptron: the well-known Perceptron baseline algorithm with a linear kernel (Rosenblatt 1958; Freund and Schapire 1999)
Perceptron(u): another Perceptron baseline algorithm with an unbiased/uniform combination of all the kernels
Perceptron(*): an online validation procedure that searches for the best kernel among the pool (using the first 10% of training examples), then applies the Perceptron algorithm with that kernel
OM-2: a state-of-the-art online learning algorithm for multiple kernel learning (Jie et al. 2010; Orabona et al. 2010)

Slide 14
Evaluation of the deterministic OMKC algorithm
(Results table comparing OMKC with the baseline algorithms)

Slide 15
Average mistake rate (20 runs)
Slide 16
Number of support vectors (20 runs)
Slide 17
Kernel weights
Slide 18
Effect of
Slide 19
Time Efficiency
Decreases as size increases

Slide 20
Conclusion
All the OMKC algorithms usually perform better than:
the regular Perceptron algorithm with an unbiased linear combination of multiple kernels
the Perceptron algorithm with the best kernel found by validation
the state-of-the-art online MKL algorithm
The deterministic combination strategy usually performs better.
The stochastic updating strategy improves computational efficiency without significantly decreasing accuracy.

Slide 21
Questions?
How many kernel classifiers were used in the stochastic combination?
How was the number of support vectors determined? Should the support vectors be reported per kernel classifier? Did support vectors overlap between kernel classifiers?

Slide 22
References
Hoi, S. C. H., Jin, R., Zhao, P., & Yang, T. (2013). Online multiple kernel classification. Machine Learning, 90(2), 289–316. doi:10.1007/s10994-012-5319-2

Slide 23
Algorithm 1
All kernels are used (deterministic update, deterministic combination).
f_i(t): the i-th kernel classifier at trial t
The combined classifier is a weighted combination of the m kernel classifiers.
Normalize the weights after each update.

Slide 24
Algorithm 1 → 2
f_i(t): the i-th kernel classifier at trial t
The combined classifier is built from the m kernel classifiers.
Change from Algorithm 1: the update remains deterministic, but the combination becomes stochastic.

Slide 25
Algorithm 2 → 3
(Diagram: update and combination steps, each deterministic or stochastic)

Slide 26
Algorithm 2 → 3
Change from Algorithm 2: the combination becomes deterministic and the update becomes stochastic.
Stochastic update guarantees that each kernel is selected with at least a fixed minimum probability.
Tradeoff between exploration and exploitation (Auer et al. 2003)
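The minimum-selection guarantee can be illustrated with an EXP3-style mixing rule in the spirit of Auer et al. (2003); the paper's exact sampling formula may differ, so `delta` here is an assumed smoothing parameter:

```python
def smoothed_probs(weights, delta):
    """Mix the weight-proportional distribution with a uniform one so each
    of the m kernels is selected with probability at least delta / m."""
    m = len(weights)
    total = sum(weights)
    return [(1 - delta) * w / total + delta / m for w in weights]
```

Even a kernel whose Hedge weight has collapsed to zero keeps probability delta / m of being sampled, which is what preserves exploration alongside exploitation of the currently best kernels.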
Slide 27
Algorithm 4
Stochastic update and stochastic combination: only sampled subsets of the kernel classifiers are updated and combined.