Learning Representations of Data

J. Saketha Nath, IIT Bombay

Collaborators: Pratik Jawanpuria, Arun Iyer, Sunita Sarawagi, Ganesh Ramakrishnan
Outline

Introduction to Representation Learning
Summary of Research
Case Study: Class-Ratio Estimation
Concluding Remarks
Introduction to Representation Learning
Representation Learning: Illustration

(Figure: training and inference pipelines.)
Representation Learning: Examples

Principal Component Analysis
Deep Learning
(a long list!)
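To make the training/inference split concrete, here is a minimal PCA sketch (my own illustration, not from the talk; assumes scikit-learn and toy data): training fits the representation, inference applies it to new points.

```python
# Minimal sketch: PCA as representation learning, with an explicit
# training/inference split. Data and dimensions are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))  # data available at training time
X_new = rng.normal(size=(5, 20))      # data arriving at inference time

pca = PCA(n_components=5).fit(X_train)  # training: learn the representation
Z_new = pca.transform(X_new)            # inference: apply it to new data
print(Z_new.shape)                      # (5, 5)
```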
Kernel Learning: Illustration

(Figure: training and inference pipelines.)
Kernel Learning: Broad Set-ups

Multi-modal Data [NIPS’09, JMLR’11]
Multi-task Learning [SDM’11, ICML’12]
Interpretable Rule Learning [ICML’11, JMLR’15]
Case Study: Kernel Learning for Class-Ratio Estimation
Class-Ratio Estimation

(Figure: a labeled sample and an unlabeled sample.)

What fraction of the unlabeled sample comes from each class?
Class-Ratio Estimation

Assumption: the class-conditional distributions P(x | y) are the same in the labeled and unlabeled samples; only the class proportions differ (label shift).
Class-Ratio Estimation

Represent each data distribution using a kernel (mean embedding); the unlabeled distribution’s embedding is then approximately a convex combination of the class-wise embeddings, weighted by the class ratios.
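A minimal sketch of this idea, under my own simplifications and in the spirit of the MMD-based estimator of ICML’14: represent each sample by its empirical kernel mean embedding and pick the class ratios whose mixture best matches the unlabeled embedding. The RBF kernel, its bandwidth, and the generic solver are illustrative choices, not the paper’s.

```python
# Hedged sketch: class-ratio estimation by matching kernel mean embeddings.
import numpy as np
from scipy.optimize import minimize

def rbf(X, Y, gamma=0.5):
    """RBF kernel matrix between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def estimate_class_ratios(X_by_class, X_unlabeled):
    """Find mixture weights t minimizing ||sum_c t_c mu_c - mu_U||^2,
    where mu_c, mu_U are empirical mean embeddings."""
    C = len(X_by_class)
    # A[c, d] = <mu_c, mu_d>, b[c] = <mu_c, mu_U>, estimated empirically.
    A = np.array([[rbf(Xc, Xd).mean() for Xd in X_by_class] for Xc in X_by_class])
    b = np.array([rbf(Xc, X_unlabeled).mean() for Xc in X_by_class])

    def obj(t):
        return t @ A @ t - 2 * b @ t  # squared MMD up to a constant

    cons = ({'type': 'eq', 'fun': lambda t: t.sum() - 1},)
    res = minimize(obj, np.full(C, 1 / C), bounds=[(0, 1)] * C, constraints=cons)
    return res.x

rng = np.random.default_rng(0)
X_pos = rng.normal(+1.0, 1.0, size=(200, 2))  # labeled positives
X_neg = rng.normal(-1.0, 1.0, size=(200, 2))  # labeled negatives
# Unlabeled sample drawn with 70% positives, 30% negatives.
X_u = np.vstack([rng.normal(+1.0, 1.0, size=(140, 2)),
                 rng.normal(-1.0, 1.0, size=(60, 2))])
print(estimate_class_ratios([X_pos, X_neg], X_u))  # roughly [0.7, 0.3]
```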
Class-Ratio Estimation

Kernel learning: which kernel is best for this estimator?
Statistical Consistency

Theorem (informal): Let θ̂ and θ* denote the estimated and true class ratios. Then, with probability at least 1 − δ, the error ‖θ̂ − θ*‖ is bounded by a quantity that shrinks with the sample sizes and improves as the minimum eigenvalue of the Gram matrix of the class-wise kernel mean embeddings grows.

Please refer to ICML’14 and KDD’16 for the precise statement and details.
Kernel Learning

Given: labeled and unlabeled samples, and a parametrized family of candidate kernels.
Goal: find the kernel in the family that minimizes the error bound above, i.e.,
  maximize the minimum-eigenvalue (mineig) term in the bound (convex, SDP-representable);
  minimize the empirical average term in the bound (also convex).
Posed as an SDP, solved using a cutting-planes algorithm.

Please refer to ICML’14 for details.
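As one hedged illustration of "kernel learning as an SDP" (my own toy objective, not the paper’s exact bound): choose convex-combination weights over base Gram matrices to maximize the minimum eigenvalue of the combined Gram matrix, echoing the mineig term above. The paper reports a cutting-planes method; at this toy size a generic SDP solver (here cvxpy) suffices.

```python
# Hedged sketch: maximize mineig of a convex combination of base kernels.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))

def rbf_gram(X, gamma):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

grams = [rbf_gram(X, g) for g in (0.1, 1.0, 10.0)]  # base Gram matrices

w = cp.Variable(len(grams), nonneg=True)  # convex-combination weights
t = cp.Variable()                         # lower bound on the minimum eigenvalue
K = sum(w[i] * grams[i] for i in range(len(grams)))
prob = cp.Problem(cp.Maximize(t),
                  [cp.sum(w) == 1,
                   K - t * np.eye(n) >> 0])  # PSD constraint: mineig(K) >= t
prob.solve()
print(w.value.round(3), float(t.value))
```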
Simulation Results

(Figure: estimation error while varying the negative-class proportion in U; the class proportion in L is fixed at [0.5, 0.5].)
Concluding Remarks

Summary of Research
Kernel Learning

(Figure: data-to-inference pipeline, with kernel learning as the representation step.)

Multi-modal Data [NIPS’09, JMLR’11]
Multi-task Learning [SDM’11, ICML’12]
Rule Ensemble Learning [ICML’11, JMLR’15]
Kernel Learning – Multi-modal Data

(Figure: training and inference pipeline.)

Sparse combinations of kernels are in vogue.
Key idea: impose a hierarchy on the kernels:
  non-sparse across modes;
  sparse within each mode.
Solved with a mirror-descent algorithm whose iterations involve solving the sparse-case problem.

For details refer to NIPS’09, JMLR’11.
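A hedged sketch of the non-sparse-across-modes, sparse-within-mode idea, using an l2-norm across modes of the within-mode l1-norms as the penalty; the alignment-style objective and the generic solver (rather than mirror descent) are my own simplifications, not the papers’ formulation.

```python
# Hedged sketch: mixed-norm penalty keeping all modes active while
# selecting few kernels inside each mode.
import numpy as np
from scipy.optimize import minimize

def mixed_norm(w, modes):
    # sqrt( sum_m (sum_{k in mode m} w_k)^2 ) for nonnegative weights
    return np.sqrt(sum(w[idx].sum() ** 2 for idx in modes))

def objective(w, align, modes, lam=0.5):
    # 'align' = hypothetical per-kernel alignment with the labels
    # (higher is better); trade alignment against the mixed-norm penalty.
    return -align @ w + lam * mixed_norm(w, modes)

align = np.array([0.9, 0.1, 0.1,    # mode 1: one strong kernel
                  0.5, 0.6, 0.1])   # mode 2: two moderate kernels
modes = [np.arange(0, 3), np.arange(3, 6)]
res = minimize(objective, np.full(6, 1 / 6), args=(align, modes),
               bounds=[(0, 1)] * 6)
print(res.x.round(2))  # weight remains in both modes, few kernels per mode
```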
Kernel Learning – Multi-task Case

(Figure: data-to-inference pipeline.)

Paradigm shift: from multi-task feature learning to multi-task kernel learning.
Generalized to the case of unknown task relationships: few kernels shared by few tasks.
Convex formulation with an active-set based algorithm; convergence analysed.

For details refer to SDM’11, ICML’12.
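A hedged sketch of kernel sharing across tasks: a group penalty (l2 across tasks, summed over kernels) encourages all tasks to load on a small common set of kernels. The utility matrix and objective here are illustrative stand-ins, not the SDM’11/ICML’12 formulation.

```python
# Hedged sketch: group penalty over kernel columns induces shared kernels.
import numpy as np
from scipy.optimize import minimize

T, K = 3, 5                                # number of tasks and candidate kernels
rng = np.random.default_rng(0)
gain = rng.uniform(0.0, 1.0, size=(T, K))  # hypothetical per-task utility of each kernel
gain[:, 1] += 1.0                          # kernel 1 is useful for every task

def objective(w_flat, lam=1.0):
    W = w_flat.reshape(T, K)               # W[t, k] = weight of kernel k for task t
    group = np.sqrt((W ** 2).sum(axis=0)).sum()  # sum_k ||W[:, k]||_2
    return -(gain * W).sum() + lam * group

res = minimize(objective, np.full(T * K, 0.1), bounds=[(0, 1)] * (T * K))
print(res.x.reshape(T, K).round(2))        # few kernel columns carry most weight
```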
Rule Ensemble Learning

Posed as a kernel learning problem:
  convex formulation;
  provable bounds on convergence.
Searches for compact rules: once a rule is long, its descendants are even longer, so they can be pruned.

For details refer to ICML’11, JMLR’15.
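A small sketch of the "rules as kernels" view (my own illustration, not the ICML’11/JMLR’15 algorithm): conjunctive rules act as binary features, and a linear kernel over rule activations decomposes into per-rule kernels, so selecting rules amounts to kernel learning.

```python
# Hedged sketch: conjunctive rules as binary features and rule kernels.
import numpy as np

X = np.array([[25, 1], [40, 0], [35, 1], [60, 0]])  # toy rows: [age, smoker]

rules = [
    lambda x: x[0] > 30,                # conjunct: age > 30
    lambda x: x[0] > 30 and x[1] == 1,  # longer descendant rule
]

# Rule activations are binary features; a linear kernel over them is a sum
# of per-rule kernels k_r(x, x') = r(x) * r(x').
Z = np.array([[float(r(x)) for r in rules] for x in X])
K = Z @ Z.T
print(K)
```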
Representation Learning

(Figure: data feeding into training and inference.)

There is an overwhelming choice of representations, which motivates data-dependent representation learning.
Representation Learning – Paradigms

Explicit feature learning: Dictionary Learning, Deep Learning.
Implicit feature learning: Kernel Learning (via kernel methods).
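A tiny sketch contrasting the two paradigms: explicit feature learning materializes a feature map phi(x), while a kernel computes the same inner product implicitly without ever forming phi. The degree-2 polynomial map is a standard textbook example, not specific to the talk.

```python
# Sketch: explicit features vs. implicit features via a kernel.
import numpy as np

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])

# Explicit: degree-2 polynomial feature map phi(v) = outer products of coordinates.
def phi(v):
    return np.array([v[0] * v[0], v[0] * v[1], v[1] * v[0], v[1] * v[1]])

explicit = phi(x) @ phi(y)

# Implicit: the polynomial kernel computes the same inner product directly.
implicit = (x @ y) ** 2
print(explicit, implicit)  # both 121.0
```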