Sharing Features Between Visual Tasks at Different Levels of Granularity
Sung Ju Hwang (1), Fei Sha (2), and Kristen Grauman (1)
(1) University of Texas at Austin, (2) University of Southern California
Problem
Sharing features between sub/superclasses
A single visual instance can have multiple labels at different levels of semantic granularity.
Main Idea
We propose to simultaneously learn shared features that are discriminative for tasks at different levels of semantic granularity.
Baselines
Dataset
Sharing features between objects/attributes
Example object class / attribute predictions
1) No sharing: Baseline SVM classifier for object class recognition.
2) Sharing-Same level only: Sharing features between object classifiers at the same level.
3) Sharing+Superclass: Subclass classifiers trained with features shared with superclasses.*
4) Sharing+Subclass: Superclass classifiers trained with features shared with subclasses.*
* We use the algorithm for kernel classifiers.
1) Finer-grained categorization tasks benefit from sharing features with their superclasses. → A subclass classifier learns features specific to its superclass, so it can better discriminate itself from the classes that do NOT belong to the same superclass.
2) Coarser-grained categorization tasks do not benefit from sharing features with their subclasses. → Features learned for subclasses capture intra-class variance that only introduces confusion.
Recognition accuracy
Predicted object (NSO = No sharing-Obj., NSA = No sharing-Attr.):
Image 1: NSO: Dolphin / NSA: Walrus / Ours: Grizzly bear
Image 2: NSO: Grizzly bear / NSA: Rhinoceros / Ours: Moose
Image 3: NSO: Giant Panda / NSA: Rabbit / Ours: Rhinoceros

Predicted attributes:
Image 1: NSA: Fast, active, toughskin, chewteeth, forest, ocean, swims / Ours: Fast, active, toughskin, fish, forest, meatteeth, strong
Image 2: NSA: Strong, inactive, vegetation, quadrapedal, slow, walks, big / Ours: Strong, toughskin, slow, walks, vegetation, quadrapedal, inactive
Image 3: NSA: Quadrapedal, oldworld, walks, ground, furry, gray, chewteeth / Ours: Quadrapedal, oldworld, ground, walks, tail, gray, furry
Motivation: By regularizing the classifiers to use shared features, we aim to (1) select features that are associated with semantic concepts at each level, and (2) avoid overfitting when object-labeled data is lacking.
1) Our method is more robust to background clutter, as it has a more refined set of features from sparsity regularization with attributes.
2) Our method makes robust predictions in atypical cases.
3) When our method fails, it often makes more semantically “close” predictions.
A visual instance (e.g., a Dalmatian image) carries labels at multiple levels of semantic granularity: object class (Dalmatian), superclass (Dog, Canine, Placental mammal), and attributes (Spots). → How can we learn new information from these extra labels that can aid object recognition?

Input visual features x1, ..., xD are mapped to shared features u1, ..., uD, on which classifiers at every level of granularity are trained.
Dataset: Animals with Attributes. # images: 30,475; # classes: 50 (40); # attributes: 28; hierarchy levels: 2.
Recognition accuracy for each class
Overall recognition accuracy
We improve accuracy on 33 of the 50 AWA classes, and on all OSR classes.
Classes with more distinct attributes benefit more from feature sharing, e.g., Dalmatian, leopard, giraffe, and zebra.
1) No sharing-Obj.: Baseline SVM classifier for object class recognition.
2) No sharing-Attr.: Baseline object recognition on predicted attributes, as in Lampert's approach.
3) Sharing-Obj.: Our multitask feature sharing with the object class classifiers only.
4) Sharing-Attr.: Our multitask feature sharing with object class + attribute classifiers.
Baselines
Predicted attributes (red: incorrect prediction).
Conclusion / Future Work
By sharing features between classifiers learned at different levels of granularity, we improve object class recognition rates. The exploited semantics effectively regularize the object models.
Future work:
1) Automatic selection of useful attributes / superclass groupings.
2) Leveraging the label structure to refine the degree of sharing.
Algorithm
Learning shared features for linear classifiers
1) Initialize the covariance matrix Ω with a scaled identity matrix, Ω = I/D.
2) Transform the variables using the covariance matrix Ω: the transformed n-th feature vector is $\tilde{x}_n = \Omega^{1/2} x_n$, and the transformed classifier weights are $\tilde{w}_t = \Omega^{-1/2} w_t$.
3) Solve for the optimal weights while holding Ω fixed.
4) Update the covariance matrix: $\Omega = \big(W W^\top + \epsilon I\big)^{1/2} / \mathrm{trace}\big((W W^\top + \epsilon I)^{1/2}\big)$, where $W = [w_1, \dots, w_T]$ stacks the weight vectors and $\epsilon$ is a smoothing parameter (for numerical stability).
Alternate steps 3) and 4) until W converges.

Extension to kernel classifiers
1) Form the kernel matrix K.
2) Compute the basis vectors B and the diagonal matrix S using a Gram-Schmidt process.
3) Transform the data according to the learned B and S.
4) Apply the algorithm for the linear classifiers on the transformed features
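As a concrete illustration of steps 1)–3), here is a minimal sketch. It is not the poster's exact procedure: it assumes an RBF kernel on toy data and uses an eigendecomposition of K as a stand-in for the Gram-Schmidt orthogonalization (any factorization K = F Fᵀ yields explicit features the linear algorithm can consume); the function name `empirical_kernel_features` is illustrative.

```python
import numpy as np

def empirical_kernel_features(K, eps=1e-10):
    """Turn an n x n kernel (Gram) matrix into explicit feature vectors.

    An eigendecomposition stands in for the Gram-Schmidt step on the poster;
    any factorization K = F F^T works, and the rows of F can then be fed to
    the linear shared-feature algorithm.
    """
    eigvals, eigvecs = np.linalg.eigh(K)           # K is symmetric PSD
    eigvals = np.clip(eigvals, 0.0, None)          # clip numerical noise
    keep = eigvals > eps
    F = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # one row per training example
    return F

# Toy usage with an RBF kernel on random data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)
F = empirical_kernel_features(K)
print(np.allclose(F @ F.T, K, atol=1e-6))          # True: F reproduces K
```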
Overall flow: Initialization → Independent classifier training → Feature learning, alternating the last two steps until convergence.
We adopt the alternating optimization algorithm of [Argyriou08], which trains the classifiers and learns the shared features in alternating steps.
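As a concrete illustration of this alternating scheme, the following is a minimal numpy sketch. It is not the paper's solver: a squared loss replaces the SVM hinge loss so each per-task solve has a closed form, and `gamma`, `eps`, and `n_iters` are illustrative parameters.

```python
import numpy as np

def _psd_sqrt(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def learn_shared_features(X, Y, gamma=1.0, eps=1e-6, n_iters=50):
    """Alternating optimization in the spirit of [Argyriou08].

    X: (n, d) features shared by all tasks; Y: (n, T) labels, one column per
    task (object classes, attributes, ...). A squared loss stands in for the
    SVM hinge loss; this is a sketch, not the paper's implementation.
    """
    n, d = X.shape
    Omega = np.eye(d) / d                          # 1) scaled identity init
    W = np.zeros((d, Y.shape[1]))
    for _ in range(n_iters):
        Omega_sqrt = _psd_sqrt(Omega)
        X_t = X @ Omega_sqrt                       # 2) transformed features
        A = X_t.T @ X_t + gamma * np.eye(d)        # 3) solve with Omega fixed
        W_tilde = np.linalg.solve(A, X_t.T @ Y)
        W = Omega_sqrt @ W_tilde                   #    map back to original space
        M = _psd_sqrt(W @ W.T + eps * np.eye(d))   # 4) smoothed covariance update
        Omega = M / np.trace(M)
    return W, Omega

# Toy usage: three related binary tasks whose signal lives in shared dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = np.sign(X[:, :3] + 0.1 * rng.normal(size=(100, 3)))
W, Omega = learn_shared_features(X, Y)
print(np.round(np.diag(Omega)[:5], 3))             # mass concentrates on shared dims
```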
Learning shared features via regularization
Sharing features via sparsity regularization
Multitask feature learning with trace norm regularization: the joint objective sums an SVM loss function on the original feature space over all tasks (both the object classifiers and the attribute classifiers) and adds a (2,1)-norm regularizer on the shared parameters. Within the (2,1)-norm, the L2-norm across tasks encourages joint data fitting on each feature dimension, while the L1-norm across dimensions promotes sparsity, so that the selected dimensions are general rather than training-set specific.
Sparsity regularization on the parameters across different tasks results in shared features with better generalization power:

$\min_{U, W} \; \sum_{t} \sum_{n} L\big(y_{tn},\, w_t^\top U^\top x_n\big) + \gamma\, \|W\|_{2,1}^2$

where $x_n$ is the n-th feature vector, $y_{tn}$ is the n-th label for task t, $w_t$ is the parameter (weight vector) for task t, $\gamma$ is the regularization parameter, $U$ is an orthogonal transformation to a shared feature space, and $\Omega$ (below) is the covariance matrix used in the equivalent smooth formulation.
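As a concrete sketch of this objective, the snippet below evaluates the summed hinge losses on the shared features plus the squared (2,1)-norm regularizer; the helper names (`norm_21_sq`, `multitask_objective`) and the toy data are illustrative, not part of the paper's code.

```python
import numpy as np

def norm_21_sq(W):
    """Squared (2,1)-norm: L2 across tasks for each feature dimension (row),
    then an L1 sum across dimensions, squared."""
    return np.linalg.norm(W, axis=1).sum() ** 2

def multitask_objective(U, W, X, Y, gamma=1.0):
    """Hinge (SVM) losses of all tasks on the shared features U^T x_n,
    plus the squared (2,1)-norm regularizer on W.

    X: (n, d) features; Y: (n, T) labels in {-1, +1}; U: (d, d); W: (d, T).
    """
    scores = (X @ U) @ W                          # score of task t on example n
    hinge = np.maximum(0.0, 1.0 - Y * scores)
    return hinge.sum() + gamma * norm_21_sq(W)

# Illustrative call with random data and a random orthogonal U.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(8, 5)), np.sign(rng.normal(size=(8, 3)))
U, _ = np.linalg.qr(rng.normal(size=(5, 5)))
print(multitask_objective(U, rng.normal(size=(5, 3)), X, Y))
```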
[Argyriou08] A. Argyriou, T. Evgeniou, and M. Pontil, Convex Multi-Task Feature Learning, Machine Learning, 2008.
Convex optimization: the objective combines a loss function on the transformed (shared) features with a sparsity regularizer. However, the (2,1)-norm is nonsmooth. We instead solve an equivalent form, in which the features are replaced with a covariance matrix Ω that measures the relative effectiveness of each dimension:

$\min_{W, \Omega} \; \sum_{t} \sum_{n} L\big(y_{tn},\, w_t^\top x_n\big) + \gamma \sum_{t} w_t^\top \Omega^{-1} w_t \quad \text{s.t. } \Omega \succeq 0,\; \mathrm{trace}(\Omega) \le 1$
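A quick numerical check of this equivalence (a sketch with illustrative toy sizes): plugging the closed-form covariance Ω = (WWᵀ)^{1/2} / trace((WWᵀ)^{1/2}) into the smooth penalty trace(Wᵀ Ω⁻¹ W) reproduces the squared trace norm of W, the nonsmooth quantity the reformulation stands in for.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 6, 10                              # feature dims, number of tasks (toy)
W = rng.normal(size=(d, T))               # stacked weight vectors

# Closed-form minimizer of the smoothed penalty (no smoothing term needed here,
# since W W^T is full rank for this toy W).
M = W @ W.T
vals, vecs = np.linalg.eigh(M)
M_sqrt = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
Omega = M_sqrt / np.trace(M_sqrt)

smooth_penalty = np.trace(W.T @ np.linalg.inv(Omega) @ W)
trace_norm_sq = np.linalg.norm(W, ord='nuc') ** 2
print(np.isclose(smooth_penalty, trace_norm_sq))   # True
```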
How can we promote common sparsity across the parameters of the different tasks?
→ We use a (2,1)-norm regularizer [Argyriou08]: an L2-norm ties the tasks together on each feature dimension, and an L1-norm across dimensions drives entire dimensions to zero, yielding common sparsity.
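The effect can be seen on two toy weight matrices with the same Frobenius norm (a sketch with made-up numbers): when every task reuses the same few feature dimensions, the (2,1)-norm is smaller than when each task uses its own private dimension, so minimizing it pushes the tasks toward common sparsity.

```python
import numpy as np

def norm_21(W):
    """(2,1)-norm of a (d, T) weight matrix: L2 across tasks for each feature
    dimension (row), then an L1 sum across dimensions."""
    return np.linalg.norm(W, axis=1).sum()

# Same Frobenius norm, different sharing patterns (toy numbers).
W_shared = np.zeros((6, 3))
W_shared[:2, :] = 1.0                            # all tasks reuse dimensions 0-1
W_private = np.zeros((6, 3))
W_private[[0, 2, 4], [0, 1, 2]] = np.sqrt(2.0)   # each task uses its own dimension
print(norm_21(W_shared), norm_21(W_private))     # ~3.46 vs ~4.24: sharing is cheaper
```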
Our approach makes substantial improvements over the baselines. Exploiting the external semantics that the auxiliary attribute tasks provide, our learned features generalize better, particularly when less training data is available.