Slide1
A Framework of Extracting Multi-scale Features Using Multiple Convolutional Neural Networks
Kuan-Chuan Peng, Tsuhan Chen
Slide2
Introduction
Breakthrough progress in object classification.
O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.
cat, dog, lion, tiger
Slide3
Introduction
Humans are interested in more than objects.
For example, aesthetic quality.
N. Murray et al. AVA: A Large-Scale Database for Aesthetic Visual Analysis. CVPR12.
Slide4
How do machines describe images?
Examples from a state-of-the-art algorithm:
A. Karpathy and F.-F. Li. Deep visual-semantic alignments for generating image descriptions. CVPR15.
http://cs.stanford.edu/people/karpathy/deepimagesent/
“man in black shirt is playing guitar.”
“woman is holding bunch of bananas.”
Slide5
How do machines describe images?
Examples from a state-of-the-art algorithm:
“man in black shirt is playing guitar.” (highlighted: man in black shirt)
“woman is holding bunch of bananas.” (highlighted: woman)
Slide6
How do machines describe images?
Examples from a state-of-the-art algorithm:
“man in black shirt is playing guitar.” (highlighted: guitar)
“woman is holding bunch of bananas.” (highlighted: bunch of bananas)
Slide7
How do machines describe images?
Examples from a state-of-the-art algorithm:
“man in black shirt is playing guitar.” (highlighted: playing)
“woman is holding bunch of bananas.” (highlighted: holding)
Slide8
How do experts describe images?
Examples from Pulitzer Prize winners:
http://www.pulitzer.org/archives/8417
http://www.pulitzer.org/archives/6451
“At bath times, Danielle appears serene. But no one knows what lies beyond those eyes.” (by Lane DeGregory)
“The surgery has dragged on for hours with little progress, and Mulliken, taking a breather next to an array of Sam's CAT scans, is feeling the frustration and exhaustion.” (by Tom Hallman Jr.)
Slide9
How do experts describe images?
Images convey more than objects.
“At bath times, Danielle appears serene. But no one knows what lies beyond those eyes.” (by Lane DeGregory; highlighted: serene)
“The surgery has dragged on for hours with little progress, and Mulliken, taking a breather next to an array of Sam's CAT scans, is feeling the frustration and exhaustion.” (by Tom Hallman Jr.; highlighted: frustration)
Slide10
Beyond Objects
Abstract attributes matter.
Attributes relating to or involving general ideas or qualities rather than specific people, objects, or actions. [Merriam-Webster dictionary]
Bridge the gap between machines and humans:
Teach machines to solve abstract tasks (tasks involving abstract attributes).
http://www.merriam-webster.com/dictionary/abstract
Slide11
Goal
A general framework to achieve better performance in abstract tasks.
Multi-scale features extracted using convolutional neural networks (CNNs).
Slide12
Why CNN?
object classification: O. Russakovsky et al. ImageNet large scale visual recognition challenge. arXiv:1409.0575, 2014.
speech recognition: L. Deng et al. A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. ICASSP13.
video classification: A. Karpathy et al. Large-scale video classification with convolutional neural networks. CVPR14.
Slide13
Existing Abstract Tasks
More and more abstract tasks are being proposed.
Slide14
Artistic Style & Artist Style Classification
[F. S. Khan et al. MVA14.]
Architectural Style Classification
[Z. Xu et al. ECCV14.]
Slide15
Emotion Classification
[J. Machajdik et al. ACMMM10.]
amusement, anger, awe, contentment, disgust, excitement, fear, sad
Aesthetic Classification
[N. Murray et al. CVPR12.]
high aesthetic quality, low aesthetic quality
Slide16
Fashion Style Classification
[M. H. Kiapour et al. ECCV14.]
Bohemian, Hipster
Memorability Prediction
[P. Isola et al. CVPR11.]
Interestingness Prediction
[M. Gygli et al. ICCV13.]
Slide17
Inspiration
It is tricky to describe abstract attributes as objects.
Not easy to “locate” abstract attributes.
What if abstract attributes prevail everywhere?
Label-inheritable (LI) property.
contentment? [J. Machajdik et al. ACMMM10.]
Slide18
Label-Inheritable (LI) Property
Dataset | Painting-91 [1] | arcDataset [2] | Caltech-101 [3]
Task | Artist style classification | Architectural style classification | Object classification
Label | Picasso | Baroque Architecture | Faces
Label-inheritable | Yes | Partial | Mostly No
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
[3] F.-F. Li et al. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. CVPRW04.
Slide21
Multi-Scale CNN
Assume the LI property holds for each image and its associated label.
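One way to read this assumption: if the LI property holds, every patch cropped from an image can be given the image's label, so each scale yields its own labeled training set. The following is a minimal sketch, not the authors' implementation. The split into a 2^(s-1) x 2^(s-1) grid at scale s and the names `grid_patches` and `label_inheriting_samples` are assumptions made for illustration, since the slides do not specify how patches are sampled.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_patches(width: int, height: int, scale: int) -> List[Box]:
    """Split a width x height image into a 2^(scale-1) x 2^(scale-1) grid.

    The grid layout is an assumption for illustration only.
    """
    n = 2 ** (scale - 1)
    boxes = []
    for row in range(n):
        for col in range(n):
            left = col * width // n
            top = row * height // n
            right = (col + 1) * width // n
            bottom = (row + 1) * height // n
            boxes.append((left, top, right, bottom))
    return boxes


def label_inheriting_samples(
    width: int, height: int, label: str, num_scales: int
) -> Dict[int, List[Tuple[Box, str]]]:
    """For each scale, pair every patch with the label of the whole image."""
    return {
        scale: [(box, label) for box in grid_patches(width, height, scale)]
        for scale in range(1, num_scales + 1)
    }


# Hypothetical 256 x 256 painting labeled "Picasso", split at three scales.
samples = label_inheriting_samples(256, 256, "Picasso", num_scales=3)
for scale, patches in samples.items():
    print(f"scale {scale}: {len(patches)} patches, all labeled {patches[0][1]}")
```

Under this reading, each scale's patch set could be used to train its own CNN, which is where the assumption matters: a patch of a Caltech-101 "Faces" image may not contain a face at all.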
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
Slide22
AlexNet
The number of nodes in the output layer is changed to the number of classes in each task.
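As a rough illustration of what this change implies, the sketch below counts the weights and biases of a fully connected output layer for a few of the class counts named in this deck. It assumes the output layer is fed by AlexNet's 4096-unit last hidden layer. The helper name `output_layer_params` and the `tasks` dictionary are hypothetical.

```python
FC7_UNITS = 4096  # width of AlexNet's last hidden fully connected layer


def output_layer_params(num_classes: int, in_features: int = FC7_UNITS) -> int:
    """Weights plus biases of a dense layer mapping in_features -> num_classes."""
    return in_features * num_classes + num_classes


# Class counts taken from tasks in this deck (an illustrative selection;
# Caltech-101 is counted as 101 object categories, ignoring any background class).
tasks = {
    "Architectural style (10 classes)": 10,
    "Architectural style (25 classes)": 25,
    "Caltech-101 object classification": 101,
}

for name, k in tasks.items():
    print(f"{name}: {k} output nodes, {output_layer_params(k)} parameters")
```

Only this last layer depends on the task. The rest of the architecture is unchanged.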
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
Slide23
Experimental Results
Classification accuracy (%)
Method \ Task | Artist style classification | Artistic style classification | Caltech-101 object classification (15 / 30 training examples per class) | Architectural style classification (10 / 25 classes)
Previous work (baseline) | 53.10 [1] | 62.20 [1] | 83.80 / 86.50 [2] | 69.17 / 46.21 [3]
Single-scale CNN (baseline) | 55.15 | 67.37 | 83.45 / 88.19 | 70.64 / 54.84
2-scale CNN (ours) | 58.11 | 69.67 | 80.19 / 87.58 | 74.82 / 58.89
3-scale CNN (ours) | 57.91 | 70.96 | N/A | 75.32 / 59.13
Label-inheritable | Yes | Yes | Mostly No | Partial
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
Slide24
Is it because of more training data?
What if we train one CNN with images in different scales?
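The accuracies reported on the Additional Results slide give a direct answer. The short sketch below only redoes that arithmetic: it compares one CNN trained on 2-scale images against the single-scale CNN and the 2-scale CNN. The list layout and variable names are illustrative assumptions. The numbers are copied from the table.

```python
# Classification accuracy (%) copied from the Additional Results table.
# Caltech-101 is omitted because the "1 CNN + 2-scale images" entry is N/A.
columns = [
    "artist style",
    "artistic style",
    "architectural style (10 classes)",
    "architectural style (25 classes)",
]
single_scale = [55.15, 67.37, 70.64, 54.84]
two_scale = [58.11, 69.67, 74.82, 58.89]
one_cnn_two_scale_images = [46.86, 61.95, 67.93, 49.06]


def diff(a, b):
    """Element-wise accuracy difference a - b, rounded to two decimals."""
    return [round(x - y, 2) for x, y in zip(a, b)]


vs_single = diff(one_cnn_two_scale_images, single_scale)
vs_two_scale = diff(one_cnn_two_scale_images, two_scale)

for name, d1, d2 in zip(columns, vs_single, vs_two_scale):
    print(f"{name}: {d1:+.2f} vs single-scale CNN, {d2:+.2f} vs 2-scale CNN")
```

Every difference is negative, which suggests that the gain of the 2-scale CNN does not come from the extra training images alone.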
A. Krizhevsky et al. ImageNet classification with deep convolutional neural networks. NIPS12.
Slide25
Additional Results
Classification accuracy (%)
Method \ Task | Artist style classification | Artistic style classification | Caltech-101 object classification (15 / 30 training examples per class) | Architectural style classification (10 / 25 classes)
Previous work (baseline) | 53.10 [1] | 62.20 [1] | 83.80 / 86.50 [2] | 69.17 / 46.21 [3]
Single-scale CNN (baseline) | 55.15 | 67.37 | 83.45 / 88.19 | 70.64 / 54.84
2-scale CNN (ours) | 58.11 | 69.67 | 80.19 / 87.58 | 74.82 / 58.89
1 CNN + 2-scale images | 46.86 | 61.95 | N/A | 67.93 / 49.06
Label-inheritable | Yes | Yes | Mostly No | Partial
[1] F. S. Khan et al. Painting-91: a large scale database for computational painting categorization. Machine Vision & Applications 14.
[2] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV14.
[3] Z. Xu et al. Architectural style classification using multinomial latent logistic regression. ECCV14.
Slide26
Conclusion
We proposed Multi-Scale Convolutional Neural Networks (MSCNN) based on the Label-Inheritable (LI) property.
Multi-scale features.
MSCNN can outperform state-of-the-art methods on datasets where the LI property holds or even partially holds.
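This claim can be checked against the Experimental Results table. The sketch below only restates numbers from that table under illustrative variable names. It computes the gain of the 2-scale CNN over the stronger of the two baselines (previous work and the single-scale CNN) and groups the gains by the label-inheritable status of each task.

```python
# (task, LI status, previous work %, single-scale CNN %, 2-scale CNN %)
# All accuracies are copied from the Experimental Results table.
results = [
    ("Artist style", "Yes", 53.10, 55.15, 58.11),
    ("Artistic style", "Yes", 62.20, 67.37, 69.67),
    ("Caltech-101 (15 per class)", "Mostly No", 83.80, 83.45, 80.19),
    ("Caltech-101 (30 per class)", "Mostly No", 86.50, 88.19, 87.58),
    ("Architectural style (10 classes)", "Partial", 69.17, 70.64, 74.82),
    ("Architectural style (25 classes)", "Partial", 46.21, 54.84, 58.89),
]

# Gain of the 2-scale CNN over the stronger baseline, in percentage points.
gains = {
    task: round(two_scale - max(previous, single), 2)
    for task, _, previous, single, two_scale in results
}

# Group the gains by label-inheritable status.
by_status = {}
for task, status, *_ in results:
    by_status.setdefault(status, []).append(gains[task])

for status, values in by_status.items():
    print(f"LI = {status}: gains {values}")
```

The gains are positive where the LI status is "Yes" or "Partial" and negative for Caltech-101, where it is "Mostly No".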
Slide27
Towards Solving Abstract Tasks
More CNN features to achieve better performance in abstract tasks.
Multi-scale features (ICME15).
Multi-depth features (ICIP15).
Multi-task features (submitted to ICCV15).
K.-C. Peng and T. Chen. A framework of extracting multi-scale features using multiple convolutional neural networks. ICME15.
K.-C. Peng and T. Chen. Cross-layer features in convolutional neural networks for generic classification tasks. ICIP15.
K.-C. Peng and T. Chen. Toward correlating and solving abstract tasks using convolutional neural networks. Submitted to ICCV15.
Slide28
Q & A