CNN-RNN: A Unified Framework for Multi-label Image Classification
Xueying Bai, Jiankun Xu
Multi-label Image Classification
Co-occurrence dependency
Higher-order correlation: one label can be predicted using the previous label
Semantic redundancy: labels have overlapping meanings (e.g., cat and kitten)
Previous Models
Multiple single-label classification
Fails to model the dependencies between labels
Graphical models
Large number of parameters
Cannot model higher-order correlations
CNN-RNN Model
Learns the semantic redundancy and the co-occurrence dependencies
Trains end to end
Better predicts objects that require context (higher-order correlation)
CNN-RNN Framework
Joint Embedding Model
Label embedding: a vector in a low-dimensional Euclidean space in which embeddings of semantically similar labels are close to each other
Image embedding: a vector placed close to the embeddings of the image's associated labels in the same space
Exploits semantic redundancy: labels share classification parameters
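The joint-embedding idea above can be sketched minimally as follows. Labels are scored by similarity to the projected image feature; the 64-d label embedding matches the slides, but the projection matrix, variable names, and other dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions; the deck specifies a 64-d label embedding.
num_labels, embed_dim, cnn_dim = 5, 64, 4096

# One 64-d vector per label; training pulls semantically similar
# labels close together in this space.
label_embeddings = rng.normal(size=(num_labels, embed_dim))

# Hypothetical projection mapping a CNN feature into the same space.
W_proj = rng.normal(size=(cnn_dim, embed_dim))

image_feature = rng.normal(size=cnn_dim)    # output of the CNN
image_embedding = image_feature @ W_proj    # joint-space image vector

# Score every label by dot-product similarity to the image embedding;
# all labels share W_proj, so classification parameters are shared.
scores = label_embeddings @ image_embedding
```

Because every label is scored against the same projected image vector, adding a new label costs only one 64-d embedding rather than a full classifier.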
Model Diagram
Output of CNN: Image embedding
Output of RNN (o(t)): a new embedding that incorporates information from previously predicted labels (to model higher-order correlations)
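One prediction step of the diagram can be sketched as below: the recurrent output o(t) is combined with the image embedding, and every label is scored against the combined vector. The 64/512 dimensions follow the slides; the projection matrix, the additive combination, and the tanh nonlinearity are simplified stand-ins for the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over label scores.
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
num_labels, embed_dim, hidden = 6, 64, 512   # dims from the slides

labels = rng.normal(size=(num_labels, embed_dim))  # label embedding matrix
U_o = rng.normal(size=(hidden, embed_dim)) * 0.01  # hypothetical projection of o(t)

image_embedding = rng.normal(size=embed_dim)  # from the CNN
o_t = rng.normal(size=hidden)                 # recurrent output at step t

# Combine image information with the recurrent summary of the labels
# predicted so far, then turn label scores into a distribution.
x_t = np.tanh(image_embedding + o_t @ U_o)
probs = softmax(labels @ x_t)
```

At the next step, the embedding of the label just predicted is fed back into the recurrent layer, which is how earlier labels influence later predictions.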
LSTM
Recurrent Neural Network
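The recurrent layer in the framework is an LSTM. A generic single LSTM step can be sketched as follows; the stacked weight layout and small dimensions here are illustrative, not the paper's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step.

    W: (4*hidden, input+hidden) stacked gate weights; b: (4*hidden,).
    Returns the new hidden state and cell state.
    """
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
    g = np.tanh(g)                                # candidate cell update
    c_new = f * c + i * g          # cell state carries long-term information
    h_new = o * np.tanh(c_new)     # hidden state, the step's output
    return h_new, c_new

# Toy usage with small illustrative dimensions.
rng = np.random.default_rng(0)
in_dim, hidden_dim = 8, 16
W = rng.normal(size=(4 * hidden_dim, in_dim + hidden_dim)) * 0.1
h, c = lstm_step(rng.normal(size=in_dim),
                 np.zeros(hidden_dim), np.zeros(hidden_dim),
                 W, np.zeros(4 * hidden_dim))
```

The gates let the cell state persist across many steps, which is what allows the model to remember which labels have already been emitted.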
Inference
Prediction Path
Beam Search
At each time step, keep the top-N labels as candidates
Extend the candidates to form the top-N prediction paths at time t+1
Beam Search
When a path reaches the 'END' label, add it to the candidate path set
Termination condition: the probability of every current intermediate path is smaller than that of all candidate paths
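The beam-search procedure on these two slides can be sketched as a toy implementation. Here `step_probs` is a hypothetical callback standing in for the CNN-RNN's next-label distribution, and label `0` plays the role of 'END'.

```python
def beam_search(step_probs, beam=3, end=0, max_len=4):
    """Toy beam search over label sequences.

    step_probs(path) -> list of (label, prob) for the next step.
    """
    paths = [((), 1.0)]   # intermediate prediction paths
    finished = []         # candidate (completed) paths
    for _ in range(max_len):
        nxt = []
        for seq, p in paths:
            for label, q in step_probs(seq):
                if label == end:
                    finished.append((seq, p * q))      # reached 'END'
                else:
                    nxt.append((seq + (label,), p * q))
        # keep only the top-N intermediate paths
        paths = sorted(nxt, key=lambda t: -t[1])[:beam]
        # stop once every intermediate path is less probable
        # than every completed candidate path
        if finished and paths and \
                max(p for _, p in paths) < min(p for _, p in finished):
            break
    return max(finished, key=lambda t: t[1]) if finished else paths[0]

# Toy scorer: two informative steps, then everything ends.
def toy(seq):
    if len(seq) < 2:
        return [(1, 0.6), (2, 0.3), (0, 0.1)]
    return [(0, 1.0)]

best = beam_search(toy, beam=2)
```

With this toy scorer the highest-probability completed path is the sequence (1, 1), since each step assigns label 1 the largest probability.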
Experiments
The CNN module uses the 16-layer VGG network
The label embedding dimension is 64
The LSTM RNN layer dimension is 512
Tested on datasets: NUS-WIDE, MS COCO, and PASCAL VOC 2007
Evaluation Metrics
Precision: correctly annotated labels / generated labels
Recall: correctly annotated labels / ground-truth labels
Reported per-class (C-P, C-R) and overall (O-P, O-R)
C-F1, O-F1: harmonic mean of the corresponding precision and recall
mAP
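A sketch of how the per-class (C-) and overall (O-) variants differ: C- metrics average per-class ratios, while O- metrics pool the counts across all classes first. The function name and toy label sets are illustrative.

```python
def multilabel_metrics(gt, pred, num_labels):
    """Per-class (C-) and overall (O-) precision/recall/F1.

    gt, pred: one set of integer labels per image.
    """
    tp = [0] * num_labels      # correctly annotated labels, per class
    p_cnt = [0] * num_labels   # generated labels, per class
    g_cnt = [0] * num_labels   # ground-truth labels, per class
    for g, p in zip(gt, pred):
        for l in p:
            p_cnt[l] += 1
        for l in g:
            g_cnt[l] += 1
        for l in p & g:
            tp[l] += 1
    # C-: average the per-class ratios over all classes.
    cp = sum(tp[l] / p_cnt[l] for l in range(num_labels) if p_cnt[l]) / num_labels
    cr = sum(tp[l] / g_cnt[l] for l in range(num_labels) if g_cnt[l]) / num_labels
    # O-: single ratio over the pooled counts.
    op, orec = sum(tp) / sum(p_cnt), sum(tp) / sum(g_cnt)
    f1 = lambda p, r: 2 * p * r / (p + r) if p + r else 0.0
    return {"C-P": cp, "C-R": cr, "C-F1": f1(cp, cr),
            "O-P": op, "O-R": orec, "O-F1": f1(op, orec)}

# Two toy images: ground truth {0,1} and {1}; predictions {0} and {1,2}.
m = multilabel_metrics([{0, 1}, {1}], [{0}, {1, 2}], num_labels=3)
```

Overall metrics weight frequent labels more heavily, while per-class metrics treat rare and common labels equally, which is why both are reported.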
NUS-WIDE
A web image dataset containing 269,648 images and 5,018 tags
Tested with 1,000-tag and 81-tag label sets
MS COCO
Contains 123 thousand images of 80 object types
Training set: 82,783 images; test set: 40,504 images
Most images contain multiple objects
PASCAL VOC 2007
Training set: 5,011 images; test set: 4,952 images
Evaluated with AP and mAP
Label embedding
The model effectively learns a joint label embedding
Attention Visualization
Conclusion and Future Work
Combines the advantages of joint image/label embedding and label co-occurrence models by employing a CNN and an RNN
Experimental results on several datasets show good performance
Predicting small objects remains a challenge
Reference: CNN-RNN: A Unified Framework for Multi-label Image Classification, by Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu
Questions?
Thank you all!