Slide 1

Feedforward semantic segmentation with zoom-out features
Mostajabi, Yadollahpour and Shakhnarovich
Toyota Technological Institute at Chicago

Slide 2
Main Ideas

- Casting semantic segmentation as classifying a set of superpixels.
- Extracting CNN features from different levels of spatial context around the superpixel at hand.
- Using an MLP as the classifier.

Photo credit: Mostajabi et al.

Slide 3
Zoom-out feature extraction

Photo credit: Mostajabi et al.

Slide 4
Zoom-out feature extraction

Subscene-level features:
- Bounding box of the superpixels within radius three of the superpixel at hand
- Warp the bounding box to 256 x 256 pixels
- Take activations of the last fully connected layer

Scene-level features:
- Warp the image to 256 x 256 pixels
- Take activations of the last fully connected layer
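The subscene/scene extraction can be sketched as follows. This is a minimal sketch, not the authors' code: `fc_features` is a hypothetical placeholder for the pretrained CNN's last fully-connected activations, and the nearest-neighbor `warp` stands in for the 256 x 256 warp.

```python
import numpy as np

def warp(img, size=256):
    """Nearest-neighbor resize to size x size (stand-in for the 256 x 256 warp)."""
    h, w = img.shape[:2]
    return img[(np.arange(size) * h) // size][:, (np.arange(size) * w) // size]

def subscene_crop(img, region_mask):
    """Crop the bounding box of the superpixel at hand plus its neighbors
    within radius three (region_mask is the union of those superpixel masks)."""
    ys, xs = np.where(region_mask)
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def fc_features(patch):
    """Hypothetical placeholder for the CNN's fully-connected activations."""
    return patch.mean(axis=(0, 1))

def zoomout_features(img, region_mask):
    """Concatenate subscene-level and scene-level features for one superpixel."""
    sub = fc_features(warp(subscene_crop(img, region_mask)))
    scene = fc_features(warp(img))
    return np.concatenate([sub, scene])
```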
Slide 5
Training

- Extract the features from the mirror image as well and take the element-wise max over the resulting two feature vectors.
- 12,416-dimensional representation for each superpixel.
- Train two classifiers:
  - Linear classifier (softmax)
  - MLP: hidden layer (1024 neurons) + ReLU + hidden layer (1024 neurons) with dropout
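A minimal numpy sketch of the MLP classifier described above. Layer sizes come from the slide; the random initialization, the placement of dropout, and the 21-way VOC output are assumptions of this sketch.

```python
import numpy as np

def make_mlp(d_in=12416, h=1024, n_classes=21, seed=0):
    """Randomly initialised weights for: hidden(1024) -> ReLU -> hidden(1024, dropout) -> softmax.
    21 outputs = 20 VOC classes + background (an assumption, not stated on the slide)."""
    rng = np.random.default_rng(seed)
    shapes = [(d_in, h), (h, h), (h, n_classes)]
    return [(rng.normal(0, 0.01, s), np.zeros(s[1])) for s in shapes], rng

def mlp_forward(model, x, train=False, p_drop=0.5):
    params, rng = model
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.maximum(x @ W1 + b1, 0)                      # first hidden layer + ReLU
    h2 = h1 @ W2 + b2                                    # second hidden layer
    if train:                                            # inverted dropout on the second hidden layer
        h2 *= (rng.random(h2.shape) > p_drop) / (1 - p_drop)
    logits = h2 @ W3 + b3
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)             # softmax class probabilities
```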
Slide 6
Loss Function

- Imbalanced dataset → weighted loss function.
- Let f_c be the frequency of class c in the training data; the loss for examples of class c is weighted in inverse proportion to f_c, so rare classes are not dominated by frequent ones.
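A sketch of inverse-frequency weighting for the softmax loss; the exact normalization used in the paper may differ from this simple 1/f_c form.

```python
import numpy as np

def weighted_xent(probs, labels, class_freq):
    """Cross-entropy where each example is scaled by 1 / f_c of its true class,
    so rare classes are not drowned out by frequent ones."""
    weights = 1.0 / np.asarray(class_freq, dtype=float)
    labels = np.asarray(labels)
    per_example = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(weights[labels] * per_example))
```

With class frequencies (0.9, 0.1), a mistake on the rare class costs nine times as much as the same mistake on the frequent one.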
Slide 7
Effect of Zoom-out Levels

Figure columns: Image, Ground Truth, G1:3, G1:5, G1:5+S1, G1:5+S1+S2.

Photo and table credit: Mostajabi et al.

Slide 8
Quantitative Results

Softmax results on VOC 2012.

Table credit: Mostajabi et al.

Slide 9
Quantitative Results

MLP results.

Table credit: Mostajabi et al.

Slide 10
Qualitative Results

Photo credit: Mostajabi et al.

Slide 11
Learning Deconvolution Network for Semantic Segmentation
Noh, Hong and Han
POSTECH, Korea

Slide 12
Motivations

Figure columns: Image, Ground Truth, FCN Prediction.

Photo credit: Noh et al.

Slide 13
Motivations

Photo credit: Noh et al.

Slide 14
Deconvolution Network Architecture

Photo credit: Noh et al.

Slide 15
Unpooling
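A minimal numpy sketch of unpooling with max "switches": pooling records where each maximum came from, and unpooling puts each pooled value back at exactly that location, leaving all other entries zero.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that also records the argmax ('switch') inside each block."""
    h, w = x.shape
    blocks = x.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3).reshape(h // k, w // k, k * k)
    return blocks.max(axis=-1), blocks.argmax(axis=-1)

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded switch location; the rest stays zero."""
    h, w = pooled.shape
    out = np.zeros((h, w, k * k))
    rows, cols = np.indices((h, w))
    out[rows, cols, switches] = pooled
    return out.reshape(h, w, k, k).transpose(0, 2, 1, 3).reshape(h * k, w * k)
```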
Photo credit: Noh et al.

Slide 16
Deconvolution
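A naive transposed-convolution ("deconvolution") sketch: each input activation stamps a scaled copy of the learned kernel onto a larger output, with overlapping contributions summed. This is the generic operation, not the authors' implementation.

```python
import numpy as np

def deconv2d(x, kernel, stride=1):
    """Transposed convolution: dense, learned upsampling of x by the kernel."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # each input value contributes a scaled kernel patch; overlaps add up
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out
```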
Photo credit: Noh et al.

Slide 17
Unpooling and Deconvolution Effects

Photo credit: Noh et al.

Slide 18
Pipeline

- Generate 2K object proposals using edge-box and keep the top 50 by objectness score.
- Aggregate the segmentation maps generated for each proposal using a pixel-wise maximum or average.
- Construct the class-conditional probability map using softmax.
- Apply a fully-connected CRF to the probability map.
- Ensemble with FCN: compute the mean of the probability maps generated by DeconvNet and FCN, then apply the CRF.
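The aggregation step can be sketched as follows. The box format and the exact placement of the softmax are assumptions of this sketch; `score_maps` holds each proposal's per-class score map.

```python
import numpy as np

def aggregate_proposals(score_maps, boxes, image_shape, n_classes, how="max"):
    """Paste each proposal's class-score map into full-image coordinates,
    combine pixel-wise (max or average), then softmax over classes."""
    h, w = image_shape
    full = np.zeros((len(score_maps), n_classes, h, w))
    for k, (scores, (y0, x0, y1, x1)) in enumerate(zip(score_maps, boxes)):
        full[k, :, y0:y1, x0:x1] = scores
    agg = full.max(axis=0) if how == "max" else full.mean(axis=0)
    e = np.exp(agg - agg.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)   # class-conditional probability map
```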
Photo credit: Noh et al.

Slide 19
Training Deep Network

- Add a batch normalization layer to the output of every convolutional and deconvolutional layer.
- Two-stage training: train on easy examples first, then fine-tune with more challenging ones.
- Constructing easy examples: crop object instances using the ground-truth annotations. Limiting the variation in object location and size substantially reduces the search space for semantic segmentation.

Slide 20
Effect of Number of Proposals

Photo credit: Noh et al.

Slide 21
Quantitative Results

Table credit: Noh et al.

Slide 22
Qualitative Results

Photo credit: Noh et al.

Slide 23
Qualitative Results

Examples where FCN produces better results than DeconvNet.

Photo credit: Noh et al.

Slide 24
Qualitative Results

Examples where inaccurate predictions from DeconvNet and FCN are improved by the ensemble.

Photo credit: Noh et al.