
Rich feature Hierarchies for Accurate object detection and - PowerPoint Presentation

yoshiko-marsland
Uploaded On 2016-05-05


Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik (UC Berkeley). Presenter: Hossein Azizpour. Abstract: Can CNNs improve state-of-the-art object detection results? Yes, by learning rich representations which can then be combined with computer vision techniques. ID: 306455



Presentation Transcript

Slide1

Rich feature Hierarchies for Accurate object detection and semantic segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik (UC Berkeley)

Presenter: Hossein Azizpour

Slide2

Abstract

Can CNNs improve state-of-the-art object detection results?

Yes: a CNN learns rich representations, which can then be combined with computer vision techniques.

Can we understand what a CNN learns?

Sort of! We can check which positive (or negative) image regions stimulate a neuron the most.

The paper evaluates different layers of the method

Experiments on segmentation

mAP on VOC 2007: 48%!

Slide3

Approach

Slide4

Region proposals

Over-segmentation (initial regions)

Bottom-up grouping at multiple scales

Diversification (different region proposals, similarity measures for grouping, …)

Enables computationally expensive methods

Potentially reduces false positives

Slide5

CNN pre-training

Rectified non-linearity

Local Response Normalization

Overlapping max pooling

5 convolutional layers

2 fully connected layers

Softmax

Dropout

224x224x3 input

ImageNet samples

Slide6

CNN fine-tuning

Lower learning rate (1/100)

Only PASCAL image regions

128 patches per image

Positives: overlap >= 0.5; negatives otherwise

Slide7

Learning Classifier

Positives: full patches

Negatives: overlap < 0.3 (very important!)

One linear SVM per class

Standard hard negative mining

Pre-computed and saved features

Slide8
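A note on the two preceding slides: both label regions by their overlap with a ground-truth box. The sketch below assumes "overlap" means intersection-over-union (IoU), the usual PASCAL VOC measure; the slides do not say so explicitly. The box format and the function names are hypothetical, chosen for illustration only.

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def finetune_label(overlap: float) -> int:
    """Fine-tuning rule from the slides: positive if overlap >= 0.5, negative otherwise."""
    return 1 if overlap >= 0.5 else 0


def is_svm_negative(overlap: float) -> bool:
    """SVM rule from the slides: a region counts as a negative if overlap < 0.3."""
    return overlap < 0.3


ground_truth = (0.0, 0.0, 10.0, 10.0)
proposal = (0.0, 0.0, 10.0, 4.0)  # covers 40% of the ground-truth box
overlap = iou(ground_truth, proposal)
print(overlap, finetune_label(overlap), is_svm_negative(overlap))
```

With these toy boxes the proposal is a fine-tuning negative, but it is not used as an SVM negative either, which shows that the two stages apply different thresholds. What happens to regions in that in-between band is not stated on the slides.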

Timing

Training the SVMs for all classes on a single core takes 1.5 hours

Extracting features for a window on a GPU takes 5 ms

Inference requires a matrix multiplication; for 100K classes it takes 10 secs

Compared to the Google paper by Dean et al. (CVPR best paper): 16% mAP in 5 minutes. Here, 48% in about 1 minute!

Slide9
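The Timing slide above says that inference reduces to a matrix multiplication. A minimal sketch of that idea, assuming one weight vector per class and one feature vector per region: every score is a dot product, so scoring all regions against all classes is a features-by-weights matrix product. The sizes and values below are made up, and `score_regions` is a hypothetical helper, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]


def score_regions(features: Matrix, weights: Matrix) -> Matrix:
    """Return scores[i][c] = dot(features[i], weights[c]).

    This is the matrix product features x weights^T written out in plain Python.
    """
    return [
        [sum(f * w for f, w in zip(feat, w_c)) for w_c in weights]
        for feat in features
    ]


# Toy sizes: 3 regions, 4-D features, 2 classes (all values are illustrative).
features = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
weights = [
    [0.5, -1.0, 0.25, 0.0],  # class 0
    [-0.5, 1.0, 0.0, 1.0],   # class 1
]

scores = score_regions(features, weights)
# Index of the highest-scoring class for each region.
best_class = [max(range(len(row)), key=row.__getitem__) for row in scores]
print(scores, best_class)
```

Because the features are computed once per region and shared across classes, adding classes only adds rows to the weight matrix, which is presumably why the cost stays low even for very many classes.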

Detection Results

PASCAL 2010

UVA uses the same region proposals with large combined descriptors and a histogram intersection kernel (HIK) SVM

Slide10

Visualization

10 million held-out regions

Sort by the activation response

Potentially shows modes and invariances

Max pool layer #5 (6x6x256 = 9216-D)

Slide11
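The ranking step on the preceding Visualization slide (sort held-out regions by one unit's activation and look at the top ones) can be sketched as follows. The region names and activation values are invented for illustration, and `top_regions` is a hypothetical helper. The first lines also check the pool-5 dimensionality quoted on the slide.

```python
import heapq
from typing import List, Tuple

# Pool-5 output size quoted on the slide: 6 x 6 x 256.
POOL5_SHAPE = (6, 6, 256)
POOL5_DIM = POOL5_SHAPE[0] * POOL5_SHAPE[1] * POOL5_SHAPE[2]


def top_regions(activations: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (region, activation) pairs with the largest activation."""
    return heapq.nlargest(k, activations, key=lambda pair: pair[1])


# Made-up activations of a single unit on four held-out regions.
activations = [
    ("region_a", 0.1),
    ("region_b", 0.9),
    ("region_c", 0.4),
    ("region_d", 0.7),
]

top = top_regions(activations, 2)
print(POOL5_DIM, top)
```

Using `heapq.nlargest` avoids sorting the full list, which matters when the pool of regions is in the millions and only the first few are displayed.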

Visualization

1- Cat (positive SVM weight)

2- Cat (negative SVM weight)

3- Sheep (positive SVM weight)

4- Person (positive SVM weight)

5, 6- Some generic units (diagonal bars, red blobs)

Slide12

Visualization

Slide13

Visualization

Slide14

Visualization

Slide15

Ablation study

With and without fine-tuning on different layers

Pool 5 (only 6% of all parameters, out of ~60 million parameters)

No color (grayscale PASCAL input): 43.4% → 40.1% mAP

Slide16

Detection Error Analysis

Compared to DPM, more of the FPs come from poor localization

Animals: fine-tuning reduces the confusion with other animals

Vehicles: fine-tuning reduces the confusion with other animals amongst the high-scoring FPs

Slide17

Detection Error Analysis

Sensitivity is the same, but we see improvements, in general, for all of the subsets

Slide18

Segmentation

CPMC region proposals

SVR

Compared to state-of-the-art O2P

VOC 2011

3 versions: full, foreground, full+foreground

Fc6 better than fc7

O2P takes 10 hours, CNN takes 1 hour

Slide19

Slide20

Learning and Transferring Mid-level image representations using convolutional neural networks

Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic (INRIA, WILLOW)

Slide21

Approach

Dense sampling of 500 patches per image instead of segmented regions

Different positive/negative criteria

Resampling positives to balance positives and negatives

Classification

Slide22

Final Results

Slide23

Detection Potential

Slide24

Detection Potential

Slide25

Detection Potential