Presentation Transcript


Object detection, deep learning, and R-CNNs

Ross Girshick

Microsoft Research

Guest lecture for UW CSE 455

Nov. 24, 2014

Outline

Object detection

the task, evaluation, datasets

Convolutional Neural Networks (CNNs)

overview and history

Region-based Convolutional Networks (R-CNNs)

Image classification

Task: assign the correct class label to the whole image.

Digit classification (MNIST)

Object recognition (Caltech-101)

Classification vs. Detection

[Figure: classification assigns one label (‘Dog’) to the whole image; detection puts a labeled box around each dog.]

Problem formulation

Input: an image.

Desired output: each object instance localized with a bounding box and labeled with its class (e.g., ‘person’, ‘motorbike’) from the set { airplane, bird, motorbike, person, sofa }.
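To make the desired output concrete, here is a minimal sketch (my own illustration, not from the slides; labels, coordinates, and scores are made up) of how a detector's output for one image might be represented in Python:

# Hypothetical output for one image: a list of labeled, scored bounding boxes.
detections = [
    # label, [x1, y1, x2, y2] box corners, confidence score (all values illustrative)
    {"label": "person",    "box": [262,  96, 340, 290], "score": 0.92},
    {"label": "motorbike", "box": [ 88, 150, 380, 320], "score": 0.87},
]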

Evaluating a detector

Test image (previously unseen).

First detection: a ‘person’ detector prediction with confidence 0.9.

Second detection: confidence 0.6.

Third detection: confidence 0.2.

Compare to ground truth: the ‘person’ detector predictions (0.9, 0.6, 0.2) are compared against the ground-truth ‘person’ boxes.

Sort by confidence

[Figure: detections sorted by decreasing confidence (0.9, 0.8, 0.6, 0.5, 0.2, 0.1). Each detection is marked as a true positive (high overlap with a ground-truth box) or a false positive (no overlap, low overlap, or a duplicate detection).]
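A minimal sketch (my own code, not from the lecture) of the rule just described: walking the detections in decreasing-score order, a detection is a true positive if it overlaps an as-yet-unmatched ground-truth box with sufficiently high intersection-over-union (PASCAL uses a 0.5 threshold); otherwise it is a false positive (no/low overlap or a duplicate).

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_detections(dets, gt_boxes, thresh=0.5):
    """dets: list of (score, box). Returns (score, is_true_positive) pairs,
    sorted by decreasing score, using greedy matching against gt_boxes."""
    matched = set()
    labels = []
    for score, box in sorted(dets, key=lambda d: -d[0]):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            o = iou(box, gt)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_iou >= thresh and best_j not in matched:
            matched.add(best_j)
            labels.append((score, True))    # true positive: high overlap, first match
        else:
            labels.append((score, False))   # false positive: low/no overlap or duplicate
    return labels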

Evaluation metric

[Figure: the ranked detections again; at each rank, precision (the fraction of detections so far that are true positives) and recall (the fraction of ground-truth objects recovered so far) are computed.]

Evaluation metric

Average Precision (AP): roughly, the area under the precision-recall curve. 0% is worst, 100% is best.

mean AP over classes (mAP)
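Continuing the sketch above (my own code): average precision computed as the area under the precision-recall curve with a simple rectangle rule. The official PASCAL evaluation uses an interpolated variant, and the true/false-positive pattern in the example is only illustrative.

import numpy as np

def average_precision(is_tp, num_gt):
    """is_tp: true/false-positive flags for detections sorted by decreasing score.
    num_gt: number of ground-truth objects for this class."""
    is_tp = np.asarray(is_tp, dtype=float)
    tp = np.cumsum(is_tp)            # true positives up to each rank
    fp = np.cumsum(1.0 - is_tp)      # false positives up to each rank
    recall = tp / num_gt
    precision = tp / (tp + fp)
    # Rectangle-rule area under the precision-recall curve.
    ap = recall[0] * precision[0] + np.sum((recall[1:] - recall[:-1]) * precision[1:])
    return float(ap)

# Illustrative ranked list with scores 0.9, 0.8, 0.6, 0.5, 0.2, 0.1:
print(average_precision([True, True, False, True, False, False], num_gt=4))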

Pedestrians

Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.

AP ~77%. More sophisticated methods: AP ~90%.

Figure panels: (a) average gradient image over the training examples; (b) each “pixel” shows the maximum positive SVM weight in the block centered on that pixel; (c) same as (b) for negative SVM weights; (d) test image; (e) its R-HOG descriptor; (f) R-HOG descriptor weighted by the positive SVM weights; (g) R-HOG descriptor weighted by the negative SVM weights.
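As a side note, HOG descriptors are easy to compute and inspect with scikit-image; a rough sketch (assuming scikit-image is installed; this is skimage's HOG, not Dalal and Triggs' original code, and the random image is only a stand-in for a 128x64 pedestrian window):

import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)            # stand-in for a 128x64 detection window
descriptor = hog(window,
                 orientations=9,            # gradient-orientation bins per cell
                 pixels_per_cell=(8, 8),    # HOG cells
                 cells_per_block=(2, 2))    # blocks used for local contrast normalization
print(descriptor.shape)                     # flat feature vector, fed to a linear SVM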

Why did it work?

Average gradient image

Generic categories

Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?

PASCAL Visual Object Categories (VOC) dataset

Generic categories

Why doesn’t this work (as well)?

Can we detect people, chairs, horses, cars, dogs, buses, bottles, sheep …?

PASCAL Visual Object Categories (VOC) dataset

Quiz time

Warm up

This is an average image of which object class?

Warm up

pedestrian

A little harder

?

A little harder

?

Hint: airplane, bicycle, bus, car, cat, chair, cow, dog, dining table


A little harder

bicycle (PASCAL)

A little harder, yet

?

A little harder, yet

?

Hint: white blob on a green background

A little harder, yet

sheep (PASCAL)

Impossible?

?

Impossible?

dog (PASCAL)

Impossible?

dog (PASCAL)

Why does the mean look like this?

There’s no alignment between the examples!

How do we combat this?

PASCAL VOC detection history

[Chart: mAP on PASCAL VOC detection over successive years. Methods labeled: DPM; DPM, HOG+BOW; DPM, MKL; DPM++; DPM++, MKL, Selective Search; Selective Search, DPM++, MKL. mAP values shown: 17%, 23%, 28%, 37%, 41%, 41%.]

Part-based models & multiple features (MKL)

[Same chart as above, annotated: rapid performance improvements.]

Kitchen-sink approaches

[Same chart as above, annotated: increasing complexity & plateau.]

Region-based Convolutional Networks (R-CNNs)

[Same chart, extended: R-CNN v1 reaches 53% mAP and R-CNN v2 reaches 62% mAP.]

[R-CNN. Girshick et al. CVPR 2014]

Region-based Convolutional Networks (R-CNNs)

[Same chart, annotated with the time spans ~5 years (the earlier progress) and ~1 year (the R-CNN improvements).]

[R-CNN. Girshick et al. CVPR 2014]

Convolutional Neural Networks

Overview

Standard Neural Networks

“Fully connected”: every unit in one layer is connected to every unit in the next.
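A minimal numpy sketch (my own illustration) of what “fully connected” means: every output unit depends on every input unit through a dense weight matrix.

import numpy as np

def fully_connected(x, W, b):
    """One fully connected layer: affine map plus an elementwise nonlinearity."""
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)              # 8 input units
W = rng.standard_normal((4, 8)) * 0.1   # 4 output units, each connected to all 8 inputs
b = np.zeros(4)
print(fully_connected(x, W, b))         # 4 output activations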

From NNs to Convolutional NNs

Local connectivity

Shared (“tied”) weights

Multiple feature maps

Pooling

Convolutional NNs

Local connectivity: each green unit is connected to only three neighboring blue units (compare with the fully connected case).

Convolutional NNs

Shared (“tied”) weights: all green units share the same parameters. Each green unit computes the same function, but with a different input window.

Convolutional NNs

Convolution with a 1-D filter: each green unit computes the same weighted sum of its input window, y_i = sum_k w_k * x_(i+k), with the filter weights w shared across positions.

All green units share the same parameters; each green unit computes the same function, but with a different input window.

(The next several slides repeat this picture, stepping the shared filter across the input one window at a time.)

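A small numpy sketch (my own code) of the layer pictured on these slides: one shared 3-tap filter w is applied at every position of the input x, producing one output ("green") unit per window.

import numpy as np

def conv1d_valid(x, w):
    """Slide a shared 1-D filter w across x; one output per full window."""
    k = len(w)
    return np.array([np.dot(w, x[i:i + k]) for i in range(len(x) - k + 1)])

x = np.array([1.0, 3.0, 0.0, 2.0, 5.0, 1.0])   # input ("blue") units
w = np.array([0.5, 1.0, -0.5])                  # the shared 3-tap filter
print(conv1d_valid(x, w))
# np.convolve(x, w[::-1], mode="valid") gives the same numbers:
# textbook convolution flips the filter, while CNNs typically use correlation.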
Convolutional NNs

Multiple feature maps: all orange units compute the same function but with different input windows; orange and green units compute different functions.

Feature map 1 (array of green units), feature map 2 (array of orange units).

Convolutional NNs

Pooling (max, average): subsamples the feature maps.

Example: max pooling with pooling area 2 and stride 2 turns the feature map 1, 4, 0, 3 into 4, 3.
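A sketch (my own illustration) of the pooling step, reproducing the slide's numbers:

import numpy as np

def max_pool_1d(feature_map, size=2, stride=2):
    """Max pooling: keep the maximum of each window, subsampling the feature map."""
    return np.array([max(feature_map[i:i + size])
                     for i in range(0, len(feature_map) - size + 1, stride)])

print(max_pool_1d([1, 4, 0, 3]))   # pooling area 2, stride 2 -> [4 3]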

[Figure: the same operations on a 2D input image: convolution over the image, then pooling.]

1989

Backpropagation Applied to Handwritten Zip Code Recognition, LeCun et al., 1989.

Historical perspective – 1980

Historical perspective – 1980

Hubel and Wiesel, 1962

Included basic ingredients of ConvNets, but no supervised learning algorithm.

Supervised learning – 1986

Early demonstration that error backpropagation can be used for supervised training of neural nets (including ConvNets).

Gradient descent training with error backpropagation.
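For concreteness, a tiny toy sketch (my own code, not the 1986 experiments) of supervised training by gradient descent with error backpropagation, on a two-layer network learning XOR:

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.standard_normal((2, 4)); b1 = np.zeros(4)   # hidden layer (4 units)
W2 = rng.standard_normal((4, 1)); b2 = np.zeros(1)   # output layer
lr = 0.5

for step in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    out = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))        # sigmoid output

    # Backward pass: propagate the squared-error gradient through each layer
    d_out = (out - y) * out * (1.0 - out)             # through the sigmoid
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)             # through the tanh

    # Gradient-descent parameter updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # should approach [0, 1, 1, 0] for most initializations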

Supervised learning – 1986

“T” vs. “C” problem

Simple ConvNet

Practical ConvNets

Gradient-Based Learning Applied to Document Recognition, LeCun et al., 1998.

Demo

ConvNetJS by Andrej Karpathy (Ph.D. student at Stanford): http://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html

Software libraries: Caffe (C++, Python, MATLAB), Torch7 (C++, Lua), Theano (Python).

The fall of ConvNets

The rise of Support Vector Machines (SVMs)

Mathematical advantages (theory, convex optimization)

Competitive performance on tasks such as digit classification

Neural nets became unpopular in the mid 1990s.

The key to SVMs

It’s all about the features

Histograms of Oriented Gradients for Human Detection, Dalal and Triggs, CVPR 2005.

[Figure: HOG features of a test image and the learned positive (+) and negative (-) SVM weights.]

Core idea of “deep learning”

Input: the “raw” signal (image, waveform, …)

Features: a hierarchy of features is learned from the raw input.

If SVMs killed neural nets, how did they come back (in computer vision)?

What’s new since the 1980s?

More layers

LeNet-3 and LeNet-5 had 3 and 5 learnable layers

Current models have 8 – 20+

“ReLU” non-linearities (Rectified Linear Unit): the gradient doesn’t vanish.

“Dropout” regularization

Fast GPU implementations

More data

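Two of the ingredients above in a few lines (my own sketch; the dropout shown is the common “inverted” variant that scales at training time):

import numpy as np

def relu(x):
    """Rectified Linear Unit: gradient is 1 for positive inputs, so it does not
    shrink toward zero the way saturated sigmoid/tanh gradients do."""
    return np.maximum(0.0, x)

def dropout(x, p=0.5, training=True, rng=None):
    """Randomly zero each unit with probability p during training; survivors are
    scaled by 1/(1-p) so no rescaling is needed at test time."""
    if not training:
        return x
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))
print(dropout(x))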
Ross’s Own System: Region CNNs

Competitive Results

Top Regions for Six Object Classes