Deep Learning for Dense Geometric Correspondence Problems

Slide1

Deep Learning for Dense Geometric Correspondence Problems

Ke Wang

Slide2

Sparse Correspondence Problems

Slide3

Dense Correspondence Problems

Stereo

Motion

Slide4

Motion vs. Stereo: Differences

Motion:

Uses velocity: consecutive frames must be close together to get a good approximate time derivative

The 3D movement between camera and scene is not necessarily a single 3D rigid transformation

Stereo:

Could have any disparity value

The view pair is separated by a single 3D transformation

Slide5

Two-View Stereo Problem

Input: a calibrated binocular stereo pair

Output: a dense depth/disparity image

Left View

Right View

Depth Map

Slide6

Camera Calibration

Image planes of cameras are parallel to each other and to the baseline

Camera centers are at same height

Focal lengths are the same

Epipolar lines fall along horizontal scanlines

Slide7

Stereo Image Rectification

Reproject image planes onto a common plane parallel to the line between the optical centers

Pixel motion is horizontal after this transformation

Two homographies (3×3 transforms), one for each input image's reprojection

C. Loop and Z. Zhang. "Computing Rectifying Homographies for Stereo Vision." IEEE Conf. Computer Vision and Pattern Recognition, 1999.
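A minimal sketch of the rectification step's core operation, applying a 3×3 homography to pixel coordinates in homogeneous form (`apply_homography` is an illustrative name, not from the cited paper):

```python
import numpy as np

def apply_homography(H, x, y):
    """Map pixel (x, y) through the 3x3 homography H.

    Coordinates are lifted to homogeneous form, transformed,
    and divided back through by the last component.
    """
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

After rectification, a pair of such homographies (one per view) leaves corresponding points on the same scanline.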

Slide8

Depth from Disparity

Disparity is inversely proportional to depth:

d = x − x′ = f · B / z

where x and x′ are the projections of scene point X in the two views, f is the focal length, B is the baseline between camera centers O and O′, and z is the depth of X.
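The relation above can be turned into a small helper; this is an illustrative sketch (the name `depth_from_disparity` is not from the slides), assuming f in pixels and B in the same units as the returned depth:

```python
import numpy as np

def depth_from_disparity(disparity, f, B):
    """z = f * B / d; zero or negative disparity maps to infinite depth."""
    d = np.asarray(disparity, dtype=float)
    return np.where(d > 0, f * B / np.maximum(d, 1e-9), np.inf)
```

For example, with f = 500 px and B = 0.1 m, a disparity of 10 px corresponds to a depth of 5 m.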

Slide9

Basic Stereo Matching Algorithm

(Figure: searching for the matching pixel along the epipolar line in the right image)
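The basic matching loop for a rectified pair can be sketched as a winner-take-all SAD search along the scanline; names and the SAD choice here are illustrative, not prescribed by the slides:

```python
import numpy as np

def block_match_row(left_row, right_row, x, half, max_disp):
    """Return the disparity minimizing SAD for the patch centered at x."""
    ref = left_row[x - half : x + half + 1]
    best_d, best_cost = 0, np.inf
    # candidate right patches sit at x - d on the same scanline
    for d in range(min(max_disp, x - half) + 1):
        cand = right_row[x - d - half : x - d + half + 1]
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Running this for every pixel of every scanline yields a dense disparity map.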

Slide10

Difficulties: Homogeneous Regions

Slide11

Difficulties: Repetitive Patterns

Slide12

Difficulties: Occlusions

Slide13

Traditional Pipeline

Scharstein, Daniel, and Richard Szeliski. "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms." International Journal of Computer Vision 47.1-3 (2002): 7-42.

Slide14

Matching Cost

SAD (sum of absolute differences)

SSD (sum of squared differences)

NCC (normalized cross-correlation)

Census transform

Mutual information

AD-Census
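Three of the costs listed above can be sketched for two equal-size grayscale patches (a minimal illustration, not the slides' implementation):

```python
import numpy as np

def sad(p, q):
    """Sum of absolute differences: lower is more similar."""
    return np.abs(p - q).sum()

def ssd(p, q):
    """Sum of squared differences: lower is more similar."""
    return ((p - q) ** 2).sum()

def ncc(p, q):
    """Normalized cross-correlation in [-1, 1]: higher is more similar."""
    p0, q0 = p - p.mean(), q - q.mean()
    return (p0 * q0).sum() / (np.linalg.norm(p0) * np.linalg.norm(q0))
```

Note that NCC, unlike SAD and SSD, is invariant to affine changes in patch brightness.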

Slide15

Cost Aggregation

Box filtering

Adaptive weighting

Cross-based
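Box filtering, the simplest aggregation above, averages each disparity slice of the cost volume over a square window; this sketch uses integral images, and the function name is an illustrative assumption:

```python
import numpy as np

def box_aggregate(cost_volume, radius):
    """Average each (h, w) disparity slice over a (2r+1)^2 window."""
    h, w, d = cost_volume.shape
    k = 2 * radius + 1
    out = np.empty((h, w, d))
    for di in range(d):
        # pad with edge values so border windows stay full-size
        p = np.pad(cost_volume[:, :, di], radius, mode="edge")
        # integral image: each window sum needs only four corner lookups
        s = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
        out[:, :, di] = (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)
    return out
```

Adaptive-weight and cross-based aggregation replace the uniform window with edge-aware supports.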

Slide16

Disparity Computation

Local

Look at one patch at a time

Faster, less accurate

Error-prone for textureless regions

Global

Look at the whole image

Slower, usually more accurate

Semi-Global

Slide17

Disparity Refinement

Sub-pixel refinement

Median filter

Cross-checking

Occlusion filling
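Cross-checking can be sketched as a left-right consistency test: a disparity survives only if the right image's disparity map agrees at the matched position (names and the NaN marker are illustrative):

```python
import numpy as np

def cross_check(disp_left, disp_right, tol=1):
    """Invalidate disp_left[y, x] unless disp_right[y, x - d] agrees within tol."""
    h, w = disp_left.shape
    out = disp_left.astype(float)
    for y in range(h):
        for x in range(w):
            d = disp_left[y, x]
            xr = x - d
            if xr < 0 or abs(disp_right[y, xr] - d) > tol:
                out[y, x] = np.nan  # typically an occluded or mismatched pixel
    return out
```

Invalidated pixels are then repaired by the occlusion-filling step, e.g. by propagating a neighboring disparity.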

Slide18

Creating Dataset

Positive examples

Slide19

Creating Dataset

Negative examples

Slide20

MC-CNN Architecture

Left patch (9×9) | Right patch (9×9)

Conv1: 5×5, 32 kernels

FC2: 200

FC3: 200

Concat: 400

FC4: 300

FC5: 300

FC6: 300

FC7: 300

Softmax8: 2

Slide21

MC-CNN Architecture

L1 (Conv1), L2 (FC2), L3 (FC3) are tied across the two branches

Negative-class prediction score used as the raw matching cost

ReLU after each layer L1–L7

No pooling

Grayscale patches


Slide22

Cost Computation


Cost Volume: C(p, d) = s_neg(P_L(p), P_R(p − d)), the negative-class score for the left patch centered at p matched against the right patch at disparity d

Slide23

Compute Matching Cost for Image Pair

Full-res Left Image | Full-res Right Image

Conv1: 5×5×1, 32 kernels

Conv2: 5×5×32, 200 kernels

Conv3: 1×1×200, 200 kernels

Concat: 400

Conv4: 1×1, 300 kernels

Conv5: 1×1, 300 kernels

Conv6: 1×1, 300 kernels

Conv7: 1×1, 300 kernels

Conv8 + softmax: 1×1, 2 kernels

One feedforward pass for each distinct disparity value!

Slide24

MC-CNN-fst Architecture

Trained to minimize a hinge loss:

L = max(0, m + s_neg − s_pos)

with margin m between the score of a matching pair (s_pos) and a non-matching pair (s_neg).

Left Patch 9×9 | Right Patch 9×9

Each branch (weights tied):

Conv1: 3×3, 64 kernels, ReLU

Conv2: 3×3, 64 kernels, ReLU

Conv3: 3×3, 64 kernels, ReLU

Conv4: 3×3, 64 kernels

Normalize

Dot product of the two normalized descriptors gives the similarity score.
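The hinge objective can be sketched numerically; `hinge_loss` and the default margin value are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def hinge_loss(s_pos, s_neg, margin=0.2):
    """max(0, margin + s_neg - s_pos), averaged over a batch.

    s_pos: similarity scores of matching patch pairs
    s_neg: similarity scores of non-matching pairs
    """
    s_pos = np.asarray(s_pos, dtype=float)
    s_neg = np.asarray(s_neg, dtype=float)
    return float(np.maximum(0.0, margin + s_neg - s_pos).mean())
```

The loss is zero once every matching pair scores at least `margin` above its non-matching counterpart.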

Slide25

MC-CNN-acrt Architecture

Trained to minimize a binary cross-entropy loss:

L = −[t log(s) + (1 − t) log(1 − s)]

where s is the network output and t ∈ {0, 1} marks a matching pair.

Left Patch 9×9 | Right Patch 9×9

Each branch (weights tied):

Conv1: 3×3, 112 kernels + ReLU

Conv2: 3×3, 112 kernels + ReLU

Conv3: 3×3, 112 kernels + ReLU

Conv4: 3×3, 112 kernels + ReLU

Concatenate

FC5: 384 + ReLU

FC6: 384 + ReLU

FC7: 384 + ReLU

FC8: 384 + Sigmoid

Slide26

DeepEmbed Architecture

L1–L4: weights for left/right patches are tied

ReLU follows each layer

No pooling

Left Patch 13×13 | Right Patch 13×13

Conv1: 3×3, 32 kernels

Conv2: 3×3, 32 kernels

Conv3: 5×5, 200 kernels

Conv4: 5×5, 200 kernels

Slide27

Architecture: Multi-Scale Ensemble

Patch sizes: 13×13 and 7×7

Conv1: 3×3, 32 kernels

Conv2: 3×3, 32 kernels

Conv3: 5×5, 200 kernels

Conv4: 5×5, 200 kernels

Slide28

Results

Slide29

Results

(Figure: test image and disparity maps from MC-CNN, MC-CNNv2, and DeepEmbed; color scale runs from far to near)

Slide30

Why no pooling?

Pooling gives spatial invariance for recognition problems:

You care whether it's a cat or not, not where the cat is

For stereo problems, spatial location is important

Slide31

On Dataset Size

Slide32

Is the Dataset Valid? Repetitive Patterns

Slide33

Is the Dataset Valid? Homogeneous Regions

Slide34

End-to-end learning?

FlowNet: an end-to-end system for solving optical flow problems

Slide35

Multi-scale

An ensemble is one way of integrating multi-scale matching, but it is slow

Multi-scale solutions for classification problems: MOP-CNN, SPP-Net

Slide36

Optical Flow Problems

Given two subsequent frames I(x, y, t − 1) and I(x, y, t), estimate the apparent motion field (u, v) between them

Key assumptions:

Brightness constancy: the projection of the same point looks the same in every frame, I(x, y, t − 1) = I(x + u, y + v, t)

Small motion: points do not move very far

Spatial coherence: points move like their neighbors
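Under these three assumptions, brightness constancy linearizes to Ix·u + Iy·v + It = 0, and spatial coherence lets one solve for (u, v) by least squares over a window, as in the classic Lucas-Kanade estimator; this sketch and its names are illustrative:

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Solve [Ix Iy] [u v]^T = -It in the least-squares sense over one window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # N x 2 gradient matrix
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)
```

FlowNet replaces this hand-derived estimator with a learned end-to-end mapping.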

Slide37

Rendering Image Pairs with Known Motion

Slide38

Architecture: FlowNetSimple

Slide39

Architecture: FlowNetCorrelation

Slide40

Upsampling

Slide41

Results

Slide42

Examples

Slide43