
Deep Learning for Dense Geometric Correspondence Problems

Ke Wang

Slide 2: Sparse Correspondence Problems

Slide 3: Dense Correspondence Problems
- Stereo
- Motion

Slide 4: Motion vs. Stereo: Differences

Motion:
- Uses velocity: consecutive frames must be close together to get a good approximate time derivative
- The 3D movement between camera and scene is not necessarily a single 3D rigid transformation

Stereo:
- Could have any disparity value
- View pair separated by a single 3D transformation

Slide 5: Two-View Stereo Problem

Input: a calibrated binocular stereo pair
Output: a dense depth/disparity image

[Figure: Left View, Right View, Depth Map]

Slide 6: Camera Calibration
- Image planes of cameras are parallel to each other and to the baseline
- Camera centers are at the same height
- Focal lengths are the same
- Epipolar lines fall along horizontal scanlines

Slide 7: Stereo Image Rectification
- Reproject image planes onto a common plane parallel to the line between optical centers
- Pixel motion is horizontal after this transformation
- Two homographies (3x3 transforms), one for each input image reprojection

C. Loop and Z. Zhang. "Computing Rectifying Homographies for Stereo Vision." IEEE Conf. Computer Vision and Pattern Recognition, 1999.

Slide 8: Depth from Disparity

Disparity is inversely proportional to depth: for focal length f, baseline B, and depth z,

disparity = x - x' = B * f / z

[Figure: similar triangles through scene point X, optical centers O and O', image coordinates x and x', focal length f, baseline B, depth z]
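The inverse relation can be checked numerically; the focal length and baseline below are made-up illustrative values, not taken from the slides:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """z = B * f / d, from the similar triangles in the figure."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline_m * focal_px / disparity_px

# Illustrative numbers (not from the slides): f = 721 px, B = 0.54 m
z = depth_from_disparity(10.0, 721.0, 0.54)  # about 38.9 m
```

Note that as disparity shrinks toward zero the depth estimate blows up, which is why distant geometry is the least reliable part of a stereo reconstruction.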

Slide 9: Basic Stereo Matching Algorithm

For each pixel in the left image, search along the corresponding epipolar line in the right image for the best-matching patch.
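The basic algorithm can be sketched as a winner-take-all SAD block matcher over horizontal scanlines, assuming rectified grayscale images as NumPy arrays; `block_match` and the synthetic images are illustrative, not the presenter's code:

```python
import numpy as np

def block_match(left, right, max_disp, half=2):
    """Winner-take-all SAD block matching on rectified grayscale images.

    For each left-image pixel, compare its (2*half+1)^2 patch against
    candidate patches along the same scanline in the right image.
    """
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.abs(ref - cand).sum()  # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# Synthetic sanity check: the right view is the left view shifted 3 px
rng = np.random.default_rng(0)
left = rng.random((20, 30))
right = np.zeros_like(left)
right[:, :-3] = left[:, 3:]
disp = block_match(left, right, max_disp=5)
```

The triple loop makes the simplicity (and the slowness) of the local approach explicit; the difficulties on the next slides are exactly the cases where the SAD minimum is ambiguous.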

Slide 10: Difficulties: Homogeneous Regions

Slide 11: Difficulties: Repetitive Patterns

Slide 12: Difficulties: Occlusions

Slide 13: Traditional Pipeline

Scharstein, Daniel, and Richard Szeliski. "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms." International Journal of Computer Vision 47.1-3 (2002): 7-42.

Slide 14: Matching Cost
- SAD (sum of absolute differences)
- SSD (sum of squared differences)
- NCC (normalized cross-correlation)
- Census transform
- Mutual information
- AD-Census
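The three simplest costs can be sketched directly on patch arrays, assuming NumPy float inputs; the Census, mutual-information, and AD-Census costs are omitted:

```python
import numpy as np

def sad(p, q):
    """Sum of absolute differences (lower is better)."""
    return np.abs(p - q).sum()

def ssd(p, q):
    """Sum of squared differences (lower is better)."""
    return ((p - q) ** 2).sum()

def ncc(p, q):
    """Normalized cross-correlation in [-1, 1] (higher is better).

    Invariant to affine intensity changes q = a*p + b with a > 0.
    """
    p0, q0 = p - p.mean(), q - q.mean()
    return (p0 * q0).sum() / (np.linalg.norm(p0) * np.linalg.norm(q0))
```

NCC's gain/bias invariance is why it tolerates exposure differences between the two views better than SAD or SSD.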

Slide 15: Cost Aggregation
- Box filtering
- Adaptive weighting
- Cross-based

Slide 16: Disparity Computation

Local:
- Looks at one patch at a time
- Faster, less accurate
- Error-prone in textureless regions

Global:
- Looks at the whole image
- Slower, usually more accurate

Semi-Global: a middle ground between the two

Slide 17: Disparity Refinement
- Sub-pixel refinement
- Median filter
- Cross-checking
- Occlusion filling
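Cross-checking can be sketched as a left-right consistency test; `cross_check` and its tolerance parameter are illustrative names, not from the slides:

```python
import numpy as np

def cross_check(disp_left, disp_right, tol=1):
    """Left-right consistency check on integer disparity maps.

    A left pixel x with disparity d maps to right pixel x - d; the right
    disparity map should assign (about) the same d there. Pixels that
    fail, typically occlusions, are marked -1 for later filling.
    """
    h, w = disp_left.shape
    out = disp_left.copy()
    for y in range(h):
        for x in range(w):
            d = disp_left[y, x]
            xr = x - d
            if xr < 0 or xr >= w or abs(int(disp_right[y, xr]) - d) > tol:
                out[y, x] = -1
    return out
```

Occlusion filling then replaces the -1 pixels, e.g. by propagating the nearest valid background disparity along the scanline.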

Slide 18: Creating Dataset

Positive examples

Slide 19: Creating Dataset

Negative examples

Slide 20: MC-CNN Architecture

Left patch (9x9) and right patch (9x9), each through:
- Conv1: 5x5, 32 kernels
- FC2: 200
- FC3: 200

Then:
- Concat: 400
- FC4: 300
- FC5: 300
- FC6: 300
- FC7: 300
- Softmax8: 2

Slide 21: MC-CNN Architecture

- L1 (Conv1), L2 (FC2), L3 (FC3) are tied between the left and right branches
- The negative-class prediction score is used as the raw matching cost
- ReLU after each layer L1-L7
- No pooling
- Grayscale patches

[Same architecture diagram as Slide 20]

Slide 22: Cost Computation

[Same architecture diagram as Slide 20]

Cost Volume: one matching cost per pixel and candidate disparity
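The cost volume can be sketched as a 3-D array cost[d, y, x]; the sketch below substitutes a per-pixel absolute difference for the network's learned cost:

```python
import numpy as np

def cost_volume(left, right, max_disp):
    """cost[d, y, x]: cost of matching left pixel (y, x) at disparity d.

    Per-pixel absolute difference stands in for the CNN's learned cost;
    entries where x - d falls outside the image stay at infinity.
    """
    h, w = left.shape
    vol = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        vol[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
    return vol

def winner_take_all(vol):
    """Pick, per pixel, the disparity with the minimal cost."""
    return vol.argmin(axis=0)

# Synthetic sanity check: the right view is the left view shifted 2 px
rng = np.random.default_rng(1)
left = rng.random((10, 15))
right = np.zeros_like(left)
right[:, :-2] = left[:, 2:]
disp = winner_take_all(cost_volume(left, right, max_disp=4))
```

In the full pipeline this volume is what cost aggregation and semi-global optimization operate on before the winner-take-all step.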

Slide 23: Compute Matching Cost for Image Pair

Full-res left image and full-res right image, each through:
- Conv1: 5x5x1, 32 kernels
- Conv2: 5x5x32, 200 kernels
- Conv3: 1x1x200, 200 kernels

Then:
- Concat: 400
- Conv4: 1x1, 300 kernels
- Conv5: 1x1, 300 kernels
- Conv6: 1x1, 300 kernels
- Conv7: 1x1, 300 kernels
- Conv8 + softmax: 1x1, 2 kernels

One feedforward pass for each distinct disparity value!

Slide 24: MC-CNN-fst Architecture

Trained to minimize a hinge loss.

Left patch (9x9) and right patch (9x9), each through an identical branch:
- Conv1: 3x3, 64 kernels + ReLU
- Conv2: 3x3, 64 kernels + ReLU
- Conv3: 3x3, 64 kernels + ReLU
- Conv4: 3x3, 64 kernels
- Normalize

Matching score: dot product of the two normalized feature vectors.
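The hinge loss pushes the dot-product similarity of the matching pair above that of the non-matching pair by a margin; a sketch on precomputed descriptor vectors, where the margin value of 0.2 is an assumption (the slide does not give it):

```python
import numpy as np

def hinge_loss(feat_pos, feat_neg, feat_ref, margin=0.2):
    """max(0, margin + s_neg - s_pos), where each s is the dot product
    of L2-normalized descriptors (i.e. cosine similarity)."""
    def unit(v):
        return v / np.linalg.norm(v)
    s_pos = unit(feat_ref) @ unit(feat_pos)   # similarity to true match
    s_neg = unit(feat_ref) @ unit(feat_neg)   # similarity to non-match
    return max(0.0, margin + s_neg - s_pos)
```

A well-separated triple (s_pos near 1, s_neg near 0) incurs zero loss, so training effort concentrates on the confusable pairs.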

Slide 25: MC-CNN-acrt Architecture

Trained to minimize a binary cross-entropy loss.

Left patch (9x9) and right patch (9x9), each through an identical branch:
- Conv1: 3x3, 112 kernels + ReLU
- Conv2: 3x3, 112 kernels + ReLU
- Conv3: 3x3, 112 kernels + ReLU
- Conv4: 3x3, 112 kernels + ReLU

Then:
- Concatenate
- FC5: 384 + ReLU
- FC6: 384 + ReLU
- FC7: 384 + ReLU
- FC8: 384 + Sigmoid

Slide 26: DeepEmbed Architecture

- L1-L4: weights for left/right patches are tied
- ReLU follows each layer
- No pooling

Left patch (13x13) and right patch (13x13), each through:
- Conv1: 3x3, 32 kernels
- Conv2: 3x3, 32 kernels
- Conv3: 5x5, 200 kernels
- Conv4: 5x5, 200 kernels

Slide 27: Architecture: Multi-Scale Ensemble

Patches at two scales, 13x13 and 7x7, each through:
- Conv1: 3x3, 32 kernels
- Conv2: 3x3, 32 kernels
- Conv3: 5x5, 200 kernels
- Conv4: 5x5, 200 kernels

Slide 28: Results

Slide 29: Results

[Figure: test image with disparity maps from MC-CNN, MC-CNNv2, and DeepEmbed; color scale from far to near]

Slide 30: Why No Pooling?

Pooling gives spatial invariance, which helps recognition problems: you care whether the image contains a cat, not where the cat is. For stereo problems, spatial location is important, so pooling is avoided.

Slide 31: On Dataset Size

Slide 32: Is the Dataset Valid? Repetitive Patterns

Slide 33: Is the Dataset Valid? Homogeneous Regions

Slide 34: End-to-End Learning?

FlowNet: an end-to-end system for solving optical flow problems.

Slide 35: Multi-Scale

An ensemble is one way of integrating multi-scale matching, but it is slow.

Multi-scale solutions for classification problems: MOP-CNN, SPP-Net.

Slide 36: Optical Flow Problems

Given two subsequent frames I(x, y, t-1) and I(x, y, t), estimate the apparent motion field (u, v) between them.

Key assumptions:
- Brightness constancy: the projection of the same point looks the same in every frame
- Small motion: points do not move very far
- Spatial coherence: points move like their neighbors
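Combining brightness constancy with the small-motion assumption (via a first-order Taylor expansion) yields the standard optical flow constraint; this is the textbook derivation, not shown explicitly on the slides:

```latex
I(x+u,\, y+v,\, t) = I(x,\, y,\, t-1)
\quad\Longrightarrow\quad
I_x\, u + I_y\, v + I_t \approx 0
```

This is one equation in two unknowns per pixel (the aperture problem); the spatial coherence assumption supplies the extra constraints needed to solve for u and v.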

Slide 37: Rendering Image Pairs with Known Motion

Slide 38: Architecture: FlowNetSimple

Slide 39: Architecture: FlowNetCorrelation

Slide 40: Upsampling

Slide 41: Results

Slide 42: Examples
