Slide1
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
Jure Zbontar, Yann LeCun
Slide2
Background
Motivation
Problem Formulation
Methodology
Training Data
Suggested Net Architectures
Sequential Steps
Results
Conclusion
Table of Contents
Slide3
Slide4
Given input: 2 images (right and left), acquired at different horizontal positions
Required output: The disparity for each pixel in the left image
Disparity - difference in horizontal location (x-axis) of an object in the left and right image
Motivation
Stereo Matching
Slide5
Motivation
Stereo Matching
Slide6
Motivation
Stereo Matching
Slide7
Given the disparity $d$ at each pixel, the depth $z$ can be obtained by $z = \frac{fB}{d}$
$B$ - distance between camera centers
$f$ - focal length
Applications in autonomous driving, robotics, 3D scene reconstruction and more…
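As a minimal sketch of the depth relation on this slide, the Python snippet below converts a disparity into a depth, assuming the usual rectified-stereo form z = f * B / d. The function name `depth_from_disparity` and the camera values (focal length in pixels, baseline in meters) are illustrative assumptions, not values from the presentation.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth z = f * B / d for a rectified stereo pair (assumed form)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical camera: f = 700 px, B = 0.5 m.
z = depth_from_disparity(disparity_px=35.0, focal_length_px=700.0, baseline_m=0.5)
print(z)  # 10.0 (meters)
```

Because depth is inversely proportional to disparity, halving the disparity doubles the estimated depth, so small disparity errors matter most for distant objects.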
Motivation
Applications
Slide8
Stereo matching steps [Scharstein & Szeliski, 2002]:
Matching cost computation
Cost aggregation
Optimization
Disparity refinement
Focus of this work: Matching cost initialization
Problem Formulation
Stereo Matching Steps
Slide9
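Matching cost example (sum of absolute differences): $C_{SAD}(p, d) = \sum_{q \in N_p} |I^L(q) - I^R(q - d)|$
$I^L(q)$, $I^R(q)$ - left and right image intensities at position $q$
$N_p$ - the set of locations within a fixed rectangular window centered at $p$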
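The window-based matching cost described on this slide can be sketched in Python with NumPy, assuming a sum-of-absolute-differences cost over a square window. The function name `sad_cost`, the `half_window` parameter, and the toy images are assumptions made for illustration only.

```python
import numpy as np

def sad_cost(left, right, p, d, half_window=1):
    """Sum of absolute differences between the window around p = (x, y) in the
    left image and the window shifted d pixels to the left in the right image."""
    x, y = p
    h = half_window
    win_left = left[y - h:y + h + 1, x - h:x + h + 1].astype(np.float64)
    win_right = right[y - h:y + h + 1, x - d - h:x - d + h + 1].astype(np.float64)
    return float(np.abs(win_left - win_right).sum())

# Toy pair: the right image is the left image shifted 2 pixels to the left,
# so the true disparity is 2 away from the image border.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(8, 12))
right = np.roll(left, -2, axis=1)

costs = [sad_cost(left, right, p=(6, 4), d=d) for d in range(4)]
best = int(np.argmin(costs))  # the cost is exactly zero at the true disparity
print(best)
```

The presentation replaces this kind of hand-crafted cost with one learned by a convolutional neural network; the sketch only shows the baseline being replaced.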
Problem Formulation
Stereo Matching Steps
Slide10
Problem Formulation
Goal
Matching cost initialization via convolutional neural networks
Slide11
Slide12
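Data sets: KITTI and Middlebury
For each image position $p = (x, y)$ with known disparity $d$: one negative and one positive training example
Positive example: the right image patch center is shifted to $q = (x - d + o_{pos}, y)$, where $o_{pos}$ is a small random offset, so the two patches show nearly the same 3D point
Negative example: the right image patch center is shifted to $q = (x - d + o_{neg}, y)$, where $o_{neg}$ is a larger random offset, so the two patches do not match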
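The construction of one positive and one negative example per labeled position can be sketched as follows. This is a minimal illustration, assuming the positive offset is drawn uniformly from a small symmetric interval and the negative offset from a larger interval or its mirror image. The function name `sample_right_centers` and the interval bounds `pos`, `neg_low`, `neg_high` are placeholder assumptions, not values taken from the slides.

```python
import random

def sample_right_centers(x, y, d, pos=1.0, neg_low=4.0, neg_high=10.0, rng=random):
    """Return (positive_center, negative_center) in the right image for a left
    patch centered at (x, y) with known disparity d."""
    o_pos = rng.uniform(-pos, pos)          # small offset: patches still match
    o_neg = rng.uniform(neg_low, neg_high)  # large offset: patches do not match
    if rng.random() < 0.5:                  # use the mirrored interval half of the time
        o_neg = -o_neg
    return (x - d + o_pos, y), (x - d + o_neg, y)

random.seed(0)
q_pos, q_neg = sample_right_centers(x=100, y=50, d=20)
print(q_pos, q_neg)
```

Keeping the negative offsets away from zero avoids labeling near-correct matches as negatives, while a small nonzero positive offset makes the classifier tolerant of slight misalignment.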
Methodology
Training Data
Slide13
Example from KITTI dataset:
Example from Middlebury dataset:
Methodology
Training Data
Slide14
Data augmentation procedure: Artificial expansion of the data set from existing samples
Tweak – small deviations between parallel image patches
Selected actions:
Rotation
Scaling
Horizontal scaling
Horizontal shearing
Horizontal transformation
Brightness & contrast adjustment
Methodology
Training Data
Slide15
Two suggested architectures: fast versus accurate
Common ground for both architectures: Siamese network
Methodology
Suggested Net Architectures
Slide16
Methodology
Fast Architecture
Slide17
Training cost function – hinge loss: $\max(0, m + s_- - s_+)$
$m$ - margin
$s_-$ - net output for negative sample
$s_+$ - net output for positive sample
The loss is zero when the similarity of the positive example is greater than the similarity of the negative example by at least the margin.
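The hinge loss for a pair of positive and negative examples can be sketched in Python as below, assuming the standard form max(0, m + s_neg - s_pos) averaged over pairs. The function name `hinge_loss` and the default margin of 0.2 are illustrative assumptions.

```python
import numpy as np

def hinge_loss(s_pos, s_neg, margin=0.2):
    """Mean of max(0, margin + s_neg - s_pos) over positive/negative pairs."""
    s_pos = np.asarray(s_pos, dtype=np.float64)
    s_neg = np.asarray(s_neg, dtype=np.float64)
    return float(np.maximum(0.0, margin + s_neg - s_pos).mean())

print(hinge_loss([0.9], [0.1]))  # 0.0: the positive wins by more than the margin
print(hinge_loss([0.0], [0.0]))  # 0.2: equal similarities are penalized by the margin
```

Pairs that already satisfy the margin contribute no gradient, so training concentrates on pairs the network still confuses.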
Methodology
Fast Architecture
Slide18
Methodology
Accurate Architecture
Slide19
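Training cost function – cross-entropy loss: $-\left(t \log s + (1 - t) \log(1 - s)\right)$
$t$ - sample class (1 for a positive example, 0 for a negative example)
$s$ - net output
Methodology
Accurate Architecture
Slide20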
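For the cross-entropy loss used to train the accurate architecture, a minimal Python sketch is given below. It assumes the usual binary form, with net output s in (0, 1) and class t equal to 1 for a positive and 0 for a negative example. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions added for numerical safety.

```python
import math

def binary_cross_entropy(s, t, eps=1e-12):
    """-(t * log(s) + (1 - t) * log(1 - s)) for one example."""
    s = min(max(s, eps), 1.0 - eps)  # keep log() away from 0
    return -(t * math.log(s) + (1 - t) * math.log(1.0 - s))

# A confident correct prediction costs less than a hesitant one.
print(binary_cross_entropy(0.9, 1) < binary_cross_entropy(0.6, 1))  # True
```

Unlike the hinge loss of the fast architecture, this loss treats each example independently rather than as a positive/negative pair.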
Obtained matching cost: $C_{CNN}(p, d) = -s(\langle P^L(p), P^R(p - d) \rangle)$
$P^L(p)$, $P^R(p - d)$ - patches from left and right images
Cross-based cost aggregation (CBCA) – Local averaging of matching cost
Semiglobal matching – Enforcement of smoothness constraints on the disparity map
Disparity image computation and enhancement
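The disparity image computation step can be sketched as a winner-takes-all selection over the cost volume, assuming the cost is stored as an array indexed by disparity, row and column. The function name `winner_takes_all` and the toy cost volume are illustrative assumptions; cost aggregation, semiglobal matching and the later enhancement steps are not reproduced here.

```python
import numpy as np

def winner_takes_all(cost_volume):
    """cost_volume[d, y, x] -> per-pixel disparity with the lowest cost."""
    return np.argmin(cost_volume, axis=0)

cost = np.ones((4, 2, 3))  # 4 candidate disparities, 2 x 3 image
cost[2, :, :] = 0.0        # disparity 2 is cheapest everywhere ...
cost[1, 0, 0] = -1.0       # ... except at pixel (0, 0), where disparity 1 wins
disparity = winner_takes_all(cost)
print(disparity)
```

In a full pipeline this selection would run on the aggregated and smoothed cost, not on the raw network output.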
Methodology
Sequential Steps
Slide21
The outputs of the two sub-networks need to be computed only once per location, and not for every disparity under consideration.
The output of the two sub-networks can be computed for all pixels in a single forward pass by propagating full-resolution images, instead of small image patches.
The fully connected layer forms the bottleneck.
Methodology
Key insights
Slide22
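The first two key insights can be illustrated with a short NumPy sketch: per-pixel descriptors are computed once for each full image, and the cost for every disparity is then obtained by comparing shifted descriptor maps. The sketch assumes that similarity is a dot product of unit-normalized descriptors, which the slides do not spell out. The function name `cost_volume_from_features` is hypothetical, and random descriptors stand in for the output of the sub-networks.

```python
import numpy as np

def cost_volume_from_features(feat_left, feat_right, max_disp):
    """feat_*: (H, W, C) unit-normalized descriptors, computed once per image.
    Returns cost[d, y, x] = -<feat_left[y, x], feat_right[y, x - d]>."""
    h, w, _ = feat_left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)  # inf where x - d leaves the image
    for d in range(max_disp + 1):
        sim = (feat_left[:, d:, :] * feat_right[:, :w - d, :]).sum(axis=2)
        cost[d, :, d:] = -sim
    return cost

# Stand-in descriptors: the right map is the left map shifted by 3 pixels.
rng = np.random.default_rng(0)
f_left = rng.normal(size=(4, 10, 8))
f_left /= np.linalg.norm(f_left, axis=2, keepdims=True)
f_right = np.roll(f_left, -3, axis=1)

cost = cost_volume_from_features(f_left, f_right, max_disp=4)
print(int(np.argmin(cost[:, 2, 5])))  # disparity with the lowest cost at pixel (y=2, x=5)
```

The expensive descriptor computation happens once per image, and only the cheap comparison is repeated for each disparity.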
Slide23
Results
Success Measure
(Number of misclassified pixels) / (Total number of pixels)
Slide24
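The success measure, the ratio of misclassified pixels to the total number of pixels, can be sketched as follows. The sketch assumes that a pixel counts as misclassified when its predicted disparity differs from the ground truth by more than a threshold, and that pixels without ground truth are marked with NaN. The function name `error_rate`, the 3-pixel default threshold and the NaN convention are all assumptions made for illustration.

```python
import numpy as np

def error_rate(pred, truth, threshold=3.0):
    """Fraction of labeled pixels whose disparity error exceeds the threshold."""
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    valid = np.isfinite(truth)  # assumed convention: NaN = no ground truth
    bad = np.abs(pred[valid] - truth[valid]) > threshold
    return float(bad.sum() / valid.sum())

truth = [[10.0, 20.0], [30.0, float("nan")]]
pred = [[10.5, 25.0], [30.0, 0.0]]
print(error_rate(pred, truth))  # one of the three labeled pixels is off by more than 3
```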
Results
KITTI2012 Dataset
Slide25
Results
KITTI2015 Dataset
Slide26
Results
Middlebury Dataset
Slide27
Results
Data Augmentation
Slide28
Results
Runtimes
Slide29
Results
Training Data Size
Slide30
Results
Transfer Learning
Slide31
Results
Hyperparameters
Remark: Patch size is directly determined by the number of convolutional layers
Slide32
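The remark on the Hyperparameters slide, that patch size follows from the number of convolutional layers, can be made concrete with a small calculation. It assumes stride-1 convolutions with square kernels and no padding or pooling, so each layer trims `kernel_size - 1` pixels from the patch. The function name `patch_size` and the default 3x3 kernel are illustrative assumptions.

```python
def patch_size(num_conv_layers, kernel_size=3):
    """Side length of the input patch that a stack of unpadded, stride-1
    convolutions reduces to a single output position."""
    return num_conv_layers * (kernel_size - 1) + 1

for layers in (4, 5, 6):
    print(layers, patch_size(layers))  # 4 -> 9, 5 -> 11, 6 -> 13
```

Under these assumptions the patch size cannot be tuned independently: adding a layer grows the patch by `kernel_size - 1` pixels per side length.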
Results
Visual Examples (KITTI)
Slide33
Results
Visual Examples (KITTI)
Slide34
Results
Visual Examples (Middlebury)
Slide35
Slide36
Two CNN architectures for learning a similarity measure on image patches were presented.
The two architectures were used for stereo matching.
A relatively simple CNN outperformed all previous methods on the well-studied problem of stereo.
Conclusion