Toward Object Discovery and Modeling via 3-D Scene Comparison
Evan Herbst, Peter Henry, Xiaofeng Ren, Dieter Fox
University of Washington; Intel Research Seattle
Overview
- Goal: learn about an environment by tracking changes in it over time
- Detect objects that occur in different places at different times
- Handle textureless objects
- Avoid appearance/shape priors
- Represent a map with static + dynamic parts
Algorithm Outline
- Input: two RGB-D videos
- Mapping & reconstruction of each video
- Interscene alignment
- Change detection
- Spatial regularization
- Outputs: reconstructed static background; segmented movable objects
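A minimal Python skeleton may help fix the structure of this outline; every function name below is a hypothetical stub for one stage of the pipeline, not the authors' code or a real library API.

```python
# Hypothetical skeleton of the pipeline above; each helper is a stub
# standing in for one stage of the method, not real library code.

def reconstruct_scene(video):  raise NotImplementedError  # RGB-D Mapping
def align_scenes(a, b):        raise NotImplementedError  # interscene alignment
def detect_changes(a, b, tf):  raise NotImplementedError  # per-surfel move probabilities
def regularize(a, probs):      raise NotImplementedError  # MRF spatial regularization
def split_map(a, labels):      raise NotImplementedError  # background vs. objects

def discover_objects(video_a, video_b):
    scene_a = reconstruct_scene(video_a)
    scene_b = reconstruct_scene(video_b)
    tf = align_scenes(scene_a, scene_b)           # global alignment
    probs = detect_changes(scene_a, scene_b, tf)  # probabilistic differencing
    labels = regularize(scene_a, probs)           # spatial regularization
    # Outputs: reconstructed static background; segmented movable objects
    return split_map(scene_a, labels)
```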
Scene Reconstruction
- Mapping based on RGB-D Mapping [Henry et al. ISER'10]
- Visual odometry, loop-closure detection, pose-graph optimization, bundle adjustment
Scene Reconstruction
- Mapping based on RGB-D Mapping [Henry et al. ISER'10]
- Surface representation: surfels
Scene Differencing
- Given two scenes, find parts that differ
- Surfaces in two scenes are similar iff the object doesn't move
- Comparison at each surface point
Scene Differencing
- Given two scenes, find parts that differ
- Comparison at each surface point
- Start by globally aligning scenes

[Figure: global alignment of the two scenes, shown in 2-D and 3-D]
Naïve Scene Differencing
- Easy algorithm: closest point within δ → same
- Ignores color, surface orientation
- Ignores occlusions
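A minimal sketch of this naïve baseline, assuming the scenes are NumPy point arrays already in a common frame; the 1 cm value for δ and the use of SciPy's KD-tree are assumptions, not from the slides.

```python
import numpy as np
from scipy.spatial import cKDTree

def naive_diff(points_a, points_b, delta=0.01):
    """Label each point of scene A 'same' if scene B has any point within
    delta meters, else 'changed'. As the slide notes, this ignores color,
    surface orientation, and occlusion."""
    dists, _ = cKDTree(points_b).query(points_a)
    return dists <= delta  # True = same, False = changed
```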
Scene Differencing

- Model probability that a surface point moved: m ∈ {0, 1}
- Sensor readings z; expected measurement z*
[Figure: a surface point with expected measurement z* and sensor readings z_0, z_1, z_2, z_3 taken at frames 0, 10, 25, and 49]
Sensor Models
- Model probability that a surface point moved
- Sensor readings z; expected measurement z*
- By Bayes, combine the readings into a posterior over m (a reconstruction is sketched below)
- Two sensor measurement models:
  - With no expected surface
  - With expected surface
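One plausible form of the Bayes update from the definitions above, treating the readings z_i as independent given m (that independence is an assumption on my part, not necessarily the authors' exact formulation):

```latex
p(m \mid z_{1:n}, z^*) \;\propto\; p(m)\,\prod_{i=1}^{n} p(z_i \mid m, z^*)
```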
Sensor Models
- Two sensor measurement models
- With expected surface:
  - Depth: uniform + exponential + Gaussian¹
  - Color: uniform + Gaussian
  - Orientation: uniform + Gaussian

¹ Thrun et al., Probabilistic Robotics, 2005

[Figure: depth measurement model, peaked at the expected depth z_d*]
Sensor Models
- Two sensor measurement models
- With expected surface:
  - Depth: uniform + exponential + Gaussian¹
  - Color: uniform + Gaussian
  - Orientation: uniform + Gaussian
- With no expected surface:
  - Depth: uniform + exponential
  - Color: uniform
  - Orientation: uniform

¹ Thrun et al., Probabilistic Robotics, 2005

[Figure: depth measurement model, peaked at the expected depth z_d*]
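A minimal sketch of the two depth likelihoods as mixtures; only the component families (uniform, exponential, Gaussian) come from the slide, while the weights, rate lam, noise sigma, and max range are placeholder values.

```python
import numpy as np

Z_MAX = 10.0  # assumed maximum sensor range in meters (placeholder)

def depth_lik_expected_surface(z, z_star, sigma=0.02, lam=0.5,
                               weights=(0.1, 0.2, 0.7)):
    """p(z_d | surface expected at z_star): uniform + exponential
    (short returns, e.g. an occluder in front of the expected surface)
    + Gaussian around z_star. Weights/params are illustrative."""
    w_u, w_e, w_g = weights
    uniform = 1.0 / Z_MAX
    exponential = lam * np.exp(-lam * z) if z <= z_star else 0.0
    gaussian = np.exp(-0.5 * ((z - z_star) / sigma) ** 2) / (
        sigma * np.sqrt(2.0 * np.pi))
    return w_u * uniform + w_e * exponential + w_g * gaussian

def depth_lik_no_surface(z, lam=0.5, weights=(0.5, 0.5)):
    """p(z_d | no surface expected): uniform + exponential only,
    since there is no expected depth to center a Gaussian on."""
    w_u, w_e = weights
    return w_u * (1.0 / Z_MAX) + w_e * lam * np.exp(-lam * z)
```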
Example Result
[Figure: differencing result on an example pair, Scene 1 vs. Scene 2]
Spatial Regularization
- Points treated independently so far
- MRF to label each surfel moved or not moved
- Data term given by pointwise evidence
- Smoothness term: Potts, weighted by curvature (energy sketched below)
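The slide states the MRF in words only; a standard pairwise energy consistent with that description, where the data term is the negative log of the pointwise evidence and the Potts weights λ_ij (my notation) vary with local curvature so label boundaries prefer high-curvature creases:

```latex
E(m) = \underbrace{\sum_i -\log p(m_i \mid z_i, z_i^*)}_{\text{data term}}
     \;+\; \underbrace{\sum_{(i,j)} \lambda_{ij}\,\mathbf{1}[m_i \neq m_j]}_{\text{Potts smoothness}}
```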
Spatial Regularization
- Points treated independently so far
- MRF to label each surfel moved or not moved

[Figure: Scene 1 and Scene 2, pointwise vs. regularized labelings]
Experiments
- Trained MRF on four scenes (1.4M surfels)
- Tested on twelve scene pairs (8.0M surfels)
- 70% error reduction wrt max-class baseline

                   Baseline           Ours
                   Count     %        Count     %
  Total surfels    8.0M      100      8.0M      100
  Moved surfels    250k      3        250k      3
  Errors           250k      3        55.5k     0.7
  False pos        0         0        4.5k      0.06
  False neg        250k      3        51.0k     0.64
Experiments
Results: complex scene
Experiments
Results: large object
Next steps

- All scenes in one optimization
- Model completion from many scenes
- Train more-supervised object segmentation

Conclusion

- Segment movable objects in 3-D using scene changes over time
- Represent a map as static + dynamic parts
- Extensible sensor model for RGB-D sensors
Using More Than 2 Scenes
- Given our framework, it is easy to combine evidence from multiple scenes (one possible form is sketched below)
- w_scene could be chosen to weight all scenes (rather than frames) equally, or to upweight those taken under good lighting
- Other ways to subsample frames: as in keyframe selection in mapping
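One plausible form of the multi-scene combination, assuming the per-scene weights w_s enter as exponents on each scene's likelihood contribution; this is my reconstruction, not the talk's exact equation:

```latex
p(m \mid z^{(1)}, \ldots, z^{(S)}) \;\propto\; p(m)\,\prod_{s=1}^{S} p(z^{(s)} \mid m)^{\,w_s}
```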
First Sensor Model: Surface Didn't Move

- Modeling sensor measurements:
  - Depth: uniform + exponential + Gaussian*
  - Color, normal: uniform + Gaussian; mixing controlled by probability that beam hit expected surface

* Fox et al., "Markov Localization…", JAIR '99

[Figure: depth measurement model, peaked at the expected depth z_d*]
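Written out, the depth mixture this slide names (in the style of the cited beam model) might look as follows; the weights w and the parameters λ, σ are placeholders, not values from the talk:

```latex
p(z_d \mid z_d^*) = w_u\,\frac{1}{z_{\max}}
                  + w_e\,\lambda e^{-\lambda z_d}\,\mathbf{1}[z_d \le z_d^*]
                  + w_g\,\mathcal{N}(z_d;\, z_d^*, \sigma^2)
```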
Experiments
- Trained MRF on four scenes (2.7M surfels)
- Tested on twelve scene pairs (8.0M surfels)
- 250k moved surfels; we get 4.5k FP, 51k FN
- 65% error reduction wrt max-class baseline
- Extract foreground segments as "objects"
Overview
- Many visits to same area over time
- Find objects by motion
(extra) Related Work
- Probabilistic sensor models
  - Depth only
  - Depth & color, with extra independence assumptions
- Static + dynamic maps
  - In 2-D
  - Usually not modeling objects
Spatial Regularization
- Pointwise only so far
- MRF to label each surfel moved or not moved
- Data term given by pointwise evidence
- Smoothness term: Potts, weighted by curvature
Depth-Dependent Color/Normal Model
- Modeling sensor measurements
- Combine depth/color/normal (one possible factorization is sketched below)
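Given the slide title, one natural way to combine the channels is to condition the color and normal likelihoods on the depth reading; this factorization is my guess at the combination, not a confirmed formula:

```latex
p(z \mid m, z^*) = p(z_d \mid m, z_d^*)\; p(z_c \mid z_d, m, z_c^*)\; p(z_n \mid z_d, m, z_n^*)
```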