Slide1
3rd Workshop on Semantic Perception, Mapping and Exploration (SPME)
Karlsruhe, Germany, 2013
Semantic Parsing for Priming Object Detection in RGB-D Scenes
Cesar Cadena and Jana Kosecka
Slide2
Motivation
5/5/2013
Long-term robotic operation
The semantic information about the surrounding environment is important for high-level robotic tasks.
It is difficult to know a priori all the possible instances or classes of objects that the robot will find in real operation.
Even if we know many of them, it is unreasonable and expensive to run all specific object detectors at the same time.
Slide6
Motivation
However:
There are things we can assume to be present (almost) always.
Generic “detachable” objects also share some characteristics.
Urban: Ground, Buildings, Sky, Objects
Indoors: Ground, Walls, Ceiling, Objects
Today: Ground, Structure, Furniture, Props
Slide10
Our Problem
Efficiently segment RGB+3D scenes into these general classes, to be used as a prior for task-specific detectors.
Slide12
NYU Depth v2
1449 labeled frames. 26 scene classes.
Labeling spans 894 different classes.
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, in ECCV, 2012.
Thanks to N. Silberman for providing the mapping from 894 to 4 classes.
Slide13
The System
Semantic Segmentation
MAP
Marginals
Slide14
Different approaches
Semantic Segmentation
MAP
Marginals
N. Silberman et al. ECCV 2012
C. Couprie et al. CoRR 2013
X. Ren et al. CVPR 2012
D. Munoz et al. ECCV 2010
I. Endres and D. Hoiem, ECCV 2010
They have at least one of:
Expensive over-segmentation
Expensive features
Expensive inference
Slide15
Our approach
Semantic Segmentation
Preprocessing
Conditional Random Fields
Potentials
Graph Structure
Inference
MAP
Marginals
Slide16
Outline
(1) Preprocessing
Conditional Random Fields:
(2) Graph Structure
(3) Potentials
(4) Inference
MAP, Marginals
(5) Results
(6) Conclusions
Slide17
Preprocessing: Over-segmentation
SLIC superpixels
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, PAMI, 2012.
Slide18
Graph Structure
Classical choice on images
Slide19
Graph Structure: Our choice
Minimum Spanning Tree over 3D
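A minimal sketch of building a minimum spanning tree over 3D positions, to make the graph choice concrete. It assumes one 3D centroid per superpixel and Euclidean distance as the edge weight over the complete graph; the slides do not state the exact edge weight or candidate edge set, and the function name and toy centroids are hypothetical.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Return MST edges (i, j) over 3D points using Kruskal's algorithm.

    Assumption: the edge weight is the Euclidean distance between points
    (e.g. superpixel 3D centroids), taken over the complete graph.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All candidate edges, sorted by increasing 3D distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree


# Hypothetical centroids: three nearby points and one far away in depth.
centroids = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.1), (0.1, 0.0, 3.0)]
mst_edges = minimum_spanning_tree(centroids)
```

In this toy example the three nearby points are linked to each other first, and the distant point is attached by a single long edge, which is how a tree over 3D can keep components at similar depth connected.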
Slide21
Potentials: Pairwise CRFs
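For orientation, a pairwise CRF over a graph of superpixels is commonly written in the form below. The notation is an assumption for this sketch and not necessarily the slides' own: x are the labels, z the observations, V the nodes, E the edges, φ the unary potentials and ψ the pairwise potentials.

```latex
p(\mathbf{x} \mid \mathbf{z}) = \frac{1}{Z(\mathbf{z})}
  \exp\Big( -\sum_{i \in \mathcal{V}} \varphi_i(x_i, \mathbf{z})
            -\sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j, \mathbf{z}) \Big)

\mathbf{x}^{\mathrm{MAP}} = \arg\max_{\mathbf{x}} \; p(\mathbf{x} \mid \mathbf{z}),
\qquad
p(x_i \mid \mathbf{z}) = \sum_{\mathbf{x} \setminus x_i} p(\mathbf{x} \mid \mathbf{z})
```

The two quantities in the second line are the MAP labeling and the per-node marginals.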
Slide24
Potentials: unary
frequency of label j in a k-NN query
frequency of label j in the database
J. Tighe and S. Lazebnik, Superparsing: Scalable nonparametric image parsing with superpixels, ECCV 2010.
The database is a kd-tree of features from training data
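A minimal sketch of a k-NN based unary term built from the two frequencies named above. The exact formula is an assumption: the cost of label j is taken as the negative log of the ratio between its frequency among the k nearest neighbours and its frequency in the whole database, with a small smoothing constant. A brute-force search stands in for the kd-tree, and the function name, features and labels are hypothetical.

```python
import math
from collections import Counter


def knn_unary(query, train_feats, train_labels, k=3, eps=1e-6):
    """Unary cost per label from a k-NN frequency ratio.

    Assumption: cost(j) = -log(freq of j among the k neighbours
                               / freq of j in the whole database),
    smoothed with eps. Brute-force search replaces the kd-tree.
    """
    # Indices of training features sorted by distance to the query.
    order = sorted(
        range(len(train_feats)),
        key=lambda i: math.dist(query, train_feats[i]),
    )
    knn_counts = Counter(train_labels[i] for i in order[:k])
    db_counts = Counter(train_labels)
    n = len(train_labels)

    costs = {}
    for label, count in db_counts.items():
        f_knn = knn_counts[label] / k  # frequency in the k-NN query
        f_db = count / n               # frequency in the database
        costs[label] = -math.log((f_knn + eps) / (f_db + eps))
    return costs


# Hypothetical 2D features; the deck uses 12D features per superpixel.
feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0), (0.9, 1.0)]
labels = ["ground", "ground", "ground", "props", "props", "props"]
costs = knn_unary((0.05, 0.05), feats, labels, k=3)
```

A label that is over-represented among the neighbours relative to the database gets a low cost, so here "ground" is cheaper than "props".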
Slide25
Features 12D
From Image:
mean of Lab color space 3D
vertical pixel location 1D
entropy from vanishing points 1D
From 3D:
height and depth 2D
mean and std of differences on depth 2D
local planarity 1D
neighboring planarity 1D
vertical orientation 1D
Slide26
Features
From Image:
entropy from vanishing points
Slide27
Features
From 3D:
mean and std of differences on depth
Slide29
Features
From 3D:
mean and std of differences on depth
local planarity
neighboring planarity
vertical orientation
Slide30
Potentials: pairwise
Lab color
Slide31
Inference
We use belief propagation:
Exact results in MAP/marginals
Efficient computation
Thanks to our graph structure choice!
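A minimal sketch of sum-product belief propagation on a tree, which yields exact marginals in two passes over the edges. The interface is an assumption: potentials are given as non-negative factors rather than costs, one pairwise table is shared by all edges, node 0 is used as the root, and the function name and toy numbers are hypothetical. Replacing the sum with a max in `send` would give the max-product variant used for MAP.

```python
from collections import defaultdict


def tree_marginals(n_nodes, edges, unary, pairwise):
    """Exact marginals on a connected tree-structured pairwise model.

    unary[i][a]    : non-negative factor for node i taking label a
    pairwise[a][b] : non-negative factor shared by all edges (assumption)
    """
    n_labels = len(pairwise)
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    # Breadth-first order from node 0, recording each node's parent.
    parent = {0: None}
    order = [0]
    for node in order:
        for nb in nbrs[node]:
            if nb not in parent:
                parent[nb] = node
                order.append(nb)

    def send(src, dst, msgs):
        # Combine src's unary with all incoming messages except the one
        # from dst, then sum out src's label through the pairwise table.
        belief = list(unary[src])
        for nb in nbrs[src]:
            if nb != dst:
                belief = [b * m for b, m in zip(belief, msgs[(nb, src)])]
        out = [
            sum(belief[a] * pairwise[a][b] for a in range(n_labels))
            for b in range(n_labels)
        ]
        total = sum(out)
        return [v / total for v in out]

    msgs = {}
    for node in reversed(order[1:]):  # pass 1: leaves to root
        msgs[(node, parent[node])] = send(node, parent[node], msgs)
    for node in order[1:]:            # pass 2: root to leaves
        msgs[(parent[node], node)] = send(parent[node], node, msgs)

    marginals = []
    for i in range(n_nodes):
        belief = list(unary[i])
        for nb in nbrs[i]:
            belief = [b * m for b, m in zip(belief, msgs[(nb, i)])]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals


# Hypothetical chain 0-1-2 with two labels and a smoothing pairwise table.
edges = [(0, 1), (1, 2)]
unary = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
pairwise = [[2.0, 1.0], [1.0, 2.0]]
marginals = tree_marginals(3, edges, unary, pairwise)
```

Each edge carries one message per direction and each message costs a number of operations quadratic in the number of labels, so the total cost grows linearly with the number of nodes. This is the benefit of choosing a tree as the graph structure.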
Results: NYU-D v2 Dataset
GT, MAP
Slide33
Results: NYU-D v2 Dataset
Confusion matrix:
Comparisons:
Slide35
Results: NYU-D v2 Dataset
Some failures:
GT, MAP
Slide37
Marginal probabilities
Provide very useful information for specific tasks, e.g.:
Specific object detection
Support inference
P(Ground)
P(Structure)
P(Furniture)
P(Props)
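A minimal sketch of how per-superpixel marginals could prime a specific detector: only regions whose marginal for the relevant class exceeds a threshold are passed on. The function name, the threshold and the probability values are all hypothetical illustrations, not results from the talk.

```python
def prime_detector(marginals, label, threshold=0.5):
    """Return indices of superpixels whose marginal for `label` exceeds threshold."""
    return [i for i, p in enumerate(marginals) if p[label] > threshold]


# Made-up marginals for three superpixels over the four general classes.
marginals = [
    {"Ground": 0.90, "Structure": 0.05, "Furniture": 0.03, "Props": 0.02},
    {"Ground": 0.05, "Structure": 0.10, "Furniture": 0.25, "Props": 0.60},
    {"Ground": 0.02, "Structure": 0.08, "Furniture": 0.70, "Props": 0.20},
]
candidates = prime_detector(marginals, "Props")
```

A detector for a small object would then run only on the candidate regions instead of the whole image.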
Slide38
Conclusions
We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes.
Our approach effectively uses 3D and image cues.
Depth discontinuities are evidence for occlusions.
The MST over 3D keeps intra-class components coherently connected.
Slide39
Discussion
Silberman et al. 2012:
Features: Bunch of engineered features (>1000D)
Local classifier: Logistic Regression
Graph structure: Dense Connections (Image)
Couprie et al. 2013:
Features: Learned features (>1000D)
Local classifier: Neural Networks
Graph structure: None
Ours:
Features: Select meaningful features (12D)
Local classifier: k-NN
Graph structure: MST over 3D
Slide40
Thanks!!
Cesar Cadena ccadenal@gmu.edu
Jana Kosecka kosecka@cs.gmu.edu
Funded by the US Army Research Office Grant W911NF-1110476.
Slide41
Working on:
People detection by Shenghui Zhou
Slide42
Multi-view and video: