FOCUS: Clustering Crowdsourced Videos by Line-of-Sight
Puneet Jain, Justin Manweiler, Arup Acharya, and Kirk Beaty
Clustered by shared subject
Challenges
CAN IMAGE PROCESSING SOLVE THIS PROBLEM?
[Figure: Camera 1, Camera 2, Camera 3, Camera 4]
LOGICAL similarity does not imply VISUAL similarity
VISUAL similarity does not imply LOGICAL similarity
CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?
Sensors are noisy; it is hard to distinguish subjects…
Why not triangulate?
GPS-COMPASS Line-of-Sight
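The triangulation idea can be sketched as intersecting two GPS-compass lines of sight. This is a minimal sketch, not the FOCUS implementation: it assumes each phone's GPS fix has already been converted to local (east, north) coordinates in meters and that the compass reports a bearing in degrees clockwise from north; `triangulate` and its helpers are hypothetical names. Any GPS or compass error shifts the intersection point, and the lateral shift grows with the distance to the subject, which is why sensing alone struggles.

```python
import math
from typing import Optional, Tuple

# Assumption: positions are (east, north) in meters in a small local planar frame,
# and bearings are compass headings in degrees clockwise from north.
Point = Tuple[float, float]


def bearing_to_direction(bearing_deg: float) -> Point:
    """Unit (east, north) vector for a compass bearing."""
    rad = math.radians(bearing_deg)
    return (math.sin(rad), math.cos(rad))


def cross(a: Point, b: Point) -> float:
    """2D cross product (z component of a x b)."""
    return a[0] * b[1] - a[1] * b[0]


def triangulate(p1: Point, bearing1: float, p2: Point, bearing2: float,
                eps: float = 1e-9) -> Optional[Point]:
    """Intersect two lines of sight; None if (nearly) parallel or behind a camera."""
    d1 = bearing_to_direction(bearing1)
    d2 = bearing_to_direction(bearing2)
    denom = cross(d1, d2)
    if abs(denom) < eps:
        return None  # parallel lines of sight never meet
    # Solve p1 + s*d1 == p2 + u*d2 for the distances s and u along each ray.
    r = (p2[0] - p1[0], p2[1] - p1[1])
    s = cross(r, d2) / denom
    u = cross(r, d1) / denom
    if s < 0 or u < 0:
        return None  # the rays would have to point backwards to meet
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])


# Two cameras 100 m apart, both aimed at a subject 50 m north of their midpoint.
print(triangulate((0.0, 0.0), 45.0, (100.0, 0.0), 315.0))  # approximately (50.0, 50.0)
```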
INSIGHT
Simplifying Insight 1
Don’t need to visually identify the actual SUBJECT; can use the background as a PROXY
(subject: hard to identify; background: easy to identify)
Simplifying Insight 2
Don’t need to directly match videos; can compare all of them to a predefined visual MODEL
(the same basic structure persists)
Simplifying Insight 3
Line-of-sight (triangulation) is almost enough, just not via sensing (alone)
FOCUS: Fast Optical Clustering of live User Streams
[Figure: FOCUS architecture]
Video Streams (Android, iOS, etc.) flow into the FOCUS Cloud
FOCUS Cloud: Video Extraction; Video Analytics (image processing, computer vision); Hadoop/HDFS (failover, elasticity)
Clustered Videos: Users Select & Watch Organized Streams (Watching Live, home: 2, away: 1; Change Angle; Change Focus)
[Figure: the same FOCUS architecture, now including a pre-defined reference “model”]
Multi-view Stereo Reconstruction
Model construction technique based on Photo Tourism: Exploring Photo Collections in 3D (Snavely et al., SIGGRAPH 2006)
Pipeline stages: multi-view reconstruction, keypoint extraction
Estimates camera POSE and the content in the field-of-view
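Multi-view reconstruction starts by matching keypoint descriptors across images. A minimal sketch of brute-force nearest-neighbour matching with a distance-ratio test, a common filtering step in SIFT-style pipelines; it assumes descriptors are plain float vectors of equal length, and `match_keypoints` and the 0.8 threshold are illustrative choices rather than values from the slides. Production systems use optimized feature libraries and approximate nearest-neighbour search instead of this quadratic loop.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]  # assumption: equal-length float vectors


def match_keypoints(query: List[Descriptor], train: List[Descriptor],
                    ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Brute-force nearest-neighbour matching with a distance-ratio test.

    A query descriptor is matched only if its nearest train descriptor is
    clearly closer than the second-nearest, which discards ambiguous matches.
    """
    if len(train) < 2:
        return []  # the ratio test needs at least two candidates
    matches = []
    for qi, q in enumerate(query):
        # (distance, index) pairs, closest first
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        if best_d < ratio * second_d:
            matches.append((qi, best_i))
    return matches


query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (10.0, 10.0), (5.0, 5.1), (5.1, 5.0)]
print(match_keypoints(query, train))  # [(0, 0)]; (5, 5) is ambiguous and dropped
```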
Visualizing Camera Pose
Pipeline stages: multi-view reconstruction, keypoint extraction, frame-by-frame video-to-model alignment, sensory inputs
Given a pre-defined 3D model, align incoming video frames to the model (also known as camera pose estimation)
~1 second at the 90th percentile; ~18 seconds at the 90th percentile
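Frame-to-model alignment is usually framed as finding the camera pose (rotation R, translation t) whose projection of the model's 3D points best agrees with the matched 2D keypoints in the frame. A minimal sketch of that scoring step, assuming an ideal pinhole camera with focal length `f` in pixels and principal point (`cx`, `cy`); the function names are hypothetical, and a real pose estimator (for example a Perspective-n-Point solver inside RANSAC) searches over poses instead of scoring a single one.

```python
import math
from typing import Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]  # 3x3 rotation matrix, row-major


def project(X: Vec3, R: Mat3, t: Vec3, f: float, cx: float, cy: float) -> Optional[Vec2]:
    """Pinhole projection of world point X with pose (R, t); None if behind the camera."""
    # Camera-frame coordinates: Xc = R @ X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return (f * xc[0] / xc[2] + cx, f * xc[1] / xc[2] + cy)


def mean_reprojection_error(points3d: Sequence[Vec3], points2d: Sequence[Vec2],
                            R: Mat3, t: Vec3, f: float, cx: float, cy: float) -> float:
    """Average pixel distance between projected model points and matched keypoints."""
    if len(points3d) != len(points2d):
        raise ValueError("every 3D model point needs exactly one matched 2D keypoint")
    if not points3d:
        return math.inf
    total = 0.0
    for X, (u, v) in zip(points3d, points2d):
        p = project(X, R, t, f, cx, cy)
        if p is None:
            return math.inf  # a pose that puts a matched point behind the camera is invalid
        total += math.hypot(p[0] - u, p[1] - v)
    return total / len(points3d)


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
model = [(1.0, 2.0, 10.0), (-1.0, 0.5, 5.0)]
observed = [(10.0, 20.0), (-20.0, 10.0)]
print(mean_reprojection_error(model, observed, identity, (0.0, 0.0, 0.0), 100.0, 0.0, 0.0))  # 0.0
```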
Pipeline stages: multi-view reconstruction, keypoint extraction, integration of sensory inputs
Gyroscope provides the “diff” from the initial position estimated by vision
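The gyroscope "diff" can be read as dead reckoning: accumulate rotation onto the last vision-estimated orientation. A minimal sketch for heading (yaw) only, assuming timestamped angular-rate samples in degrees per second about the vertical axis, with a positive rate meaning a clockwise turn; `integrate_yaw` is a hypothetical name. Gyroscope drift accumulates, so the estimate needs periodic re-anchoring to a fresh vision fix.

```python
from typing import Sequence, Tuple


def integrate_yaw(initial_yaw_deg: float,
                  samples: Sequence[Tuple[float, float]]) -> float:
    """Dead-reckon heading from a vision-estimated starting yaw.

    samples: (timestamp_s, yaw_rate_deg_per_s) pairs sorted by time; a positive
    rate is assumed to mean a clockwise turn, matching compass bearings.
    Integration uses the trapezoidal rule between consecutive samples.
    """
    yaw = initial_yaw_deg
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        yaw += 0.5 * (w0 + w1) * (t1 - t0)
    return yaw % 360.0  # wrap into [0, 360)


# Start at 350 degrees (from vision), then turn clockwise at 10 deg/s for 2 s.
print(integrate_yaw(350.0, [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0)]))  # 10.0
```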
File size ≈ 1/Blur (blurrier frames compress to smaller files)
[Figure: sampled frames over time (t - 2, t - 1, 0 to 4) with gyroscope readings]
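The "file size ≈ 1/blur" heuristic gives a cheap way to choose which frames to send for vision processing without decoding them. A minimal sketch, assuming frames arrive already compressed (e.g. as JPEG bytes) and that one frame is kept per fixed-size window; `pick_sharpest` and the windowing scheme are assumptions, not details from the slides.

```python
from typing import List, Sequence


def pick_sharpest(encoded_frames: Sequence[bytes], window: int) -> List[int]:
    """Index of the largest encoded frame in each run of `window` consecutive frames.

    Assumption: frames are already compressed, so a larger byte size is used
    as a proxy for less motion blur (file size ~ 1/blur).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    chosen = []
    for start in range(0, len(encoded_frames), window):
        candidates = range(start, min(start + window, len(encoded_frames)))
        chosen.append(max(candidates, key=lambda i: len(encoded_frames[i])))
    return chosen


frames = [b"x" * n for n in (10, 30, 20, 5, 50)]  # stand-ins for encoded frames
print(pick_sharpest(frames, window=2))  # [1, 2, 4]
```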
Field-of-View
Using the POSE + model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of the view
Pipeline stages: multi-view reconstruction, keypoint extraction, pairwise model-image analysis
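Given a pose and the model point cloud, the set of model points in view can be found by projecting every point and keeping those that land inside the image. A minimal sketch, assuming an ideal pinhole camera with the principal point at the image centre and no occlusion handling (a point hidden behind other structure would still count); `visible_points` is a hypothetical name.

```python
from typing import Dict, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]


def visible_points(cloud: Dict[int, Vec3], R: Sequence[Sequence[float]], t: Vec3,
                   f: float, width: int, height: int) -> Set[int]:
    """IDs of model points that project inside the image and lie in front of the camera.

    Assumptions: ideal pinhole camera, principal point at the image centre,
    and no occlusion test (every point inside the view frustum counts as visible).
    """
    cx, cy = width / 2.0, height / 2.0
    visible = set()
    for pid, X in cloud.items():
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        if xc[2] <= 0:
            continue  # behind the camera
        u = f * xc[0] / xc[2] + cx
        v = f * xc[1] / xc[2] + cy
        if 0 <= u < width and 0 <= v < height:
            visible.add(pid)
    return visible


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cloud = {1: (0.0, 0.0, 10.0), 2: (0.0, 0.0, -10.0), 3: (50.0, 0.0, 10.0), 4: (5.0, 5.0, 10.0)}
print(visible_points(cloud, identity, (0.0, 0.0, 0.0), 100.0, 200, 200))  # {1, 4}
```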
Finding the similarity across videos as the size of the point cloud set intersection
[Figure: images 1, 2, and 3]
Similarity between images 1 & 2 = 18
Similarity between images 1 & 3 = 13
Pipeline stages: multi-view reconstruction, keypoint extraction, pairwise model-image analysis
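With each video represented by the set of model point IDs it sees, the similarity between two videos is the size of their intersection. A minimal sketch, assuming per-video sets have already been computed (how frames are aggregated into a per-video set is not stated on the slides); `pairwise_similarity` is a hypothetical name, and the point-ID sets in the usage example are made up so that two scores reproduce the slide's 18 and 13.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple


def pairwise_similarity(views: Dict[Hashable, Set[int]]) -> Dict[Tuple[Hashable, Hashable], int]:
    """Similarity of every pair of videos = number of shared model point IDs."""
    return {(a, b): len(views[a] & views[b]) for a, b in combinations(views, 2)}


# Toy point-ID sets, chosen so the first two scores match the slide's example.
views = {1: set(range(0, 30)), 2: set(range(12, 40)), 3: set(range(17, 50))}
print(pairwise_similarity(views))  # {(1, 2): 18, (1, 3): 13, (2, 3): 23}
```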
Clustering “similar” videos
[Figure: similarity scores between videos 1, 2, and 3]
Application of Modularity Maximization; high modularity implies:
high correlation among the members of a cluster
low correlation with the members of other clusters
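Modularity for a weighted similarity graph is Q = (1/2m) Σ_ij [A_ij - k_i k_j / (2m)] δ(c_i, c_j), where A_ij is the similarity score between videos i and j, k_i is the total score attached to video i, m is the sum of all edge weights, and δ is 1 when both videos share a cluster. A minimal sketch that scores partitions with the equivalent per-cluster form and finds the best one by exhaustive search. Exhaustive search only works for a handful of videos; practical systems use heuristics such as the Louvain method, and the slides do not say which optimizer FOCUS uses. All function names and the graph in the usage example are hypothetical.

```python
from typing import Dict, Hashable, Iterator, List, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(nodes: List[Hashable], weights: Dict[Edge, float],
               community: Dict[Hashable, int]) -> float:
    """Newman modularity of a partition of an undirected weighted graph.

    Uses the equivalent per-community form Q = sum_c [W_c / m - (K_c / 2m)^2],
    where W_c is the edge weight inside community c and K_c its total degree.
    """
    degree = {n: 0.0 for n in nodes}
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    two_m = sum(degree.values())
    if two_m == 0:
        return 0.0
    inside: Dict[int, float] = {}
    total: Dict[int, float] = {}
    for (a, b), w in weights.items():
        if community[a] == community[b]:
            inside[community[a]] = inside.get(community[a], 0.0) + w
    for n, k in degree.items():
        total[community[n]] = total.get(community[n], 0.0) + k
    return sum(inside.get(c, 0.0) / (two_m / 2) - (k_c / two_m) ** 2
               for c, k_c in total.items())


def set_partitions(items: List[Hashable]) -> Iterator[List[List[Hashable]]]:
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` in a group of its own
        for i in range(len(part)):  # or `first` joins an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def best_partition(nodes: List[Hashable], weights: Dict[Edge, float]):
    """Exhaustive modularity maximization; only feasible for a handful of videos."""
    best, best_q = None, float("-inf")
    for part in set_partitions(nodes):
        community = {n: ci for ci, group in enumerate(part) for n in group}
        q = modularity(nodes, weights, community)
        if q > best_q:
            best, best_q = part, q
    return best, best_q


# Hypothetical similarity scores: two tight groups of videos joined by one weak edge.
w = {("a", "b"): 10, ("b", "c"): 10, ("a", "c"): 10,
     ("d", "e"): 10, ("e", "f"): 10, ("d", "f"): 10, ("c", "d"): 1}
groups, q = best_partition(list("abcdef"), w)
print(sorted(sorted(g) for g in groups), round(q, 3))
```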
Results
Collegiate Football Stadium
Stadium: 33K seats, 56K maximum attendance
Model: 190K points, 412 images (2896 x 1944 resolution)
Android app on Samsung Galaxy Nexus and S3
325 videos captured, 15-30 seconds each
Line-of-Sight Accuracy (visual)
Line-of-Sight Accuracy
In >80% of the cases, line-of-sight estimation is off by <40 meters
GPS/Compass LOS estimation is off by <260 meters for the same percentage
FOCUS Performance
75% true positives
Trigger GPS/Compass failover techniques
Natural Questions
What if a 3D model is not available?
Online model generation from the first few uploads
Stadiums look very different on a game day?
Rigid structures in the background persist
Where won't it work?
Natural or dynamic environments are hard
Conclusion
Computer vision and image processing are often computation-hungry, restricting real-time deployment
Mobile sensing provides powerful metadata that can often reduce the computational burden
Computer vision + mobile sensing + geometry, along with the right set of big data tools, can enable many real-time applications
FOCUS demonstrates one such fusion, a ripe area for further research
Thank You
http://cs.duke.edu/~puneet