Slide1
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
Tianfan Xue*
Jiajun Wu*
Katie Bouman
Bill Freeman
* Indicates equal contribution
Slide2
Frame 1
Frame 2
?
Task: future frame prediction
Slide3
Frame 1
Frame 2
Deterministic neural network
Deterministic predictions fail to model uncertainty
Slide4
Frame 1
Deterministic neural network
Deterministic predictions fail to model uncertainty
Prediction
Reality
Slide5
…
…
Warp
Unrealistic Motion
Realistic Motion
Sampling a motion field from a prior distribution
Warp
Warp
Only a few motion fields are consistent with the input image
Slide6
Related work
Deterministic prediction
?
Sample from prior distribution
Motion prediction: [Pintea et al., 2014], [Walker et al. 2015]
Visual feature prediction: [Vondrick et al., 2014]
Future frame synthesis: [Mathieu et al., 2014]
Image prior: [Simoncelli 2001], [Zoran 2012]
Motion prior: [Weiss & Adelson, 1998], [Fleet 2000]
Image synthesis: [Portilla and Simoncelli, 2000], [Kingma and Welling, 2014], [Radford 2015], [Oord 2016]
Probabilistic prediction: [Walker et al., 2016]
Slide7
Related work
Deterministic prediction
?
Sample from prior distribution
…
Sampled future frames
Input frame
Our approach
Slide8
…
Sampled future frames
Input frame
Task: sample future frames consistent with the input
Outline
Main idea
Network structure
What the network learns
Result
Slide9
?
Input frame
Sampled
future frame
Segment-based synthesis
Slide10
Input frame
Sampled
future frame
Segments
Transformed segments
Segment-based synthesis
Slide11
Input frame
Another sampled
future frame
Segments
Transformed segments
Input random motion vector
Synthesize using different transformations
Slide12
Input random motion vector
Synthesis network
Input frame
Sampled future frame
Synthesis network
Slide13
Synthesis network
Input frame
Sampled future frame
Sample different future
frames
Input random motion vector
Slide14
Synthesis network
Input frame
Sample different future
frames
Input random motion vector
Sampled future frame
Slide15
Synthesis network
Input frame
Sampled future frame
Sample different future
frames
Input random motion vector
Slide16
Sampled future frame
Motion vector
Synthesis network
Input frame
Encoding network
Future frame (ground truth)
Training
Slide17
Motion vector
Encoding network
Synthesis network
Future frame
(prediction)
Training samples
(Label-free)
Training
Input frame
Future frame
(ground truth)
Slide18
Future frame
(prediction)
Motion vector
Encoding network
Synthesis network
Training
Future frame
(ground truth)
Input frame
Objective function:
Reconstruction loss
Slide19
Future frame
(prediction)
Future frame
(ground truth)
Input frame
Encoding network
Synthesis network
Training
Objective function:
KL-divergence loss
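The two terms named on the training slides (reconstruction loss and KL-divergence loss) can be sketched in NumPy as follows. This is an illustration under assumptions, not the presenters' code: the squared-error reconstruction term, the Gaussian parameterization by `mean` and `log_var`, the `kl_weight` parameter, and all function names are introduced here for the example.

```python
import numpy as np


def reparameterize(mean, log_var, rng):
    """Draw a motion vector z = mean + std * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mean ** 2 - 1.0 - log_var)


def reconstruction_loss(predicted, target):
    """Squared error between predicted and ground-truth future frames (assumed form)."""
    return np.sum((predicted - target) ** 2)


def vae_objective(predicted, target, mean, log_var, kl_weight=1.0):
    """Reconstruction term plus weighted KL term."""
    kl = kl_to_standard_normal(mean, log_var)
    return reconstruction_loss(predicted, target) + kl_weight * kl


# Toy usage: a perfect reconstruction leaves only the KL term.
rng = np.random.default_rng(0)
mean = np.array([1.0, 0.0])
log_var = np.zeros(2)
z = reparameterize(mean, log_var, rng)
frame = np.ones((4, 4))
loss = vae_objective(frame, frame, mean, log_var)
```

The KL term pulls the encoded motion vector toward a standard normal, which is what allows a random vector to be used in place of the encoder output at test time.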
Motion vector
Variational Autoencoder [Kingma and Welling, 2014]
Slide20
Future frame
(prediction)
Synthesis network
Testing
Future frame
(ground truth)
Encoding network
Input frame
Input frame
Input random motion vector
Real output from our network
Slide21
Input random motion vector
Synthesis network
Input random
motion vector
Future frame
Synthesis network
How do we design the synthesis network?
Input frame
Slide22
Input random
motion vector
Input frame
Future frame
Synthesis network
How do we design the synthesis network?
Slide23
Input random
motion vector
Input frame
Future frame
Synthesis network
Find segments
Transform segments
Synthesize by transforming segments
Slide24
Input random
motion vector
Input frame
Future frame
Find segments
Transform segments
Synthesize by transforming segments
Image segments
Slide25
Input frame
Future frame
Transform segments
Find segments
Input random
motion vector
Synthesize by transforming segments
Image segments
Convolution
Slide26
Movement can be synthesized through convolution
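A minimal NumPy sketch of this idea, not taken from the presentation: convolving an image with a kernel that holds a single 1 away from its center shifts the image content. The helper `conv2d_same`, the 5x5 test image, and the two 3x3 kernels are assumptions chosen for illustration.

```python
import numpy as np


def conv2d_same(image, kernel):
    """True 2D convolution (flipped kernel), zero padded, same output size.

    Assumes an odd-sized kernel.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out


# A single bright pixel in the middle of a 5x5 image.
image = np.zeros((5, 5))
image[2, 2] = 1.0

# A 1 at the kernel center leaves the image unchanged.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0

# A 1 at the top-right corner moves content up one row and right one column.
shift = np.zeros((3, 3))
shift[0, 2] = 1.0

same = conv2d_same(image, identity)
moved = conv2d_same(image, shift)
```

Here `same` equals the input, while in `moved` the pixel has travelled from (2, 2) to (1, 3), so the position of the 1 inside the kernel directly encodes the displacement.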
Slide27
0 0 0
0 1 0
0 0 0

0 0 1
0 0 0
0 0 0
Movement can be synthesized through convolution
Slide28
Input random
motion vector
Input frame
Future frame
Convolution
Transform segments
Find segments
Transforming segments via cross-convolution
Motion kernels
Image segments
Slide29
Input random
motion vector
Input frame
Future frame
Convolution
Transform segments
Find segments
Applying motion to each segment
Motion kernels for segment 1
Segment 1
Slide30
Input random
motion vector
Input frame
Future frame
Convolution
Transform segments
Find segments
Applying motion to each segment
Motion kernels for segment 2
Segment 2
Slide31
Input random
motion vector
Input frame
Future frame
Convolution
Transform segments
Find segments
Applying motion to each segment
Motion kernels for segment 3
Segment 3
Decoding net
Slide32
Image segments
Applying motion to each segment
Motion kernels
The decoding network generates a motion kernel for each corresponding segment
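The per-segment step can be sketched as below: each segment map is convolved with its own motion kernel, so different segments can move in different directions. This is a simplified illustration under assumptions, not the network from the talk. In particular, the kernels here are written by hand rather than produced by a decoding network, and `conv2d_same` and `cross_convolve` are hypothetical helper names.

```python
import numpy as np


def conv2d_same(image, kernel):
    """True 2D convolution (flipped kernel), zero padded, same output size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    flipped = kernel[::-1, ::-1]
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out


def cross_convolve(segments, kernels):
    """Convolve segment k with kernel k; returns one transformed map per segment."""
    return np.stack([conv2d_same(s, k) for s, k in zip(segments, kernels)])


# Two toy segment maps, each containing one active pixel.
segments = np.zeros((2, 5, 5))
segments[0, 1, 1] = 1.0
segments[1, 3, 3] = 1.0

# Hand-written stand-ins for decoded motion kernels.
move_right = np.zeros((3, 3))
move_right[1, 2] = 1.0
move_down = np.zeros((3, 3))
move_down[2, 1] = 1.0

transformed = cross_convolve(segments, np.stack([move_right, move_down]))
```

In `transformed`, the first segment's pixel has moved one column to the right and the second segment's pixel one row down, showing how one kernel per segment yields independent motions.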
Decoding net
Motion vector
[Brabandere et al. 2016]
[Finn et al. 2016]
Slide33
Synthesis
network
Motion
vector
Input frame
Future frame
Image segments
Motion kernels
Convolution
Future frame
Transform segments
Find segments
Encoding network
Future frame
Synthesize by transforming segments
Decoding net
Slide34
Motion
vector
Input frame
Future frame
Synthesis
network
Future frame
What is encoded in the motion vector?
Encoding network
Slide35
Motion
vector
Input frame
Future frame
Synthesis
network
Future frame
What is encoded in the motion vector?
Encoding network
Slide36
Motion vector
Upward motion when changing this dimension
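One way to probe a single dimension is to hold the rest of the motion vector fixed and sweep only that entry, then feed each resulting vector to the synthesis network. The sketch below builds only the swept vectors. The function name `traverse_dimension`, the vector length, and the sweep range are assumptions for illustration, and the synthesis network itself is not shown.

```python
import numpy as np


def traverse_dimension(base_vector, dim, values):
    """Return one copy of base_vector per value, with entry `dim` set to that value."""
    vectors = np.tile(base_vector, (len(values), 1))
    vectors[:, dim] = values
    return vectors


# Sweep dimension 2 of a 4-dimensional motion vector from -1 to 1.
base = np.zeros(4)
sweep = traverse_dimension(base, dim=2, values=[-1.0, 0.0, 1.0])
```

Each row of `sweep` would be paired with the same input frame, so any difference between the synthesized frames can be attributed to the swept dimension.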
Each dimension encodes a type of motion
Slide37
Motion vector
Leg motion when changing this dimension
Each dimension encodes a type of motion
Slide38
Simulated shapes
Training samples
Results: toy example
Slide39
Input
Learned segments
Network automatically detects segments
Triangles
Circles
Slide40
Input
Sampled next frame
Ground truth distribution
Sample distribution
Network learns the correlation between appearance and motion
Slide41
Input
Sampled future frames
Results: real-world images
Slide42
Challenge: large motion
Input
Two sampled future frames
Artifacts appear when motion is large
Slide43
Mechanical Turk study to assess synthesis quality
Baseline (transfer flow): 25.5% labeled as real
Our method: 31.3% labeled as real
An ideal synthesis algorithm achieves 50%
Slide44
Contributions
Sample multiple future frames that are consistent with the input
Synthesize frames by transforming segments
Learn a motion representation without supervision
Slide45
http://visualdynamics.csail.mit.edu
Tianfan Xue*
Jiajun Wu*
Katie Bouman
Bill Freeman
Visual Dynamics: Probabilistic Future Frame Synthesis
via Cross Convolutional Networks