Slide1
Neural Adaptive Video Streaming with Pensieve
Hongzi Mao, Ravi Netravali, Mohammad Alizadeh
Slide2
Users start leaving if video doesn’t play in 2 seconds
Source: https://gigaom.com/2012/11/09/online-viewers-start-leaving-if-video-doesnt-play-in-2-seconds-says-study/
Video: La Luna (Pixar 2011)
Slide3
Dynamic Adaptive Streaming over HTTP (DASH)
Video client to video server, request: next video chunk at bitrate r
Video server to video client, response: video content
[Figure: client playback buffer. Input: 1 sec of video content at the chosen bitrate. Output: playback at 1 sec/sec.]
Adaptive Bitrate (ABR) algorithms choose the bitrate of each chunk
Animation borrowed from Te-Yuan Huang (SIGCOMM ‘14): http://conferences.sigcomm.org/sigcomm/2014/doc/slides/38.pdf
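As a rough sketch of the playback buffer pictured above: while a chunk downloads, playback drains the buffer at 1 sec/sec, and the finished chunk then adds its own duration. The function name, the 4-second chunk duration, and the example numbers are illustrative assumptions, not values from the talk.

```python
def step_buffer(buffer_s, chunk_bits, throughput_bps, chunk_duration_s=4.0):
    """Advance the client buffer by one chunk download.

    Returns (new_buffer_s, rebuffer_s). All parameters are hypothetical.
    """
    download_s = chunk_bits / throughput_bps
    # Playback drains the buffer at 1 sec/sec while the chunk downloads.
    rebuffer_s = max(download_s - buffer_s, 0.0)
    remaining_s = max(buffer_s - download_s, 0.0)
    # The finished chunk adds its playback duration to the buffer.
    return remaining_s + chunk_duration_s, rebuffer_s


# A 4 s chunk at 1200 kbps over a 2.4 Mbps link takes 2 s to download.
new_buffer, stall = step_buffer(buffer_s=5.0, chunk_bits=4 * 1_200_000,
                                throughput_bps=2_400_000)
```

A higher bitrate makes `chunk_bits` larger, so the same link needs longer to deliver the chunk and the buffer is more likely to run dry.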
Slide4
Why is ABR Challenging?
Network throughput is variable & uncertain
Conflicting QoE goals: bitrate, rebuffering time, smoothness
Cascading effects of decisions
[Figure: throughput and video bitrate (Mbps), and buffer size (sec)]
Slide5
Our Contribution: Pensieve
Pensieve learns an ABR algorithm automatically through experience
[Figure: the ABR agent takes network and video measurements (bandwidth, bit rate, buffer) and selects among the bitrates 240P, 480P, 720P, 1080P; here 720P]
First network control system using modern “deep” reinforcement learning
Delivers 12-25% better QoE, with 10-30% less rebuffering than previous ABR algorithms
Tailors ABR decisions for different network conditions in a data-driven way
Slide6
Previous Fixed ABR Algorithms
Rate-based: pick bitrate based on predicted throughput
    FESTIVE [CoNEXT’12], PANDA [JSAC’14], CS2P [SIGCOMM’16]
Buffer-based: pick bitrate based on buffer occupancy
    BBA [SIGCOMM’14], BOLA [INFOCOM’16]
Hybrid: use both throughput prediction & buffer occupancy
    PBA [HotMobile’15], MPC [SIGCOMM’15]
Simplified, inaccurate models lead to suboptimal performance
Slide7
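The rate-based and buffer-based families listed above can be sketched in a few lines each. These are minimal illustrations of the two ideas, not the FESTIVE or BBA algorithms themselves; the harmonic-mean predictor and the reservoir and cushion thresholds are assumptions. The bitrate ladder is the one used in the talk's evaluation.

```python
BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]  # ladder from the talk's evaluation


def rate_based(past_throughputs_kbps):
    """Pick the highest bitrate not above a harmonic-mean throughput prediction."""
    n = len(past_throughputs_kbps)
    predicted = n / sum(1.0 / t for t in past_throughputs_kbps)
    feasible = [b for b in BITRATES_KBPS if b <= predicted]
    return feasible[-1] if feasible else BITRATES_KBPS[0]


def buffer_based(buffer_s, reservoir_s=5.0, cushion_s=10.0):
    """Map buffer occupancy linearly onto the ladder (thresholds are assumed)."""
    if buffer_s <= reservoir_s:
        return BITRATES_KBPS[0]
    if buffer_s >= reservoir_s + cushion_s:
        return BITRATES_KBPS[-1]
    frac = (buffer_s - reservoir_s) / cushion_s
    target = BITRATES_KBPS[0] + frac * (BITRATES_KBPS[-1] - BITRATES_KBPS[0])
    # Round down to the nearest bitrate on the ladder.
    return max(b for b in BITRATES_KBPS if b <= target)
```

Both rules are fixed in advance: neither adapts its thresholds or predictor to the network it actually runs on.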
Example: Model Predictive Control
maximize QoE(t, t + T) subject to system dynamics
[Figure: throughput and video bitrate (Mbps), and buffer size (sec), from t to t + T, with a conservative throughput prediction]
Problem: needs an accurate throughput model
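A minimal sketch of the MPC idea above: enumerate every bitrate plan over a short horizon, score each plan with a QoE model under a single throughput prediction, and commit only to the first step. The chunk duration, the rebuffering weight, and the exact QoE form are assumptions for illustration, not the published MPC controller.

```python
from itertools import product

BITRATES_KBPS = [300, 750, 1200, 1850, 2850, 4300]
CHUNK_S = 4.0          # assumed chunk duration in seconds
REBUF_PENALTY = 4.3    # assumed penalty per second of stall


def mpc_choose(buffer_s, last_kbps, predicted_kbps, horizon=3):
    """Return the first bitrate of the plan with the highest predicted QoE."""
    best_qoe, best_first = float("-inf"), BITRATES_KBPS[0]
    for plan in product(BITRATES_KBPS, repeat=horizon):
        buf, prev, qoe = buffer_s, last_kbps, 0.0
        for kbps in plan:
            # System dynamics: the buffer drains during the download, then refills.
            download_s = kbps * CHUNK_S / predicted_kbps
            stall_s = max(download_s - buf, 0.0)
            buf = max(buf - download_s, 0.0) + CHUNK_S
            # QoE: + bitrate - rebuffering - smoothness (bitrates in Mbps).
            qoe += kbps / 1000 - REBUF_PENALTY * stall_s - abs(kbps - prev) / 1000
            prev = kbps
        if qoe > best_qoe:
            best_qoe, best_first = qoe, plan[0]
    return best_first
```

The decision depends entirely on `predicted_kbps`, which is why an inaccurate throughput prediction leads directly to a poor plan.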
Solution: learn from video streaming sessions in actual network conditions
Slide8
Reinforcement Learning
Goal: maximize the cumulative reward
[Figure: the agent observes the state of the environment, takes an action, and receives a reward]
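The observe, act, reward loop above can be written as a short sketch. `ToyEnv` is a hypothetical stand-in environment, not a video streaming model; it exists only to make the loop and the cumulative reward concrete.

```python
import random


class ToyEnv:
    """Hypothetical environment: reward is 1 when the action matches the state."""

    def __init__(self, steps=5, seed=0):
        self.rng = random.Random(seed)
        self.steps = steps

    def reset(self):
        self.t = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.t += 1
        self.state = self.rng.randint(0, 1)
        return self.state, reward, self.t >= self.steps


def run_episode(env, policy):
    """Observe state, take action, collect reward; return the cumulative reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total
```

For example, `run_episode(ToyEnv(), lambda s: s)` uses a policy that always matches the state and so collects the maximum reward at every step.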
Slide9
Pensieve Design
[Figure: the agent observes a state from the environment and takes action a_t, the bitrate of the next chunk (240P, 360P, 720P, or 1080P; here 720P)]
Reward r_t = + (bitrate) - (rebuffering) - (smoothness)
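One way to read the reward above as code is shown below. The slide gives only the sign of each term, so the weights and the use of an absolute bitrate difference for smoothness are assumptions.

```python
def chunk_reward(bitrate_mbps, rebuffer_s, prev_bitrate_mbps,
                 rebuf_weight=4.3, smooth_weight=1.0):
    """Reward for one chunk: + bitrate - rebuffering - smoothness.

    The weights are assumed for illustration; the slide gives only the signs.
    """
    smoothness = abs(bitrate_mbps - prev_bitrate_mbps)
    return bitrate_mbps - rebuf_weight * rebuffer_s - smooth_weight * smoothness
```

Changing the weights changes which trade-off the agent is rewarded for, for example fewer stalls at the cost of lower bitrate.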
Slide10
How to Train the ABR Agent
[Figure: the ABR agent observes state s and feeds it to a neural network; the network outputs the policy π_θ(s, a) over the next bitrate (240P, 480P, 720P, 1080P), from which the agent takes action a]
Parameter θ: estimate from empirical data
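To make "estimate θ from empirical data" concrete, here is a minimal sketch of a policy-gradient step on a tabular softmax policy. It is a plain REINFORCE-style update swapped in for illustration, not necessarily the training algorithm Pensieve uses, and the table `theta` stands in for the neural network.

```python
import math


def policy(theta, s):
    """pi_theta(s, .): softmax over the logits stored for state s."""
    exps = [math.exp(x) for x in theta[s]]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(theta, trajectory, lr=0.1):
    """One policy-gradient step from a trajectory of (state, action, reward)."""
    grads = [[0.0] * len(row) for row in theta]
    ret = 0.0
    for s, a, r in reversed(trajectory):
        ret += r  # undiscounted return from this step onward
        probs = policy(theta, s)
        for b, p in enumerate(probs):
            # Gradient of log pi_theta(s, a) with respect to logit b.
            grads[s][b] += ret * ((1.0 if b == a else 0.0) - p)
    for s, row in enumerate(grads):
        for b, g in enumerate(row):
            theta[s][b] += lr * g


theta = [[0.0, 0.0], [0.0, 0.0]]  # 2 states x 2 actions, all hypothetical
reinforce_update(theta, [(0, 1, 1.0), (1, 0, 0.0)])
```

After the update, the action that was followed by reward (action 1 in state 0) becomes more likely, while the unrewarded step leaves its state's logits unchanged.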
Training: collect experience data, a trajectory of [state, action, reward]
Slide11
What Pensieve is good at
Learn the dynamics directly from experience
Optimize the high-level QoE objective end-to-end
Extract control rules from raw high-dimensional signals
Slide12
Pensieve Training System
[Figure: Pensieve workers run video playback in a fast chunk-level simulator and send {state, action, reward} experiences to the Pensieve master; the master performs the model update in TensorFlow and returns updated neural network parameters]
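A chunk-level simulator can skip packet-level detail and only compute how long each chunk takes to download over a throughput trace. The sketch below assumes a trace of per-second throughput samples that wraps around when it ends; the function name and the trace format are hypothetical.

```python
def download_time(chunk_bits, trace_bps, start_s, slot_s=1.0):
    """Seconds needed to download chunk_bits starting at time start_s.

    trace_bps[i] is the throughput during [i*slot_s, (i+1)*slot_s); the trace
    wraps around when it runs out and must contain a positive entry.
    """
    t, remaining = start_s, float(chunk_bits)
    if remaining <= 0:
        return 0.0
    while True:
        slot = int(t // slot_s)
        rate = trace_bps[slot % len(trace_bps)]
        slot_end = (slot + 1) * slot_s
        capacity = rate * (slot_end - t)  # bits deliverable in the rest of this slot
        if capacity >= remaining:
            return t + remaining / rate - start_s
        remaining -= capacity
        t = slot_end


# 2 Mbit chunk over a trace that alternates between 1 Mbps and 2 Mbps.
elapsed = download_time(2_000_000, [1_000_000, 2_000_000], start_s=0.0)
```

Because each chunk costs only a few arithmetic operations, many simulated sessions can run in the time one real playback would take.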
Large corpus of network traces: cellular, broadband, synthetic
Slide13
Demo: Pensieve vs. MPC
[Figure: Pensieve buffer (sec), MPC buffer (sec), and throughput (Mbps), with rebuffering and chances of outage marked]
Slide14
Trace-driven Evaluation
Dataset: two datasets, each consisting of 1000 traces of 320 seconds each
Video: 193 seconds, encoded at bitrates {300, 750, 1200, 1850, 2850, 4300} kbps
Video player: Google Chrome browser
Video server: Apache server
[Figure: results on the Norway 3G cellular dataset and the FCC broadband dataset]
Pensieve improves on the best previous scheme by 12-25% and is within 9-14% of the offline optimal
Slide15
QoE Breakdown
Reward/QoE = + bitrate utility - rebuffering penalty - smoothness penalty
[Figure: breakdown of QoE into its three components for each algorithm]
Pensieve reduces rebuffering by 10-32% over the second-best algorithm
Slide16
Does Pensieve Generalize?
Synthetic traces are generated from a hidden Markov model
They cover a wide range of average throughput and network variation
[Figure: a 3G network trace next to a synthetic trace]
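A hidden Markov model for throughput can be sketched as a hidden state that selects a mean throughput, plus noisy samples drawn around that mean. The state set, the transition rule, and the Gaussian noise below are all illustrative assumptions rather than the generator used in the talk.

```python
import random


def synthetic_trace(length, means_mbps, stay_prob=0.9, noise=0.1, seed=0):
    """Sample a throughput trace (Mbps) from a simple hidden Markov model.

    Hidden state = index into means_mbps; emission = Gaussian around that mean.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    state = rng.randrange(len(means_mbps))
    trace = []
    for _ in range(length):
        if rng.random() > stay_prob:
            state = rng.randrange(len(means_mbps))  # jump to a random hidden state
        sample = rng.gauss(means_mbps[state], noise * means_mbps[state])
        trace.append(max(sample, 0.01))  # keep throughput positive
    return trace


trace = synthetic_trace(100, means_mbps=[1.0, 3.0, 6.0])
```

Varying `means_mbps`, `stay_prob`, and `noise` across traces is one way to cover a range of average throughput and network variation.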
Slide17
Does Pensieve Generalize?
Train on synthetic traces, then test on real 3G network traces
Only 5% degradation compared with Pensieve trained on real network traces
Slide18
Other Evaluations
Experiments in the wild (LTE, public WiFi, international link)
Controlled experiment for testing optimality
Multi-video extension
Sensitivity analysis
Slide19
Lessons We Learned
1. Build a fast experimentation/simulation platform
2. Data diversity is more important than “accuracy”
3. Think carefully about controller state space (observation signals)
    Too large a state space ⟶ slow & difficult learning
    Too small a state space ⟶ loss of information
    ⟶ When in doubt, include rather than cut the signal
[Figure: Pensieve agent and coarse-grain chunk simulator]
Slide20
Summary
Pensieve uses reinforcement learning to generate ABR algorithms
Pensieve optimizes for different network conditions through experience
Pensieve outperforms existing approaches across a wide range of network environments and QoE preferences
Policies generated by Pensieve have a strong ability to generalize
http://web.mit.edu/pensieve/