Performance Evaluation as a Capability in Production Internet Live Streaming Networks
Chen Tian, Richard Alimi, Yang Richard Yang, David Zhang
Aug. 16, 2012
Slide1
ShadowStream: Performance Evaluation as a Capability in Production Internet Live Streaming Networks
Chen Tian, Richard Alimi, Yang Richard Yang, David Zhang
Aug. 16, 2012
Slide2
Live Streaming is a Major Internet App
Slide3
Poor Performance After Updates
Lacking sufficient evaluation before release
Slide4
Don’t We Already Have …
They are not enough!
Slide5
Live Streaming Background
We focus on hybrid live streaming systems: CDN + P2P
Slide6
Live Streaming Background
We focus on hybrid live streaming systems: CDN + P2P
Slide7
Testbed: Misleading Results at Small Scale
Piece Missing Ratio
                        Small-Scale   Large-Scale
Production Default      3.5%          0.7%
With Connection Limit   3.7%          64.8%
Live streaming performance can be highly non-linear.
Slide8
Testbed: Misleading Results due to Missing Features
                       Piece Missing Ratio   # Timed-out Requests   # Received Duplicate Packets   # Received Outdated Packets
LAN Style (Same BW)    1.5%                  1404.25                0                              5.65
ADSL Style (Same BW)   7.3%                  2548.25                633                            154.20
Realistic features can have large performance impacts.
Slide9
Testing Channel: Lacking QoE Protection
Slide10
Testing Channel: Lacking Orchestration
What we want is …
What we have is …
Slide11
ShadowStream Design Goal
Slide12
Roadmap
Slide13
Protection: Basic Scheme
Note: R denotes Repair, E denotes Experiment
Slide14
Example Illustration: E Success
Slide15
Example Illustration: E Success
Slide16
Example Illustration: E Success
Slide17
Example Illustration: E Fail
Slide18
Example Illustration: E Fail
Slide19
Example Illustration: E Fail
Slide20
Example Illustration: E Fail
Slide21
How to Repair?
Choice 1: dedicated CDN resources (R=rCDN)
Benefit: simple
Limitations:
Requires resource reservation, e.g., 100,000 clients x 1 Mbps = 100 Gbps
May not work well when there is a network bottleneck
Slide22
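The reservation figure quoted for Choice 1 (R=rCDN) is the number of protected clients multiplied by the per-client streaming rate. A minimal sketch of that arithmetic follows; the function name and the decimal conversion (1 Gbps = 1,000 Mbps) are assumptions made for illustration, not part of the slides.

```python
def reserved_cdn_gbps(clients: int, rate_mbps: float) -> float:
    """Worst-case capacity if every client had to be repaired from the CDN at once."""
    total_mbps = clients * rate_mbps
    return total_mbps / 1000.0  # assumes 1 Gbps = 1,000 Mbps


# The slide's example: 100,000 clients at 1 Mbps each.
print(reserved_cdn_gbps(100_000, 1.0))  # 100.0
```

The reservation grows linearly with the number of clients under test, which is why the slide lists it as a limitation.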
How to Repair?
Choice 2: production machine (R=production)
Benefit 1: Larger resource pool
Benefit 2: Fine-tuned algorithms
Benefit 3: A unified approach to protection & orchestration (later)
Slide23
R = Production: Resource Competition
Slide24
R = Production: Misleading Result
[Figure labels: missing ratio, x + y = θ, 0, accurate result, repair demand, misleading result]
Slide25
Putting Together: PCE
Slide26
Putting Together: PCE
Slide27
Implementing PCE
Slide28
Implementing PCE: base observation
A simple partitioned sliding window to partition downloading tasks among PCE automatically
[Figure labels: unavailable, unavailable, piece missing, responsibility transferred]
Slide29
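The partitioned sliding window from the "Implementing PCE: base observation" slide can be sketched as follows. This is an illustration only: the slides do not give the window layout, so the ordering used here (CDN owns the pieces closest to the playpoint, Production the next span, Experiment the rest) and the names `Owner`, `owner_of`, `pending_tasks`, `cdn_span`, and `production_span` are all assumptions.

```python
from enum import Enum
from typing import Dict, List, Set


class Owner(Enum):
    EXPERIMENT = "E"
    PRODUCTION = "P"
    CDN = "C"


def owner_of(piece: int, playpoint: int, cdn_span: int, production_span: int) -> Owner:
    """Map a piece to the system responsible for it, by distance to the playpoint.

    Assumed layout (hypothetical): the cdn_span pieces nearest the playpoint belong
    to CDN, the next production_span pieces to Production, the rest to Experiment.
    """
    distance = piece - playpoint
    if distance < 0:
        raise ValueError("piece is already behind the playpoint")
    if distance < cdn_span:
        return Owner.CDN
    if distance < cdn_span + production_span:
        return Owner.PRODUCTION
    return Owner.EXPERIMENT


def pending_tasks(have: Set[int], playpoint: int, newest: int,
                  cdn_span: int, production_span: int) -> Dict[Owner, List[int]]:
    """Group the still-missing pieces in [playpoint, newest] by responsible system."""
    tasks: Dict[Owner, List[int]] = {owner: [] for owner in Owner}
    for piece in range(playpoint, newest + 1):
        if piece not in have:
            tasks[owner_of(piece, playpoint, cdn_span, production_span)].append(piece)
    return tasks


# Piece 108 stays missing while the playpoint advances: the window slides and
# responsibility for the piece moves between systems without explicit signalling.
for playpoint in (90, 100, 105):
    print(playpoint, owner_of(108, playpoint, cdn_span=5, production_span=10).name)
```

Because ownership is a pure function of the piece's distance to the playpoint, each client can partition its own download tasks locally, which is one way to read "automatically" on the slide.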
Client Components
Slide30
Roadmap
Slide31
Orchestration Challenges
How to start an Experiment streaming machine: transparent to real viewers
How to control the arrival/departure of each Experiment machine in a scalable way
Slide32
Transparent Orchestration Idea
Slide33
Transparent Orchestration Idea
Slide34
Transparent Orchestration Idea
Slide35
Distributed Activation of Testing
Orchestrator distributes parameters to clients
Each client independently generates its arrival time according to the same distribution function F(t)
Together they achieve global arrival pattern (Cox and Lewis Theorem)
Slide36
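The "Distributed Activation of Testing" idea can be illustrated with inverse-transform sampling. The sketch below assumes that F(t) is the normalized cumulative arrival rate of the target pattern and that the orchestrator ships it as a piecewise-constant rate table; the table format and the names `build_cdf` and `draw_arrival_time` are hypothetical, not taken from the slides.

```python
import bisect
import random
from itertools import accumulate
from typing import List, Sequence


def build_cdf(rates: Sequence[float], step: float) -> List[float]:
    """F(t) sampled at the end of each step, from per-step arrival rates."""
    cumulative = list(accumulate(r * step for r in rates))
    total = cumulative[-1]
    return [c / total for c in cumulative]


def draw_arrival_time(cdf: Sequence[float], step: float, rng: random.Random) -> float:
    """Draw one arrival time t = F^-1(u) with u uniform in [0, 1)."""
    u = rng.random()
    i = bisect.bisect_right(cdf, u)      # first step whose cumulative share exceeds u
    lo = cdf[i - 1] if i > 0 else 0.0
    frac = (u - lo) / (cdf[i] - lo)      # linear interpolation inside the step
    return (i + frac) * step


# One shared generator stands in for many clients here; in a deployment each
# client would draw its own arrival time from its own random source.
rng = random.Random(42)
cdf = build_cdf([1.0, 3.0], step=10.0)   # target rate: 1/s for 10 s, then 3/s for 10 s
times = [draw_arrival_time(cdf, 10.0, rng) for _ in range(10_000)]
early = sum(t < 10.0 for t in times) / len(times)
print(f"share of arrivals in the first 10 s: {early:.2f}")
```

With these rates a quarter of the arrivals should fall in the first 10 seconds, so the printed share should be close to 0.25. No client needs to know about any other client, which is what makes the activation scalable.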
Orchestrator Components
Slide37
Roadmap
Slide38
Software Implementation
Compositional Runtime: modular design, including scheduler, dynamic loading of blocks, etc. (3400 lines of code)
Pre-packaged blocks: HTTP integration, UDP sockets and debugging (500 lines of code)
Live streaming machine (4200 lines of code)
Slide39
Experimental Opportunities
Slide40
Protection and Accuracy
Piece Missing Ratio
                        Virtual Playpoint   Real Playpoint
Buggy                   8.73%               N/A
R=rCDN                  8.72%               0%
R=rCDN w/ bottleneck    8.81%               5.42%
Slide41
Protection and Accuracy
Piece Missing Ratio
                            Virtual Playpoint   Real Playpoint
PCE bottleneck              9.13%               0.15%
PCE w/ higher bottleneck    8.85%               0%
Slide42
Orchestration: Distributed Activation
Slide43
Utility on Top: Deterministic Replay
                             Log Size
100 clients; 650 seconds     223 KB
300 clients; 1,800 seconds   714 KB
Slide44
Roadmap
Slide45
Contributions
Design and implementation of a novel live streaming network that introduces performance evaluation as an intrinsic capability in production networks
Scalable (PCE) protection of QoE despite large-scale Experiment failures
Transparent orchestration for flexible testing
Slide46
Future Work
Large-scale deployment and evaluation
Apply the Shadow (Experiment->Validation->Repair) scheme to other applications
Extend the Shadow (Experiment->Validation->Repair) scheme
E.g., Repair does not have to do the same job as Experiment, as long as it masks visible failures
Slide47
Adaptive Rate Streaming Repair
           Accuracy   Protected QoE   Protection Overhead
Follow     1.26x      1.59x           1.49 Kbps
Base       1.26x      1.42x           3.69 Kbps
Adaptive   1.26x      1.58x           1.39 Kbps
Slide48
Thank you!
Slide49
Questions?
Slide50
Backup
Slide51
Poor Performance After Updates
Lacking sufficient evaluation before release
Slide52
Related Work
Debugging and evaluation of distributed systems, e.g., ODR, Friday, DieCast
Based on a key observation
Allows scenario customization
FlowVisor
Allocate a fixed portion of tasks and resources
Slide53
Why Not Testing Channel: Orchestration
What we want is …
What we have is …
Slide54
Experiment Specification & Triggering
A test should define:
One or more classes of clients
Client-wide arrival rate functions
Client-wide life duration function
Triggering Condition: prediction based
Slide55
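The items on the "Experiment Specification & Triggering" slide can be pictured as a small data structure. The sketch below is hypothetical throughout: the field names, the 1-second integration step, and the trigger rule (start only when the predicted audience covers the expected number of arrivals) are assumptions chosen for illustration, since the slide only says the condition is prediction based.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClientClass:
    """One class of test clients (all field names are hypothetical)."""
    name: str
    arrival_rate: Callable[[float], float]  # arrivals per second at experiment time t
    life_duration: Callable[[], float]      # returns one client lifetime in seconds


@dataclass
class ExperimentSpec:
    classes: List[ClientClass]
    duration_s: int

    def expected_arrivals(self) -> float:
        """Sum every class's arrival rate over the experiment in 1-second steps."""
        return sum(c.arrival_rate(float(t))
                   for c in self.classes
                   for t in range(self.duration_s))


def should_trigger(spec: ExperimentSpec, predicted_clients: float) -> bool:
    """Assumed rule: start only if the predicted audience can supply every arrival."""
    return predicted_clients >= spec.expected_arrivals()


spec = ExperimentSpec(
    classes=[ClientClass("adsl", arrival_rate=lambda t: 2.0, life_duration=lambda: 300.0)],
    duration_s=60,
)
print(spec.expected_arrivals(), should_trigger(spec, predicted_clients=150))
```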
Experiment Transition
Connectivity Transition
Playbuffer State Transition
More details in the paper: Replace Early Departed Clients, Independent Departure Control
Slide56
ShadowStream Design Goal
By adding protection and orchestration into production networks, we have … Live Testing!
[Figure labels: Production networks, Testbeds]
Slide57
State of the Art: Hybrid Systems
Slide58
Putting Together: ShadowStream
The first system, in the context of live streaming, that can perform live testing with both protection and orchestration
Design the Repair system that can simultaneously provide protection and experiment accuracy
Fully implemented and evaluated
Slide59
Problem: Resource Competition
Repair and Experiment compete for a key resource (client upload bandwidth)
Competition may lead to systematic underestimation of Experiment performance
How to get around it?
Slide60
Experiment Orchestration
Experiment Specification & Triggering
Independent Arrivals Control
Experiment Transition
Replace Early Departed Clients
Independent Departure Control
Slide61
Example Illustration
Slide62
From Idea to System
Slide63
Extended Works
Dynamic Streaming
Deterministic Replay
Slide64
Example Illustration XX