
GRASS - PowerPoint Presentation

Uploaded On 2017-04-29






Presentation Transcript

Slide1

GRASS: Trimming Stragglers in Approximation Analytics

Ganesh Ananthanarayanan, Michael Hung, Xiaoqi Ren, Ion Stoica, Adam Wierman, Minlan Yu

Slide2

Next Generation of Analytics

Timely results, even if approximate

Data deluge makes this necessary

Slide3

Approximation Dimensions

Error: Minimize time to get desired accuracy (“#cars sold to the nearest thousand”)
Deadline: Maximize accuracy within deadline (“Pick the best ad to display within 2s”)

Optimal Scheduler*
Improve accuracy by 48%
Speedup by 40%

*w.r.t. state-of-the-art schedulers (production workloads from Facebook and Bing)

Slide4

Scheduling Challenge

Prioritize tasks
Subset of tasks to complete
#tasks » #slots (multi-waved jobs)
(NP-Hard but many known heuristics…)

Straggler tasks
Slowest task can be 8x slower than median task
Speculation: Spawn a duplicate, earliest wins
Google [OSDI’04], FB [OSDI’08], Microsoft [OSDI’10]

Slide5

Challenge: dynamically prioritize between speculative & unscheduled tasks to meet deadline/error bound

Slide6

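Speculative copies consume extra resources

[Figure: timeline of tasks T1, T2, T3 on Slot 1, Slot 2, Slot 3 (time axis ticks at 0, 5, 9, 10), with a speculative copy of T1 and its opportunity cost]

Is speculation worth the payoff?

Slide7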

Roadmap

Two natural scheduling designs
GRASS: Combining the two designs
Evaluation of GRASS

Slide8

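Greedy Scheduling (GS)

Greedily improve accuracy, i.e., earliest finishing task

[Figure: GS schedule on Slot 1 and Slot 2 over time (ticks at 0, 1, 6): T1 alongside T2, T3, T4, T5, T6, T7; T1 is marked as the straggler]

Task ID          T1   T2   T3   T4   T5   T6   T7   T8   T9
Time remaining   5    ---  ---  ---  ---  ---  ---  ---  ---
New copy         2    ---  1    1    1    1    1    1    3
(at time = 1)

Deadline = 6
Accuracy = 7/9

Slide9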

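Resource Aware Scheduling (RAS)

Speculate only if it saves time and resources

[Figure: RAS schedule on Slot 1 and Slot 2 over time (ticks at 0, 1, 3, 6): T1, T2 and a new copy of T1, followed by T3, T4, T5, T6, T7, T8; T1 is marked as the straggler]

Task ID          T1   T2   T3   T4   T5   T6   T7   T8   T9
Time remaining   5    ---  ---  ---  ---  ---  ---  ---  ---
New copy         2    ---  1    1    1    1    1    1    3
(at time = 1)

One copy for 5s (vs.) Two copies for 2s

Deadline = 6
Accuracy = 8/9

Slide10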

GS vs. RAS

[Figure: the GS and RAS schedules on Slot 1 and Slot 2, side by side, each with time axis ticks at 0, 1, 3, 6]

        Deadline = 3      Deadline = 6
GS      Accuracy = 3/9    Accuracy = 7/9
RAS     Accuracy = 2/9    Accuracy = 8/9

Neither GS nor RAS is uniformly better

Slide11
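The GS and RAS accuracies quoted for deadlines of 3 and 6 can be re-derived with a small simulation. This is only a sketch under stated assumptions, not the authors' scheduler: it starts from the task table at time = 1 (T2 finished, T1 with 5s remaining), assumes the duration estimates are exact, assumes all copies of a task are killed once one copy finishes, and reads the RAS rule as "launch a duplicate only if (number of copies) x (new copy time) is less than the remaining time". The names `NEW_COPY`, `pick` and `simulate` are hypothetical.

```python
# Toy re-creation of the two-slot example (state at time = 1).
# Durations come from the task table; the scheduling rules are assumptions.
NEW_COPY = {"T1": 2, "T3": 1, "T4": 1, "T5": 1,
            "T6": 1, "T7": 1, "T8": 1, "T9": 3}


def pick(policy, now, running, pending):
    """Return (task, finish_time) for the next copy to launch, or None."""
    fresh = [(now + NEW_COPY[t], t) for t in pending]
    if policy == "GS":
        # Greedy: launch whatever finishes earliest; a duplicate only
        # counts if it would beat the copies that are already running.
        dupes = [(now + NEW_COPY[t], t) for t, ends in running.items()
                 if now + NEW_COPY[t] < min(ends)]
        best = min(fresh + dupes, default=None)
    else:
        # RAS (assumed rule): duplicate only if total slot-time shrinks,
        # e.g. two copies for 2s (4 slot-seconds) vs. one copy for 5s.
        dupes = [(now + NEW_COPY[t], t) for t, ends in running.items()
                 if (len(ends) + 1) * NEW_COPY[t] < min(ends) - now]
        best = min(dupes, default=None) or min(fresh, default=None)
    return None if best is None else (best[1], best[0])


def simulate(policy, deadline):
    """Count tasks (out of 9) completed by the deadline."""
    now, free = 1, 1                  # at time 1, one of two slots is free
    done = {"T2"}                     # T2 finished at time 1
    pending = ["T3", "T4", "T5", "T6", "T7", "T8", "T9"]
    running = {"T1": [6]}             # straggler: 5s remaining at time 1
    while now < deadline:
        while free > 0:               # fill every free slot
            choice = pick(policy, now, running, pending)
            if choice is None:
                break
            task, finish = choice
            if task in pending:
                pending.remove(task)
            running.setdefault(task, []).append(finish)
            free -= 1
        if not running:
            break
        now = min(min(ends) for ends in running.values())
        if now > deadline:
            break
        for task in [t for t, ends in running.items() if min(ends) == now]:
            free += len(running.pop(task))   # kill all copies, free slots
            done.add(task)
    return len(done)


for policy in ("GS", "RAS"):
    print(policy, [f"{simulate(policy, d)}/9" for d in (3, 6)])
# GS ['3/9', '7/9']
# RAS ['2/9', '8/9']
```

Under these assumptions the toy model gives GS the edge at the short deadline and RAS the edge at the long one, which is the sense in which neither is uniformly better.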

Intuition: Use RAS early in the job (be “conservative”), switch to GS towards the end (be “aggressive”)

Slide12

Theoretical Scheduling Model

Multi-waved scheduling of tasks
Constant wave-width
Agnostic to fairness policies
Heavy-tailed (Pareto) distribution of task durations
Speculation: GS, RAS, Switching, Optimal

Theorem: Using RAS when >2 waves of tasks remain, and GS when ≤2 waves of tasks remain, is “near-optimal”

Slide13
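The switching rule in the theorem (RAS while more than two waves remain, GS otherwise) can be written as a short sketch. It assumes a constant wave-width, as in the theoretical model, and assumes that the number of remaining waves is the number of unfinished tasks divided by the wave-width, rounded up. The function names are hypothetical.

```python
import math


def waves_remaining(unfinished_tasks: int, wave_width: int) -> int:
    """Remaining waves, assuming every wave runs `wave_width` tasks."""
    return math.ceil(unfinished_tasks / wave_width)


def choose_policy(unfinished_tasks: int, wave_width: int) -> str:
    """RAS when more than 2 waves remain, GS when 2 or fewer remain."""
    return "RAS" if waves_remaining(unfinished_tasks, wave_width) > 2 else "GS"


print(choose_policy(100, 10))  # RAS (10 waves remain)
print(choose_policy(20, 10))   # GS (2 waves remain)
```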

How to estimate two remaining waves?

Wave boundaries are not strict
Non-uniform task durations
Wave-width is not constant

Start with RAS and switch to GS close to the deadline/error-bound

Slide14

Learning the switching point

GS-only and RAS-only job samples
“Exploration vs. Exploitation”
Multi-armed bandit solution, ɛ = 0.1

[Figure: job timelines up to the deadline with candidate switch points from RAS to GS (time axis ticks at 4, 5, 6): RAS[4s]+GS[2s], RAS[5s]+GS[1s], RAS[6s]]

Slide15
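The exploration vs. exploitation idea behind learning the switching point can be illustrated with a generic ɛ-greedy bandit. This is a textbook sketch, not the GRASS implementation: the transcript does not say how the GS-only and RAS-only job samples feed into the bandit, so here each candidate switch point is simply treated as an arm and the observed job accuracy as its reward. The class name, the candidate points and the default exploration rate are assumptions.

```python
import random


class EpsilonGreedySwitch:
    """Generic epsilon-greedy choice among candidate RAS-to-GS switch points."""

    def __init__(self, switch_points, epsilon=0.1, seed=None):
        self.switch_points = list(switch_points)
        self.epsilon = epsilon            # exploration rate (assumed default)
        self.rng = random.Random(seed)
        self.counts = {p: 0 for p in self.switch_points}
        self.mean_accuracy = {p: 0.0 for p in self.switch_points}

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.switch_points)
        return max(self.switch_points, key=lambda p: self.mean_accuracy[p])

    def update(self, point, accuracy):
        # Incremental running mean of the accuracy observed for this arm.
        self.counts[point] += 1
        n = self.counts[point]
        self.mean_accuracy[point] += (accuracy - self.mean_accuracy[point]) / n


# Hypothetical usage: switch points at 4s, 5s or 6s into a 6s deadline.
bandit = EpsilonGreedySwitch([4, 5, 6], epsilon=0.1, seed=0)
point = bandit.choose()
bandit.update(point, accuracy=7 / 9)
```

A small exploration rate keeps most jobs on the best-known switch point while still collecting samples for the alternatives.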

GRASS (= GS + RAS) Scheduler

Opportunity Cost in speculation for stragglers
GS: Greedy Scheduling
RAS: Resource Aware Scheduling
Switch from RAS to GS close to deadline/error-bound
Learn switching point empirically from job samples
Provably near-optimal in theoretical model

Slide16

Implementation

Hadoop 0.20.2 and Spark 0.7.3
Modified Fair Scheduler
Job bins with GS-only and RAS-only samples

Task Estimators
Remaining time is extrapolated from data-to-process
Progress reports at 5% intervals
New copy’s time is sampled from completed tasks

Slide17
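The two task estimators named on the Implementation slide can be sketched as follows. The linear extrapolation formula is an assumption (the transcript only says remaining time is extrapolated from the data left to process), and both function names are hypothetical.

```python
import math
import random


def estimate_remaining(elapsed_s: float, fraction_done: float) -> float:
    """Linearly extrapolate remaining time from the fraction of data processed.

    Assumes processing speed stays constant, so the unprocessed share of the
    data takes proportionally as long as the processed share did.
    """
    if fraction_done <= 0:
        return math.inf               # no progress report yet
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def estimate_new_copy(completed_durations, rng=random):
    """Sample a new copy's duration from durations of completed tasks."""
    return rng.choice(list(completed_durations))


# A task that processed 25% of its data in 10s has about 30s left.
print(estimate_remaining(10.0, 0.25))  # 30.0
```

With progress reported at 5% intervals, `fraction_done` would only take values such as 0.05, 0.10 and so on, so the estimate is refreshed at most twenty times per task.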

How well does GRASS perform?

Workload from Facebook and Bing traces
Hadoop and Dryad production jobs
Added deadlines and error bounds

Baselines: LATE & Mantri
200 node EC2 deployment (m2.2xlarge instances)

Slide18

Accuracy of deadline-bound jobs improves by 47%

Gains hold across deadlines (lenient and stringent)

Slide19

GRASS is 22% better than statically picking GS or RAS… and is near-optimal

Slide20

Error-bound Jobs

Overall speedup of 38% (optimal is 40%)
Gains hold across all error bounds
Exact jobs (0% error-bound) speed up by 34%

Unified Straggler Mitigation

Slide21

Conclusion

Next gen. of analytics: Approximate but timely results
Challenge: Dynamic and unpredictable stragglers
GRASS: Conservative speculation early in the job; aggressive towards its end
Evaluation with Hadoop & Spark
Accuracy of deadline-bound jobs improves by 47%
Error-bound jobs speed up by 38%