Slide1

Reining in the Outliers in MapReduce Jobs using Mantri

Ganesh Ananthanarayanan†, Srikanth Kandula*, Albert Greenberg*, Ion Stoica†, Yi Lu*, Bikas Saha*, Ed Harris*
† UC Berkeley, * Microsoft

Slide2

MapReduce Jobs

Basis of analytics in modern Internet services
E.g., Dryad, Hadoop
Job → {Phase} → {Task}
Graph flow consists of pipelines as well as strict blocks

Slide3

Example Dryad Job Graph

[Figure: example Dryad job graph over the distributed file system. Phase labels: EXTRACT, AGGREGATE_PARTITION, FULL_AGGREGATE, PROCESS, COMBINE. Legend: Phase, Pipeline, Blocked until input is done. A second graph is labeled Map.1, Reduce.1, Map.2, Reduce.2, Join.]

Slide4

Log Analysis from Production

Logs from production cluster with thousands of machines, sampled over six months
10,000+ jobs, 80PB of data, 4PB network transfers
Task-level details
Production and experimental jobs

Slide5

Outliers hurt!

Tasks that run longer than the rest in the phase

Median phase has 10% outliers, running for >10x longer
Slow down jobs by 35% at median
Operational inefficiency
Unpredictability in completion times affects SLAs
Hurts development productivity
Wastes compute cycles

Slide6

Why do outliers occur?

Mantri: A system that mitigates outliers based on root-cause analysis

[Figure: a task's stages, Read Input and Execute, annotated with causes of outliers: Input Unavailable, Network Congestion, Local Contention, Workload Imbalance.]

Slide7

Mantri’s Outlier Mitigation

Avoid Recomputation
Network-aware Task Placement
Duplicate Outliers
Cognizant of Workload Imbalance

Slide8

Recomputes: Illustration

[Figure: two panels, (a) Barrier phases and (b) Cascading Recomputes, each comparing Ideal and Actual timelines and marking the resulting Inflation. Legend: Recompute task, Normal task.]

Slide9

What causes recomputes? [1]

Faulty machines

Bad disks, non-persistent hardware quirks (4%)

Set of faulty machines varies with time, not constant

Slide10

What causes recomputes? [2]

Transient machine load

Recomputes correlate with machine load
Requests for data access dropped

Slide11

Replicate costly outputs

[Figure: a chain of tasks, Task 1, Task 2 and Task 3, annotated with the recompute probabilities MR2 and MR3.]

MR: Recompute probability of a machine

$T_{Recomp} = MR_3 (1 - MR_2)\, T_3 + MR_3\, MR_2\, (T_3 + T_2)$

Recompute only Task 3, or both Task 3 and Task 2

Replicate ($T_{Rep}$): REPLICATE if $T_{Rep} < T_{Recomp}$
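
The replication rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and parameter names (`expected_recompute_time`, `should_replicate`, `mr2`, `mr3`, `t2`, `t3`, `t_rep`) are invented here, and the numbers in the usage note are made up.

```python
def expected_recompute_time(mr2: float, mr3: float, t2: float, t3: float) -> float:
    """Expected time to recover Task 3's output if it is lost.

    With probability mr3 * (1 - mr2) only Task 3 is rerun (cost t3).
    With probability mr3 * mr2 the output of Task 2 is lost as well,
    so both tasks are rerun (cost t3 + t2).
    """
    return mr3 * (1 - mr2) * t3 + mr3 * mr2 * (t3 + t2)


def should_replicate(t_rep: float, mr2: float, mr3: float, t2: float, t3: float) -> bool:
    """Replicate the output only when copying is cheaper than the expected recompute."""
    return t_rep < expected_recompute_time(mr2, mr3, t2, t3)
```

For instance, with hypothetical values `mr2=0.1`, `mr3=0.2`, `t2=100` and `t3=50`, the expected recompute time is about 12 time units, so a replication that costs 10 units would be worthwhile and one that costs 15 would not.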

Slide12

Transient Failure Causes

Recomputes manifest in clutches
Machine prone to cause recomputes till the problem is fixed
Load abates, critical process restart etc.

Clue: At least r recomputes within t time window on a machine
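
One way to act on the "at least r recomputes within a time window t" clue is a per-machine sliding window. The sketch below is an assumption-laden illustration, not Mantri's implementation: the class name `RecomputeMonitor` and its interface are hypothetical, and it assumes timestamps for a given machine arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class RecomputeMonitor:
    """Flags machines that caused at least r recomputes within the last t time units."""

    def __init__(self, r: int, t: float) -> None:
        self.r = r
        self.t = t
        self._events = defaultdict(deque)  # machine -> recent recompute timestamps

    def record(self, machine: str, timestamp: float) -> bool:
        """Record one recompute and return True if the machine now looks suspect."""
        window = self._events[machine]
        window.append(timestamp)
        # Drop recomputes that are older than the trailing window of length t.
        while window and timestamp - window[0] > self.t:
            window.popleft()
        return len(window) >= self.r
```

A scheduler could call `record` whenever a recompute is traced back to a machine and, once it returns True, treat outputs stored on that machine as candidates for speculative recomputation.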

Slide13

Speculative Recomputes

Anticipatorily recompute tasks whose outputs are unread

[Figure: speculative recomputes. Labels: Task, Input Data, Unread Data, Speculative Recompute, Speculative Recompute (Read Fail).]

Slide14

Mantri’s Outlier Mitigation

Avoid Recomputation: Preferential Replication + Speculative Recomp.
Network-aware Task Placement
Duplicate Outliers
Cognizant of Workload Imbalance

Slide15

Reduce Tasks

Tasks access output of tasks from previous phases
Reduce phase (74% of total traffic)

[Figure: Map and Reduce tasks over the distributed file system, with Local and Network transfers labeled and one task marked "Outlier!".]

Slide16

Variable Congestion

[Figure: reduce tasks and map outputs spread across racks. Labels: Reduce task, Map output, Rack.]

Smart placement smooths out hotspots

Slide17

Traffic-based Allotment

For every rack:
d : data
u : available uplink bandwidth
v : available downlink bandwidth

Goal: Minimize phase completion time

Solve for task allocation fractions, $a_i$

Slide18

Local Control is a good approx.

Let rack i have $a_i$ fraction of tasks
Time uploading, $T_u = d_i (1 - a_i) / u_i$
Time downloading, $T_d = (D - d_i)\, a_i / v_i$
$\mathrm{Time}_i = \max\{T_u, T_d\}$

Goal: Minimize phase completion time

For every rack:
d : data, D : data over all racks
u : available uplink bandwidth
v : available downlink bandwidth
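
As a rough illustration of the per-rack model, the sketch below finds the fraction at which a rack's upload and download times are equal. Since the upload time falls and the download time rises as a rack takes more tasks, that balance point minimizes the larger of the two for the rack in isolation. The function names are hypothetical, and the final normalisation so that the fractions sum to 1 is an assumption of this sketch, not something the slides specify.

```python
def balanced_fraction(d_i: float, total: float, u_i: float, v_i: float) -> float:
    """Fraction a_i at which d_i * (1 - a_i) / u_i equals (total - d_i) * a_i / v_i."""
    up = d_i * v_i
    down = (total - d_i) * u_i
    if up + down == 0:
        return 0.0
    return up / (up + down)


def rack_time(d_i: float, total: float, a_i: float, u_i: float, v_i: float) -> float:
    """Time_i = max(T_u, T_d) for one rack, following the slide's definitions."""
    t_up = d_i * (1 - a_i) / u_i
    t_down = (total - d_i) * a_i / v_i
    return max(t_up, t_down)


def allocate(data: list[float], uplink: list[float], downlink: list[float]) -> list[float]:
    """Per-rack balance points, rescaled to sum to 1 (the rescaling is an assumption)."""
    total = sum(data)
    raw = [balanced_fraction(d, total, u, v) for d, u, v in zip(data, uplink, downlink)]
    norm = sum(raw)
    if norm == 0:
        return [1.0 / len(raw)] * len(raw)
    return [x / norm for x in raw]
```

Racks that hold more data or have more downlink bandwidth end up with a larger share of the tasks, which matches the intuition of keeping traffic on each link proportional to its available bandwidth.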

Link utilizations average out in the long term and are steady in the short term

Slide19

Mantri’s Outlier Mitigation

Avoid Recomputation: Preferential Replication + Speculative Recomp.
Network-aware Task Placement: Traffic on link proportional to bandwidth
Duplicate Outliers
Cognizant of Workload Imbalance

Slide20

Contentions cause outliers

Tasks contend for local resources
Processor, memory etc.
Duplicate tasks elsewhere in the cluster
Current schemes duplicate towards end of the phase (e.g., LATE [OSDI 2008])
Duplicate outlier or schedule pending task?

Slide21

Resource-Aware Restart

[Figure: timeline of a running task and a potential restart. Labels: Running task, Potential restart ($t_{new}$), now, time, $t_{rem}$.]

Save time and resources: $P(c\, t_{new} < (c + 1)\, t_{rem})$

Continuously observe and kill wasteful copies

Slide22

Mantri’s Outlier Mitigation

Avoid Recomputation: Preferential Replication + Speculative Recomp.
Network-aware Task Placement: Traffic on link proportional to bandwidth
Duplicate Outliers: Resource-Aware Restart
Cognizant of Workload Imbalance

Slide23

Workload Imbalance

A quarter of the outlier tasks have more data to process
Unequal key partitions for reduce tasks
Ignoring these is better than duplication
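
When a task is slow only because it has more data, one remedy is to start the largest tasks first so they do not end up as stragglers at the tail of the phase. Below is a minimal sketch of that longest-first greedy assignment, assuming each task's runtime is proportional to its input size; the function name `schedule_largest_first` and its interface are hypothetical and `slots` must be at least 1.

```python
import heapq


def schedule_largest_first(sizes: list[float], slots: int) -> tuple[list[list[int]], float]:
    """Greedily assign tasks, largest input first, to the least-loaded slot.

    Returns the task indices given to each slot and the resulting makespan,
    measured in the same units as the input sizes.
    """
    heap = [(0.0, s) for s in range(slots)]  # (current load, slot id)
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(slots)]
    # Visit tasks in descending order of the data they have to process.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    for i in order:
        load, s = heapq.heappop(heap)
        assignment[s].append(i)
        heapq.heappush(heap, (load + sizes[i], s))
    makespan = max(load for load, _ in heap)
    return assignment, makespan
```

With sizes `[3, 3, 2, 2, 2]` on two slots, this greedy order finishes at time 7, while the best possible split (3+3 and 2+2+2) finishes at 6, which illustrates that the heuristic is near-optimal but not always optimal.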

Schedule tasks in descending order of data to process
Time ∝ (Data to Process)
[Graham ‘69] At worst, within 33% of optimal

Slide24

Mantri’s Outlier Mitigation

Avoid Recomputation: Preferential Replication + Speculative Recomp.
Network-aware Task Placement: Traffic on link proportional to bandwidth
Duplicate Outliers: Resource-Aware Restart
Cognizant of Workload Imbalance: Schedule in descending order of size

Proactive
Reactive

Predict to act early
Be resource-aware
Act based on the cause

Slide25

Results

Deployed in production Bing clusters
Trace-driven simulations
Mimic workflow, failures, data skew
Compare with existing and idealized schemes

Slide26

Jobs in the Wild

Act Early: Duplicates issued when task is 42% done (77% for Dryad)
Light: Issues fewer copies (0.47x as many as Dryad)
Accurate: 2.8x higher success rate of copies

Jobs faster by 32% at median, consuming fewer resources

Slide27

Recomputation Avoidance

Eliminates most recomputes with minimal extra resources

(Replication + Speculation) work well in tandem

Slide28

Network-Aware Placement

Mantri well-approximates the ideal
Bandwidth approximations

Slide29

Summary

From measurements in a production cluster:
Outliers are a significant problem
They are due to an interplay between storage, network and MapReduce

Mantri, a cause-aware and resource-aware mitigation
Deployment shows encouraging results

“Reining in the Outliers in MapReduce Clusters using Mantri”, USENIX OSDI 2010