Slide1
Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs
Herodotos Herodotou, Shivnath Babu
Duke University
Slide2
Analysis in the Big Data Era
8/31/2011, Duke University
Popular option: Hadoop software stack
[Figure: Hadoop software stack. Hadoop (MapReduce Execution Engine, Distributed File System), shown with Java / C++ / R / Python, Oozie, Hive, Pig, Elastic MapReduce, Jaql, and HBase.]
Slide3
Analysis in the Big Data Era
Popular option: Hadoop software stack
Who are the users?
  Data analysts, statisticians, computational scientists…
  Researchers, developers, testers…
  You!
Who performs setup and tuning?
  The users!
  Usually lack expertise to tune the system
Slide4
Problem Overview
Goal
  Enable Hadoop users and applications to get good performance automatically
  Part of the Starfish system
  This talk: tuning individual MapReduce jobs
Challenges
  Heavy use of programming languages for MapReduce programs and UDFs (e.g., Java/Python)
  Data loaded/accessed as opaque files
  Large space of tuning choices
Slide5
MapReduce Job Execution
[Figure: a job with four input splits (split 0 to split 3), each processed by a map task, and two reduce tasks that write out 0 and out 1. The map tasks run in two map waves; the reduce tasks run in one reduce wave.]
job j = <program p, data d, resources r, configuration c>
Slide6
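The "two map waves, one reduce wave" picture in the MapReduce Job Execution figure follows from simple arithmetic: tasks run in waves whenever there are more tasks than slots. The following is a minimal sketch, assuming that the pictured cluster has two map slots and two reduce slots in total (the figure does not state this); the name `num_waves` is illustrative and is not part of Hadoop or Starfish.

```python
import math


def num_waves(num_tasks: int, num_slots: int) -> int:
    """Number of scheduling waves needed to run num_tasks on num_slots slots."""
    if num_tasks < 0 or num_slots <= 0:
        raise ValueError("num_tasks must be >= 0 and num_slots must be > 0")
    return math.ceil(num_tasks / num_slots)


# Four input splits give four map tasks; two reduce tasks write out 0 and out 1.
# The slot counts below are assumptions chosen to match the figure.
map_waves = num_waves(num_tasks=4, num_slots=2)
reduce_waves = num_waves(num_tasks=2, num_slots=2)
print(map_waves, reduce_waves)  # 2 1
```

Changing the number of map or reduce tasks (part of configuration c) or the number of slots (part of resources r) changes the wave counts, which is one reason the same program can behave very differently across settings.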
Optimizing MapReduce Job Execution
job j = <program p, data d, resources r, configuration c>
Space of configuration choices:
  Number of map tasks
  Number of reduce tasks
  Partitioning of map outputs to reduce tasks
  Memory allocation to task-level buffers
  Multiphase external sorting in the tasks
  Whether output data from tasks should be compressed
  Whether combine function should be used
Slide7
Optimizing MapReduce Job Execution
Use defaults or set manually (rules-of-thumb)
Rules-of-thumb may not suffice
[Figure: 2-dim projection of 13-dim surface, with the rules-of-thumb settings marked]
Slide8
Applying Cost-based Optimization
Goal: find the best configuration settings for a given program p, data d, and resources r
Just-in-Time Optimizer
  Searches through the space S of parameter settings
What-if Engine
  Estimates perf using properties of p, d, r, and c
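The goal can be written compactly. The notation below is a sketch built from the symbols used on this slide (perf, S, p, d, r, c); the function name F and the use of arg min (which assumes that a lower perf value, such as running time, is better) are assumptions.

```latex
\begin{align}
  \mathit{perf} &= F(p, d, r, c) \\
  c_{\mathrm{opt}} &= \operatorname*{arg\,min}_{c \in S} F(p, d, r, c)
\end{align}
```

In this reading, the What-if Engine supplies the estimate of F and the Just-in-Time Optimizer performs the search over S.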
Challenge: How to capture the properties of an arbitrary MapReduce program p?
Slide9
Job Profile
Concise representation of program execution as a job
Records information at the level of “task phases”
Generated by Profiler through measurement or by the What-if Engine through estimation
[Figure: task phases of a map task: Read (split from DFS), Map, Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), and Merge.]
Slide10
Job Profile Fields
Dataflow: amount of data flowing through task phases
  Map output bytes
  Number of map-side spills
  Number of records in buffer per spill
Costs: execution times at the level of task phases
  Read phase time in the map task
  Map phase time in the map task
  Spill phase time in the map task
Dataflow Statistics: statistical information about the dataflow
  Map func’s selectivity (output / input)
  Map output compression ratio
  Size of records (keys and values)
Cost Statistics: statistical information about the costs
  I/O cost for reading from local disk per byte
  CPU cost for executing Map func per record
  CPU cost for uncompressing the input per byte
Slide11
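The four groups of Job Profile Fields can be pictured as measured quantities (dataflow, costs) plus ratios derived from them (dataflow statistics, cost statistics). The sketch below is a hypothetical container, not the Starfish profile format: the class name, the input-side fields, and the idea of deriving the statistics by simple division are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MapTaskProfile:
    """Hypothetical container for a few map-side profile fields."""
    # Dataflow fields (input-side fields are assumed, not listed on the slide)
    map_input_bytes: int
    map_input_records: int
    map_output_bytes: int
    num_spills: int
    # Cost fields, in seconds
    read_phase_time: float
    map_phase_time: float
    spill_phase_time: float

    def map_selectivity_bytes(self) -> float:
        """Dataflow statistic: map output size relative to map input size."""
        return self.map_output_bytes / self.map_input_bytes

    def map_cpu_cost_per_record(self) -> float:
        """Cost statistic: map phase time spread over the input records."""
        return self.map_phase_time / self.map_input_records


profile = MapTaskProfile(
    map_input_bytes=1000, map_input_records=4, map_output_bytes=500,
    num_spills=1, read_phase_time=1.0, map_phase_time=2.0,
    spill_phase_time=0.5,
)
print(profile.map_selectivity_bytes(), profile.map_cpu_cost_per_record())
```

Keeping the statistics as ratios is what makes a profile reusable: a selectivity or a per-record cost can be applied to a different input size, whereas raw byte counts and phase times cannot.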
Generating Profiles by Measurement
Goals
  Have zero overhead when profiling is turned off
  Require no modifications to Hadoop
  Support unmodified MapReduce programs written in Java or Hadoop Streaming/Pipes (Python/Ruby/C++)
Dynamic instrumentation
  Monitors task phases of MapReduce job execution
  Event-condition-action rules are specified, leading to run-time instrumentation of Hadoop internals
  We currently use BTrace (Hadoop internals are in Java)
Slide12
Generating Profiles by Measurement
[Figure: profiling is enabled on map and reduce tasks (with use of sampling). Each profiled task emits raw data during task execution; the raw data is turned into map profiles and reduce profiles, which are combined into the job profile.]
Slide13
What-if Engine
Inputs:
  Job Profile <p, d1, r1, c1>
  Input Data Properties <d2>
  Cluster Resources <r2>
  Configuration Settings <c2>
Components: Job Oracle, Task Scheduler Simulator
Produces: Virtual Job Profile for <p, d2, r2, c2>; Properties of Hypothetical job
Slide14
Virtual Profile Estimation
Given the profile for job j = <p, d1, r1, c1>, estimate the profile for job j' = <p, d2, r2, c2>
[Figure: the profile for j (Dataflow Statistics, Dataflow, Cost Statistics, Costs), together with Input Data d2, Configuration c2, and Resources r2, is mapped to the (virtual) profile for j' as follows.]
  Dataflow Statistics: Cardinality Models
  Dataflow: White-box Models
  Cost Statistics: Relative Black-box Models
  Costs: White-box Models
Slide15
White-box Models
Detailed set of equations for Hadoop
Example: Calculate dataflow in each task phase in a map task
  Uses: input data properties, dataflow statistics, configuration parameters
[Figure: task phases of a map task: Read (split from DFS), Map, Collect (serialize, partition into the memory buffer), Spill (sort, [combine], [compress]), and Merge.]
Slide16
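The white-box example (calculating the dataflow in each task phase of a map task) can be illustrated for one phase, the spill. The sketch below is a simplified stand-in, not the Starfish equations: it assumes a sort buffer split into a data part and a per-record accounting part, with 16 bytes of accounting metadata per record, and a spill triggered when either part reaches the spill threshold. The function name and the 16-byte constant are assumptions; the default parameter values are the ones listed on the Hadoop Configuration Parameters slide.

```python
import math

# Assumption: bytes of per-record metadata kept in the sort buffer.
ACCOUNTING_BYTES_PER_RECORD = 16


def map_spill_dataflow(map_output_records: int,
                       avg_record_bytes: float,
                       io_sort_mb: int = 100,
                       io_sort_record_percent: float = 0.05,
                       io_sort_spill_percent: float = 0.8):
    """Return (records_per_spill, num_spills) for one map task."""
    if map_output_records == 0:
        return 0, 0
    buffer_bytes = io_sort_mb * 2 ** 20
    # Records that fit in the data part of the buffer before a spill starts.
    data_limit = math.floor(
        buffer_bytes * (1 - io_sort_record_percent) * io_sort_spill_percent
        / avg_record_bytes)
    # Records that fit in the accounting part before a spill starts.
    meta_limit = math.floor(
        buffer_bytes * io_sort_record_percent * io_sort_spill_percent
        / ACCOUNTING_BYTES_PER_RECORD)
    records_per_spill = max(1, min(data_limit, meta_limit, map_output_records))
    num_spills = math.ceil(map_output_records / records_per_spill)
    return records_per_spill, num_spills


# One million 100-byte map output records with the default settings.
records_per_spill, num_spills = map_spill_dataflow(1_000_000, 100)
print(num_spills)  # 4
```

With small records the accounting part fills first, so raising `io.sort.record.percent` reduces the number of spills in this sketch; this is the kind of interaction between dataflow statistics and configuration parameters that a white-box model makes explicit.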
Just-in-Time Optimizer
Inputs:
  Job Profile <p, d1, r1, c1>
  Input Data Properties <d2>
  Cluster Resources <r2>
Components: (Sub) Space Enumeration, Recursive Random Search
Makes What-if Calls
Output: Best Configuration Settings <c_opt> for <p, d2, r2>
Slide17
Recursive Random Search
[Figure: parameter space; each space point is a set of configuration settings]
Use the What-if Engine to cost each space point
Slide18
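Recursive Random Search, with the What-if Engine costing each space point, can be sketched as alternating exploration (random samples over the whole space) and exploitation (random samples in a shrinking box around the best point found). This is a simplified illustration rather than the optimizer's actual implementation: the function name, the parameters `explore`, `shrink`, and `min_width`, and the toy cost function standing in for a what-if call are all assumptions.

```python
import random


def recursive_random_search(cost, bounds, budget=200, explore=20,
                            shrink=0.5, min_width=1e-3, seed=0):
    """Minimise cost(point) over a box given as [(low, high), ...].

    cost is expected to return a finite number for every point.
    """
    rng = random.Random(seed)
    evaluations = 0
    best_point, best_cost = None, float("inf")

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    def evaluate(point):
        nonlocal evaluations, best_point, best_cost
        evaluations += 1
        value = cost(point)
        if value < best_cost:
            best_point, best_cost = point, value
        return value

    while evaluations < budget:
        # Exploration: random samples over the whole space.
        centre, centre_cost = None, float("inf")
        for _ in range(min(explore, budget - evaluations)):
            point = sample(bounds)
            value = evaluate(point)
            if value < centre_cost:
                centre, centre_cost = point, value
        # Exploitation: sample a shrinking box around the promising point.
        width = 0.25  # half-width as a fraction of each parameter's range
        while width > min_width and evaluations < budget:
            box = [(max(lo, x - width * (hi - lo)),
                    min(hi, x + width * (hi - lo)))
                   for x, (lo, hi) in zip(centre, bounds)]
            point = sample(box)
            value = evaluate(point)
            if value < centre_cost:
                centre, centre_cost = point, value  # re-centre on improvement
            else:
                width *= shrink  # otherwise shrink the box
    return best_point, best_cost


def toy_cost(point):
    """Stand-in for a what-if call; the optimum here is invented."""
    io_sort_mb, reduce_tasks = point
    return (io_sort_mb - 120) ** 2 + (reduce_tasks - 30) ** 2


best_point, best_cost = recursive_random_search(toy_cost, [(50, 200), (1, 100)])
```

Because every candidate is costed by a model rather than by running the job, the search can afford many evaluations; the `budget` parameter caps the number of what-if calls.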
Experimental Methodology
15-30 Amazon EC2 nodes, various instance types
Cluster-level configurations based on rules of thumb
Data sizes: 10-180 GB
Rule-based Optimizer vs. Cost-based Optimizer

Abbr.  MapReduce Program   Domain                 Dataset
CO     Word Co-occurrence  NLP                    Wikipedia
WC     WordCount           Text Analytics         Wikipedia
TS     TeraSort            Business Analytics     TeraGen
LG     LinkGraph           Graph Processing       Wikipedia (compressed)
JO     Join                Business Analytics     TPC-H
TF     TF-IDF              Information Retrieval  Wikipedia
Slide19
Job Optimizer Evaluation
Hadoop cluster: 30 nodes, m1.xlarge
Data sizes: 60-180 GB
Slide20
Job Optimizer Evaluation
Hadoop cluster: 30 nodes, m1.xlarge
Data sizes: 60-180 GB
Slide21
Estimates from the What-if Engine
Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
[Figure: true surface vs. estimated surface]
Slide22
Estimates from the What-if Engine
Profiling on Test cluster, prediction on Production cluster
Test cluster: 10 nodes, m1.large, 60 GB
Production cluster: 30 nodes, m1.xlarge, 180 GB
Slide23
Profiling Overhead vs. Benefit
Hadoop cluster: 16 nodes, c1.medium
MapReduce Program: Word Co-occurrence
Data set: 10 GB Wikipedia
Slide24
Conclusion
What have we achieved?
  Perform in-depth job analysis with profiles
  Predict the behavior of hypothetical job executions
  Optimize arbitrary MapReduce programs
What’s next?
  Optimize job workflows/workloads
  Address the cluster sizing (provisioning) problem
  Perform data layout tuning
Slide25
Starfish: Self-tuning Analytics System
www.cs.duke.edu/starfish
Software Release: Starfish v0.2.0
Demo Session C: Thursday, 10:30-12:00, Grand Crescent
Slide26
Hadoop Configuration Parameters

Parameter                                Default Value
io.sort.mb                               100
io.sort.record.percent                   0.05
io.sort.spill.percent                    0.8
io.sort.factor                           10
mapreduce.combine.class                  null
min.num.spills.for.combine               3
mapred.compress.map.output               false
mapred.reduce.tasks                      1
mapred.job.shuffle.input.buffer.percent  0.7
mapred.job.shuffle.merge.percent         0.66
mapred.inmem.merge.threshold             1000
mapred.job.reduce.input.buffer.percent   0
mapred.output.compress                   false
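A configuration c is then just a set of overrides on top of these defaults. The sketch below keeps the defaults from the table in a dictionary and renders overrides as `-Dname=value` strings, the form accepted by Hadoop jobs whose driver uses the generic options parser. The helper name `to_d_options` is hypothetical, and whether a particular job honours `-D` options depends on how its driver is written.

```python
# Defaults as listed in the table above; None stands for "null".
HADOOP_DEFAULTS = {
    "io.sort.mb": 100,
    "io.sort.record.percent": 0.05,
    "io.sort.spill.percent": 0.8,
    "io.sort.factor": 10,
    "mapreduce.combine.class": None,
    "min.num.spills.for.combine": 3,
    "mapred.compress.map.output": False,
    "mapred.reduce.tasks": 1,
    "mapred.job.shuffle.input.buffer.percent": 0.7,
    "mapred.job.shuffle.merge.percent": 0.66,
    "mapred.inmem.merge.threshold": 1000,
    "mapred.job.reduce.input.buffer.percent": 0,
    "mapred.output.compress": False,
}


def to_d_options(overrides):
    """Render overrides as -Dname=value strings, rejecting unknown names."""
    options = []
    for name, value in overrides.items():
        if name not in HADOOP_DEFAULTS:
            raise KeyError(f"unknown parameter: {name}")
        if isinstance(value, bool):
            value = "true" if value else "false"  # Java-style booleans
        options.append(f"-D{name}={value}")
    return options


print(to_d_options({"io.sort.mb": 200, "mapred.compress.map.output": True}))
# ['-Dio.sort.mb=200', '-Dmapred.compress.map.output=true']
```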
Slide27
Amazon EC2 Node Types

Node Type  CPU (EC2 Units)  Mem (GB)  Storage (GB)  Cost ($/hour)  Map Slots per Node  Reduce Slots per Node  Max Mem per Slot (MB)
m1.small   1                1.7       160           0.085          2                   1                      300
m1.large   4                7.5       850           0.34           3                   2                      1024
m1.xlarge  8                15        1690          0.68           4                   4                      1536
c1.medium  5                1.7       350           0.17           2                   2                      300
c1.xlarge  20               7         1690          0.68           8                   6                      400
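The per-node figures translate directly into cluster-level resources r. The sketch below totals slots and hourly cost for a cluster of identical nodes; the dictionary holds the slot counts and prices as read from the table above (where the extracted values were run together, the split into columns is a reconstruction), and the helper name `cluster_summary` is hypothetical.

```python
# name: (map slots per node, reduce slots per node, cost in $/hour)
EC2_NODE_TYPES = {
    "m1.small": (2, 1, 0.085),
    "m1.large": (3, 2, 0.34),
    "m1.xlarge": (4, 4, 0.68),
    "c1.medium": (2, 2, 0.17),
    "c1.xlarge": (8, 6, 0.68),
}


def cluster_summary(node_type, num_nodes):
    """Total map slots, reduce slots, and hourly cost for identical nodes."""
    map_slots, reduce_slots, cost = EC2_NODE_TYPES[node_type]
    return {
        "map_slots": map_slots * num_nodes,
        "reduce_slots": reduce_slots * num_nodes,
        "cost_per_hour": round(cost * num_nodes, 3),
    }


# The 30-node m1.xlarge cluster used in the Job Optimizer Evaluation slides.
print(cluster_summary("m1.xlarge", 30))
# {'map_slots': 120, 'reduce_slots': 120, 'cost_per_hour': 20.4}
```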
Slide28
Input Data & Cluster Properties
Input Data Properties
  Data size
  Block size
  Compression
Cluster Properties
  Number of nodes
  Number of map slots per node
  Number of reduce slots per node
  Maximum memory per task slot