Map Reduce Basics – Chapter 2

Basics
Divide and conquer
Partition a large problem into smaller subproblems
Workers work on subproblems in parallel
Threads in a core, cores in a multi-core processor, multiple processors in a machine, machines in a cluster
Combine intermediate results from workers into the final result
Issues
How to break the problem up into smaller tasks
How to assign tasks to workers
How workers get the data they need
How to coordinate synchronization among workers
How to share partial results
How to do all of this in the presence of software errors and hardware faults?

Basics
MR – an abstraction that hides system-level details from the programmer
Move code to data
Spread data across disks
A DFS manages storage

Topics
Functional programming
MapReduce
Distributed file system

Functional Programming Roots
MapReduce = functional programming plus distributed processing on steroids
Not a new idea… dates back to the 50’s (or even 30’s)
What is functional programming?
Computation as application of functions
Computation is evaluation of mathematical functions
Avoids state and mutable data
Emphasizes application of functions instead of changes in state

Functional Programming Roots
How is it different?
Traditional notions of “data” and “instructions” are not applicable
Data flows are implicit in the program
Different orders of execution are possible
Theoretical foundation provided by the lambda calculus, a formal system for function definition
Exemplified by LISP, Scheme

Overview of Lisp
Functions are written in prefix notation:
(+ 1 2) → 3
(* 3 4) → 12
(sqrt (+ (* 3 3) (* 4 4))) → 5
(define x 3) → x
(* x 5) → 15

Functions
Functions = lambda expressions bound to variables
Example expressed with lambda:
(+ 1 2) → 3
λx.λy.x+y
Once defined, a function can be applied:
(define (foo x y)
  (sqrt (+ (* x x) (* y y))))
The above expression is equivalent to:
(define foo
  (lambda (x y) (sqrt (+ (* x x) (* y y)))))
(foo 3 4) → 5

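For readers who find lambda notation opaque, the same curried function can be sketched in Python; this is illustrative only, not part of the slides:

add = lambda x: (lambda y: x + y)   # λx.λy.x+y as nested one-argument functions
print(add(1)(2))                    # 3 – apply to 1, then to 2, like (+ 1 2)
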
Functional Programming Roots
Two important concepts in functional programming:
Map: do something to everything in a list
Fold: combine the results of a list in some way

Functional Programming Map
Higher-order functions – accept other functions as arguments
Map
Takes a function f and its argument, which is a list
Applies f to all elements in the list
Returns a list as the result
Lists are primitive data types:
[1 2 3 4 5]
[[a 1] [b 2] [c 3]]

Map/Fold in Action
Simple map example:
(map (lambda (x) (* x x)) [1 2 3 4 5]) → [1 4 9 16 25]

Functional Programming Reduce
Fold
Takes a function g, which has 2 arguments: an initial value and a list
g is applied to the initial value and the 1st item in the list
The result is stored in an intermediate variable
The intermediate variable and the next item in the list form the 2nd application of g, and so on
Fold returns the final value of the intermediate variable

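This left-fold behavior is short enough to sketch in Python; the name fold is illustrative (Python's standard library provides the same operation as functools.reduce):

def fold(g, initial, items):
    # g is applied to the intermediate value and each list item in turn,
    # and the final intermediate value is returned.
    acc = initial
    for item in items:
        acc = g(acc, item)
    return acc

print(fold(lambda acc, x: acc + x, 0, [1, 2, 3, 4, 5]))  # 15
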
Map/Fold in Action
Simple map example:
(map (lambda (x) (* x x)) [1 2 3 4 5]) → [1 4 9 16 25]
Fold examples:
(fold + 0 [1 2 3 4 5]) → 15
(fold * 1 [1 2 3 4 5]) → 120
Sum of squares:
(define (sum-of-squares v)   ; where v is a list
  (fold + 0 (map (lambda (x) (* x x)) v)))
(sum-of-squares [1 2 3 4 5]) → 55

Functional Programming Roots
Use map/fold in combination
Map – transformation of a dataset
Fold – aggregation operation
Map can be applied in parallel
Fold has more restrictions: elements must be brought together
Many applications do not require that g be applied to all elements of the list, so those fold aggregations can also run in parallel

Functional Programming Roots
Map in MapReduce is the same as in functional programming
Reduce corresponds to fold
2 stages:
A user-specified computation is applied over all input; it can occur in parallel and returns intermediate output
The output is aggregated by another user-specified computation

Mappers/Reducers
Key-value pair (k,v) – the basic data structure in MR
Keys, values – ints, strings, etc.; user defined
e.g. keys – URLs, values – HTML content
e.g. keys – node ids, values – adjacency lists of nodes
Map: (k1, v1) → [(k2, v2)]
Reduce: (k2, [v2]) → [(k3, v3)]
Where […] denotes a list

General Flow
Apply the mapper to every input key-value pair stored in the DFS
Generate an arbitrary number of intermediate (k,v) pairs
Distributed group-by operation (shuffle) on the intermediate keys
Sort intermediate results by key (not across reducers)
Aggregate intermediate results
Generate final output to the DFS – one file per reducer
(Diagram: the Map and Reduce phases)

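This flow is compact enough to simulate on one machine. The sketch below (illustrative function names, assuming in-memory lists rather than a DFS) applies the mapper to every input pair, groups and sorts the intermediate pairs by key, and feeds each key's value list to the reducer:

from itertools import groupby
from operator import itemgetter

def run_mapreduce(mapper, reducer, inputs):
    # Map: apply the mapper to every input (k, v) pair.
    intermediate = []
    for k, v in inputs:
        intermediate.extend(mapper(k, v))
    # Shuffle/sort: bring identical intermediate keys together, in key order.
    intermediate.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        # Reduce: aggregate the list of values for each key.
        output.extend(reducer(key, [v for _, v in group]))
    return output
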
What function is implemented?

Example: unigram (word count)
(docid, doc) pairs on the DFS, where doc is text
The mapper tokenizes (docid, doc) and emits a (k,v) pair for every word – (word, 1)
The execution framework brings all identical keys together in a reducer
Reducer – sums all the counts (of 1) for each word
Each reducer writes to one file
Words within a file are sorted; each file has about the same # of words
Can use the output as input to another MR job

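A hedged sketch of word count using the illustrative run_mapreduce helper from the General Flow slide (it mirrors the logic above, not Hadoop's actual API):

def wc_mapper(docid, doc):
    # Tokenize the document and emit (word, 1) for every occurrence.
    return [(word, 1) for word in doc.split()]

def wc_reducer(word, counts):
    # All counts for the same word arrive together; sum them.
    return [(word, sum(counts))]

docs = [("d1", "a rose is a rose"), ("d2", "a daisy is a daisy")]
print(run_mapreduce(wc_mapper, wc_reducer, docs))
# [('a', 4), ('daisy', 2), ('is', 2), ('rose', 2)]
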
Combine – Bandwidth Optimization
Issue: large number of key-value pairs
Example – word count emits (word, 1)
If copied across the network, the intermediate data can exceed the input
Solution: use Combiner functions
Allow local aggregation (after the mapper) before the shuffle/sort
Word count – aggregate (count each word locally); intermediate data = # of unique words
Executed on the same machine as the mapper – sees no output from other mappers
Results in a “mini-reduce” right after the map phase
Its (k,v) pairs are of the same type as the mapper output
If the operation is associative and commutative, the reducer can serve as the combiner
Reduces key-value pairs to save bandwidth

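Continuing the illustrative Python sketch: because addition is associative and commutative, the reducer's logic doubles as the combiner, so each mapper can ship one pair per unique word instead of one per occurrence:

from collections import Counter

def wc_mapper_combining(docid, doc):
    # Count each word locally before anything crosses the network.
    return list(Counter(doc.split()).items())

print(run_mapreduce(wc_mapper_combining, wc_reducer, docs))
# Same totals as before, but fewer intermediate (k, v) pairs are shuffled.
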
Partitioners – Load Balance
Issue: intermediate results all end up on one reducer
Solution: use Partitioner functions
Divide up the intermediate key space and assign (k,v) pairs to reducers
Specifies the task to which each (k,v) pair is copied
Each reducer processes its keys in sorted order
The partitioner computes a hash value of the key and takes it mod the # of reducers
Hopefully this sends the same number of keys to each reducer
But the key distribution may be Zipfian

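A minimal sketch of the hash partitioner just described (names are illustrative; note that Python's built-in hash is salted per process, so a real system would use a stable hash of the key instead):

NUM_REDUCERS = 4  # assumed cluster configuration

def partition(key, num_reducers=NUM_REDUCERS):
    # Hash the key, then take the mod to pick a reducer in [0, num_reducers).
    return hash(key) % num_reducers

print(partition("rose"))  # index of the reducer that receives this key
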
MapReduce
Programmers specify two functions:
map (k, v) → <k’, v’>*
reduce (k’, v’) → <k’, v’>*
All v’ with the same k’ are reduced together
Usually, programmers also specify:
partition (k’, number of partitions) → partition for k’
Often a simple hash of the key, e.g. hash(k’) mod n
Allows reduce operations for different keys to proceed in parallel
Implementations:
Google has a proprietary implementation in C++
Hadoop is an open-source implementation in Java (led by Yahoo)

It’s not just Map and Reduce
Apply the mapper to every input key-value pair stored in the DFS
Generate an arbitrary number of intermediate (k,v) pairs
Aggregate locally (combine)
Assign pairs to reducers (partition)
Distributed group-by operation (shuffle) on the intermediate keys
Sort intermediate results by key (not across reducers)
Aggregate intermediate results
Generate final output to the DFS – one file per reducer
(Diagram: the Map, Combine, Partition, and Reduce phases)

Execution Framework
A MapReduce program (job) contains:
Code for mappers
Combiners
Partitioners
Code for reducers
Configuration parameters (where the input is, where to store the output)
The execution framework takes care of everything else
The developer submits the job to the submission node of the cluster (the jobtracker)

Recall these problems?
How do we assign work units to workers?
What if we have more work units than workers?
What if workers need to share partial results?
How do we aggregate partial results?
How do we know all the workers have finished?
What if workers die?

Execution Framework
Scheduling
A job is divided into tasks (each covering a certain block of (k,v) pairs)
There can be 1000s of tasks that need to be assigned
May exceed the number that can run concurrently
Task queue
Coordination among tasks from different jobs

Execution Framework
Speculative execution
The map phase is only as fast as… the slowest map task
Problem: stragglers, flaky hardware
Solution: use speculative execution:
An exact copy of the same task runs on a different machine
The result of whichever attempt finishes fastest is used
Better for map or reduce?
Can improve running time by 44% (Google)
Doesn’t help if the distribution of values is skewed

Execution Framework
Data/code co-location
Execute near the data
If that is not possible, the data must be streamed
Try to stay within the same rack

Execution Framework
Synchronization
Concurrently running processes join up
Intermediate (k,v) pairs are grouped by key, and the intermediate data is copied over the network in the shuffle/sort
Number of copy operations? Worst case: M × R copy operations, since each mapper may send intermediate results to every reducer (e.g. M = 100 mappers and R = 10 reducers give up to 1,000 copies)
The reduce computation cannot start until all mappers have finished and the (k,v) pairs have been shuffled/sorted
This differs from functional programming
Intermediate (k,v) pairs can be copied over the network to a reducer as soon as each mapper finishes

Execution Framework
Error/fault handling
Failures are the norm:
Disk failures, RAM errors, datacenter outages
Software errors
Corrupted data

Differences in MapReduce Implementations
Hadoop (Apache) vs. Google
Hadoop – values are arbitrarily ordered; the key can be changed in the reducer
Google – the program can specify a secondary sort; the key can’t be changed in the reducer
Hadoop
The programmer can specify the number of map tasks, but the framework makes the final decision
In reduce, the programmer-specified number of tasks is used

Hadoop
Be careful using external resources (e.g. the bottleneck of querying a SQL DB)
Mappers can emit an arbitrary number of intermediate (k,v) pairs, and they can be of a different type than the input
Reducers can emit an arbitrary number of final (k,v) pairs, and they can be of a different type than the intermediate (k,v) pairs
Different from functional programming: can have side effects (internal state changes may cause problems; external ones may write to files)
MapReduce can have no reducer, but it must have a mapper
Can just pass the identity function to the reducer
May not have any input (e.g. computing pi)

Other Sources
Other sources can serve as the source/destination for MapReduce data
Google – BigTable
HBase – a BigTable clone
Hadoop – integrates relational DBs with parallel processing; can write to DB tables

Distributed File System (DFS)
In HPC, storage is distinct from computation
NAS (network-attached storage) and SANs are common
Separate, dedicated nodes for storage
Fetch, load, process, write
Bottleneck
Higher-performance networks cost $$ (10G Ethernet); special-purpose interconnects cost $$$ (InfiniBand)
$$ increases non-linearly
In GFS, computation and storage are not distinct components

Hadoop Distributed File System – HDFS
GFS supports Google's proprietary MapReduce; HDFS supports Hadoop
MapReduce doesn't have to run on a DFS, but it then misses these advantages
Differences in GFS and HDFS vs. a traditional DFS:
Adapted to large data processing
Divide user data into chunks/blocks – LARGE ones
Replicate these across the local disks of nodes in the cluster
Master-slave architecture

HDFS vs GFS (Google File System)
Difference in HDFS:
Master-slave architecture
GFS: master (master), slave (chunkserver)
HDFS: master (namenode), slave (datanode)
Master – namespace (metadata, directory structure, file-to-block mapping, location of blocks, access permissions)
Slaves – manage the actual data blocks
The client contacts the namenode and gets data from the slaves; there are 3 copies of each block, etc.
A block is 64 MB
Initially files were immutable – once closed, they could not be modified

HDFS Namenode
Namespace management
Coordinates file operations
Lazy garbage collection
Maintains file system health
Heartbeats, under-replication, balancing
Supports a subset of the POSIX API; the rest is pushed to the application
No security

Hadoop Cluster Architecture
The HDFS namenode runs the namenode daemon
The job submission node runs the jobtracker
The point of contact for running MapReduce jobs
Monitors the progress of MapReduce jobs, coordinates mappers and reducers
Slaves run the tasktracker
Runs the user's code and the datanode daemon, which serves HDFS data
Sends heartbeat messages to the jobtracker

Hadoop Cluster Architecture
The number of reduce tasks depends on the number of reducers specified by the programmer
The number of map tasks depends on:
A hint from the programmer
The number of input files
The number of HDFS data blocks of the files

Hadoop Cluster Architecture
Map tasks are assigned a set of (k,v) pairs called an input split
Input splits are computed automatically
Aligned on HDFS block boundaries so each split is associated with a single block, which simplifies scheduling
Data locality; if that is not possible, stream across the network (staying within the same rack if possible)

How can we use MapReduce to solve problems?

Hadoop Cluster Architecture
Mappers in Hadoop
Java objects with a MAP method
A mapper object is instantiated for every map task by the tasktracker
Life cycle – instantiation, then a hook in the API for program-specified initialization code
Mappers can load state, static data sources, dictionaries, etc.
After initialization: the MAP method is called by the framework on every (k,v) pair in the input split
The method calls happen within the same Java object, so state can be preserved across multiple (k,v) pairs in the same task (see the sketch after this list)
Can run programmer-specified termination code

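This life cycle can be mirrored in a hedged Python sketch; Hadoop's real interface is Java (setup, map, and cleanup hooks on the Mapper class), so the names below only illustrate the sequence of calls:

class LifecycleMapper:
    def setup(self):
        # Initialization hook: load state, static data sources, dictionaries.
        self.stopwords = {"a", "is"}   # illustrative state, preserved across calls
    def map(self, docid, doc):
        # Called by the framework on every (k, v) pair in the input split;
        # self persists across calls within the same task.
        return [(w, 1) for w in doc.split() if w not in self.stopwords]
    def cleanup(self):
        # Termination hook for programmer-specified code.
        pass
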
Hadoop Cluster Architecture
Reducers in Hadoop
Execution is similar to that of mappers
Instantiation and initialization, then the framework calls the REDUCE method with an intermediate key and an iterator over all of that key's values
Intermediate keys arrive in sorted order
Can preserve state across multiple intermediate keys

CAP Theorem
Consistency, availability, partition tolerance
Cannot satisfy all 3
Partitioning is unavoidable in large data systems, so availability and consistency must be traded off
If the master fails, the system is unavailable but consistent!
With multiple masters, the system is more available but inconsistent
Workarounds for the single namenode:
Warm standby namenode
The Hadoop community is working on it