Virtual Clusters Supporting MapReduce in the Cloud
Jonathan Klinginsmith
jklingin@indiana.edu
School of Informatics and Computing
Indiana University Bloomington
Let’s Break this Title Down
Virtual Clusters Supporting MapReduce in the Cloud
Let’s Start with MapReduce
An example to get us warmed up…
Map
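line = "hello world goodbye world"
words = line.split()  # ["hello", "world", "goodbye", "world"]
# Emit a (word, 1) pair for every word; list() is needed in Python 3, where map() returns an iterator
map_results = list(map(lambda x: (x, 1), words))
# [('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)]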
Can’t have “MapReduce” without the “Reduce”
Reduce
from functools import reduce  # a builtin in Python 2; moved to functools in Python 3
from itertools import groupby
from operator import itemgetter

map_results.sort()
# [('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)]
for word, group in groupby(map_results, itemgetter(0)):
    # Collect the values for this key, then add them together
    counts = [count for (_, count) in group]
    total = reduce(lambda x, y: x + y, counts)
    print("{0} {1}".format(word, total))
Output:
goodbye 1
hello 1
world 2
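Why sort first? itertools.groupby only groups consecutive items that share a key, so without the sort the two ('world', 1) pairs would be reduced separately. A minimal comparison, where the helper group_keys is an illustrative name and not part of the original example:

```python
from itertools import groupby
from operator import itemgetter


def group_keys(pairs):
    """Return the key of each group that groupby produces, in order."""
    return [key for key, _ in groupby(pairs, itemgetter(0))]


map_results = [('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)]

# Without sorting, the two 'world' pairs are not adjacent, so groupby
# yields two separate 'world' groups that would be reduced separately.
unsorted_groups = group_keys(map_results)

# After sorting, equal keys are adjacent and each key forms exactly one group.
sorted_groups = group_keys(sorted(map_results))

print(unsorted_groups)  # ['hello', 'world', 'goodbye', 'world']
print(sorted_groups)    # ['goodbye', 'hello', 'world']
```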
What Did We Just Do?
“hello world goodbye world”
Split: “hello”, “world”, “goodbye”, “world”
Map: ('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)
Sort: ('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)
Reduce: ('goodbye', 1), ('hello', 1), ('world', 2)
The “Value” of Knowing the “Key” Pieces*
Map – creates (key, value) pairs: ('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)
Sort by the key: ('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)
Reduce operation performed on the values of each key: ('goodbye', 1), ('hello', 1), ('world', 2)
* = Pun intended
In General, Then…
Split:
Map:
Sort:
Reduce:
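The four general steps can be sketched as one single-machine Python function. This is an illustrative sketch, not code from the deck: map_reduce is a hypothetical name, the Split step is assumed to be splitting on whitespace, and the mapper is assumed to return a list of (key, value) pairs for each record.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter


def map_reduce(text, mapper, reducer):
    """Run split, map, sort, and reduce on a single machine."""
    # Split: break the input into records (here, whitespace-separated words).
    records = text.split()
    # Map: each record produces zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Sort: bring pairs with equal keys next to each other.
    pairs.sort(key=itemgetter(0))
    # Reduce: combine the values of each key into a single result.
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        results.append((key, reduce(reducer, values)))
    return results


word_counts = map_reduce(
    "hello world goodbye world",
    mapper=lambda word: [(word, 1)],
    reducer=lambda x, y: x + y,
)
print(word_counts)  # [('goodbye', 1), ('hello', 1), ('world', 2)]
```

Passing a different mapper and reducer reuses the same skeleton for other per-key aggregations; only the Split step is fixed to whitespace in this sketch.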
Check “MapReduce” off the List
Virtual Clusters Supporting MapReduce in the Cloud
What is a Cluster?
Compute Cluster
Set of computers
Proximity
Networking
Storage
Resource manager
Compute Cluster
Breaking Down Large Problems
Many compute patterns have emerged; one such pattern is…
Scatter/Gather
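Scatter/gather can be sketched on a single machine with Python's concurrent.futures module, with threads standing in for cluster nodes. The helper names (scatter, count_words, gather) and the round-robin chunking are assumptions made for illustration:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scatter(items, num_workers):
    """Deal the items round-robin into num_workers chunks."""
    return [items[i::num_workers] for i in range(num_workers)]


def count_words(chunk):
    """Work done independently on one chunk, producing a partial result."""
    return Counter(chunk)


def gather(partials):
    """Combine the partial results into the final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


words = "hello world goodbye world".split()
chunks = scatter(words, num_workers=2)

# Threads stand in for cluster nodes in this sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_results = list(pool.map(count_words, chunks))

result = gather(partial_results)
print(sorted(result.items()))  # [('goodbye', 1), ('hello', 1), ('world', 2)]
```

On a real cluster, scattering means shipping each chunk over the network to a separate machine rather than handing it to a local thread.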
On the Cluster
What if there are a Lot of Data?
Network bottleneck?
What about Local Node Storage?
Distribute the data across the nodes (scatter/split)
Replicate the data to prevent data loss
Have the file system keep track of where the chunks (blocks) are stored
Have the scheduler assign jobs to the nodes that store the data
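These four ideas (split into blocks, replicate, track block locations, schedule work where the data lives) can be modeled in a few lines. This is a toy sketch under stated assumptions, not how any particular distributed file system is implemented: place_blocks and schedule are hypothetical helpers, replicas go to consecutive nodes, and blocks are fixed-size character ranges, so a block boundary can cut through a word (handling records that straddle blocks is omitted).

```python
def place_blocks(data, block_size, nodes, replication):
    """Split data into fixed-size blocks and pick `replication` nodes per block.

    Returns (blocks, locations); locations maps block id -> list of node names.
    Assumes replication <= len(nodes), so each block's replicas are distinct.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    locations = {}
    for block_id in range(len(blocks)):
        first = block_id % len(nodes)
        # Replicas go to consecutive nodes (wrapping around), so with
        # replication >= 2 no single node failure loses a block.
        locations[block_id] = [
            nodes[(first + r) % len(nodes)] for r in range(replication)
        ]
    return blocks, locations


def schedule(block_id, locations, busy):
    """Prefer an idle node that already stores the block (data locality)."""
    for node in locations[block_id]:
        if node not in busy:
            return node
    return None  # no local node is free; a real scheduler could fall back to a remote node


nodes = ["node1", "node2", "node3"]
blocks, locations = place_blocks("hello world goodbye world", 8, nodes, replication=2)
print(blocks)     # ['hello wo', 'rld good', 'bye worl', 'd']
print(locations)  # {0: ['node1', 'node2'], 1: ['node2', 'node3'], 2: ['node3', 'node1'], 3: ['node1', 'node2']}
print(schedule(0, locations, busy={"node1"}))  # node2
```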
MapReduce on the Cluster
Data is distributed across the nodes (scatter/split) when it is loaded into the file system
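Once each node holds part of the data, each node can run the map step on its local blocks; the intermediate pairs must then be routed so that all values for one key reach the same reducer. The sketch below assumes hash partitioning of keys (zlib.crc32 modulo the reducer count) and groups values in a dictionary in place of a full sort; the node names and helper functions are hypothetical.

```python
import zlib
from collections import defaultdict


def map_on_node(local_lines):
    """Map step run on one node over the data it stores locally."""
    return [(word, 1) for line in local_lines for word in line.split()]


def partition(key, num_reducers):
    """Pick a reducer for a key; every node must agree, so use a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to the reducer that owns its key."""
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    return reducer_inputs


def reduce_on_node(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical layout: the lines each node received when the data was loaded.
node_data = {
    "node1": ["hello world"],
    "node2": ["goodbye world"],
}
map_outputs = [map_on_node(lines) for lines in node_data.values()]
reducer_inputs = shuffle(map_outputs, num_reducers=2)

# Each key lives on exactly one reducer, so merging the outputs never collides.
final = {}
for grouped in reducer_inputs:
    final.update(reduce_on_node(grouped))
print(sorted(final.items()))  # [('goodbye', 1), ('hello', 1), ('world', 2)]
```

CRC-32 is used here instead of Python's built-in hash(), which for strings is randomized per process and so would not give every node the same answer.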
Check “Clusters” off the List
Virtual Clusters Supporting MapReduce in the Cloud
Virtual…and…the Cloud
Let’s start with Virtual...
A Virtual Machine (VM): a “guest” virtual computer running on a “host” physical computer
A machine image (MI) is instantiated into a running VM
MI = snapshot of the operating system (OS) and any software
Virtual…and…the Cloud
The Cloud... Virtualization + Internet
Introduction of the Cloud:
Scalability
Elasticity
Utility computing – not a capital expenditure
Three levels of service:
Software (SaaS) – e.g., Salesforce.com, Web-based email
Platform (PaaS) – e.g., Google App Engine
Infrastructure (IaaS) – e.g., Amazon EC2
Why is the Cloud Interesting?
In Industry
Scalability – get scale not present in internal data centers
Elasticity – change scale as capacity demands change
Utility computing – no capital investment
Example use cases:
High-performance/high-throughput computing
Online game development
Scalable web development
Why is the Cloud Interesting?
In Academia
Reproducibility – reuse MIs between researchers
Educational opportunities:
Virtual environment
Variety of uses and configurations
Learn about foundational system components
Collaborate within the same environment
Covered “Virtual” and “the Cloud”
Virtual Clusters Supporting MapReduce in the Cloud
Let’s put it all together...
MapReduce Virtual Clusters in the Cloud
Create virtual clusters running MapReduce
Test algorithms
Test infrastructure and other system attributes
MapReduce Virtual Clusters in the Cloud
Research Areas
Bioinformatics – e.g., genomic alignments
Data/text mining and processing
Large-scale graph algorithms
From Virtual Clusters to a Local Sandbox
Use a local sandbox to cover MapReduce topics