Virtual Clusters - PowerPoint Presentation

Uploaded On 2017-10-10



Presentation Transcript

Slide1

Virtual Clusters Supporting MapReduce in the Cloud

Jonathan Klinginsmith

jklingin@indiana.edu

School of Informatics and Computing

Indiana University Bloomington

Slide2

Let’s Break this Title Down

Virtual Clusters Supporting MapReduce in the Cloud

Slide3

Let’s Start with MapReduce

An example to get us warmed up…

Map

    line = "hello world goodbye world"
    words = line.split()  # ["hello", "world", "goodbye", "world"]
    # In Python 3, map() returns an iterator; wrap it in list()
    # so the results can be sorted in place in the Reduce step.
    map_results = list(map(lambda x: (x, 1), words))
    # [('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)]

Slide4

Can’t have “MapReduce” without the “Reduce”

Reduce

    from functools import reduce  # reduce() is not a builtin in Python 3
    from itertools import groupby
    from operator import itemgetter

    map_results.sort()
    # [('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)]
    for word, group in groupby(map_results, itemgetter(0)):
        counts = [count for (_, count) in group]
        total = reduce(lambda x, y: x + y, counts)
        print("{0} {1}".format(word, total))

Output:

    goodbye 1
    hello 1
    world 2
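Why does the Reduce code sort first? itertools.groupby only groups consecutive items that share a key, so without the sort the two “world” pairs land in separate groups. A small comparison of both cases (using sum() instead of reduce() for brevity; the helper name reduce_runs is illustrative):

```python
from itertools import groupby
from operator import itemgetter

pairs = [("hello", 1), ("world", 1), ("goodbye", 1), ("world", 1)]


def reduce_runs(pairs_in_order):
    """Sum the counts of each run of equal keys, exactly as groupby sees them."""
    return [
        (word, sum(count for _, count in group))
        for word, group in groupby(pairs_in_order, key=itemgetter(0))
    ]


# Unsorted: the two ('world', 1) pairs are not adjacent, so "world" appears twice.
unsorted_result = reduce_runs(pairs)
print(unsorted_result)
# [('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)]

# Sorted by key: equal keys are adjacent, so each word is reduced exactly once.
sorted_result = reduce_runs(sorted(pairs, key=itemgetter(0)))
print(sorted_result)
# [('goodbye', 1), ('hello', 1), ('world', 2)]
```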

Slide5

What Did We Just Do?

hello world goodbye world

Split: “hello”, “world”, “goodbye”, “world”

Map: ('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)

Sort: ('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)

Reduce: ('goodbye', 1), ('hello', 1), ('world', 2)

Slide6

The “Value” of Knowing the “Key” Pieces*

Map: creates (key, value) pairs
('hello', 1), ('world', 1), ('goodbye', 1), ('world', 1)

Sort by the key:
('goodbye', 1), ('hello', 1), ('world', 1), ('world', 1)

Reduce operation performed on the values of each key:
('goodbye', 1), ('hello', 1), ('world', 2)

* = Pun intended

Slide7

In General then…

Split:

Map:

Sort:

Reduce:

Slide8
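The general Split/Map/Sort/Reduce shape from the “In General” slide can be written once as a skeleton that each job fills in with its own mapper and reducer. A minimal single-machine sketch; the names map_reduce, word_mapper, sum_reducer, length_mapper, and collect_reducer are illustrative and not taken from any MapReduce framework:

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Pair]],
    reducer: Callable[[Any, List[Any]], Any],
) -> List[Pair]:
    """Single-machine Map -> Sort -> Reduce over records that are already split."""
    # Map: every record may emit zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Sort: bring equal keys next to each other (keys must be comparable).
    pairs.sort(key=itemgetter(0))
    # Reduce: call the reducer once per key with all of that key's values.
    return [
        (key, reducer(key, [value for _, value in group]))
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


# Word count as a mapper/reducer pair.
def word_mapper(line: str) -> List[Pair]:
    return [(word, 1) for word in line.split()]


def sum_reducer(key: Any, values: List[int]) -> int:
    return sum(values)


# A different job on the same skeleton: group distinct words by length.
def length_mapper(line: str) -> List[Pair]:
    return [(len(word), word) for word in line.split()]


def collect_reducer(key: Any, values: List[str]) -> List[str]:
    return sorted(set(values))


lines = ["hello world goodbye world"]  # Split: one record per line
print(map_reduce(lines, word_mapper, sum_reducer))
# [('goodbye', 1), ('hello', 1), ('world', 2)]
print(map_reduce(lines, length_mapper, collect_reducer))
# [(5, ['hello', 'world']), (7, ['goodbye'])]
```

Only the mapper and reducer change between jobs; the Sort step and the grouping are shared, which is what lets a framework run them for you.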

Check “MapReduce” off the List

Virtual Clusters Supporting MapReduce in the Cloud

Slide9

What is a Cluster?

Slide10

Compute Cluster

Set of computers
Proximity
Networking
Storage
Resource Manager

Slide11

Compute Cluster

Slide12

Breaking Down Large Problems

Many compute patterns have emerged; one such pattern is…

Scatter/Gather:

Slide13
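A minimal sketch of the Scatter/Gather pattern: split the input into chunks, let independent workers process each chunk, then combine the partial results. Threads from concurrent.futures stand in for cluster nodes here (an assumption for illustration; on a real cluster the chunks would travel over the network to separate machines), and the per-chunk task, a sum of squares, is an arbitrary example:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def scatter(data: Sequence[int], n_workers: int) -> List[Sequence[int]]:
    """Split data into n_workers contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def work(chunk: Sequence[int]) -> int:
    """Per-worker task (illustrative): a partial sum of squares."""
    return sum(x * x for x in chunk)


def gather(partials: List[int]) -> int:
    """Combine the partial results into the final answer."""
    return sum(partials)


def scatter_gather(data: Sequence[int], n_workers: int = 4) -> int:
    chunks = scatter(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))  # one task per chunk
    return gather(partials)


print(scatter_gather(list(range(1, 101))))  # 338350
```

In CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock; the sketch shows the shape of the computation, not its performance.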

On the Cluster

Slide14

What if there are a Lot of Data?

Network Bottleneck?

Slide15
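To put rough numbers on the network bottleneck question: the figures below (1 TB of data, one 1 Gbit/s link from central storage, 20 nodes, 100 MB/s per local disk) are hypothetical, chosen only to show how the arithmetic works:

```python
# Hypothetical figures, chosen only to illustrate the arithmetic.
DATASET_BYTES = 1e12        # 1 TB of input data
LINK_BITS_PER_SEC = 1e9     # a single 1 Gbit/s link from central storage
NODES = 20                  # compute nodes in the cluster
DISK_BYTES_PER_SEC = 100e6  # 100 MB/s sequential read from each local disk


def network_seconds(n_bytes: float, link_bits_per_sec: float) -> float:
    """All data funnels through one shared link (8 bits per byte)."""
    return n_bytes * 8 / link_bits_per_sec


def local_disk_seconds(n_bytes: float, nodes: int, disk_bytes_per_sec: float) -> float:
    """Each node reads only its own share, and all nodes read in parallel."""
    return (n_bytes / nodes) / disk_bytes_per_sec


net = network_seconds(DATASET_BYTES, LINK_BITS_PER_SEC)
local = local_disk_seconds(DATASET_BYTES, NODES, DISK_BYTES_PER_SEC)
print(f"over the network: {net:.0f} s (~{net / 3600:.1f} h)")    # 8000 s (~2.2 h)
print(f"from local disks: {local:.0f} s (~{local / 60:.1f} min)")  # 500 s (~8.3 min)
```

Under these assumptions (and ignoring protocol overhead, seeks, and contention), moving the data to the computation is about 16 times slower than reading it where it already sits, which is the kind of gap that motivates storing data on the nodes themselves.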

What about Local Node Storage?

Distribute the data across the nodes (scatter/split)
Replicate the data to prevent data loss
Have the file system keep track of where the chunks (blocks) are stored
Have the resource manager schedule jobs on the nodes that store the data

Slide16
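The local node storage ideas (split the data, replicate each block, track block locations, schedule work where the data lives) can be sketched in a few functions. This is a deliberately simplified, hypothetical model with round-robin placement and tiny blocks, not the rack-aware placement policy of a real distributed file system such as HDFS; all names are illustrative:

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Scatter/split: cut the input into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(n_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Replicate: put each block on `replication` distinct nodes, round-robin.

    The returned dict stands in for the file system's block map
    (block id -> nodes holding a copy).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(n_blocks)
    }


def schedule_local(block_map: Dict[int, List[str]], nodes: List[str]) -> Dict[int, str]:
    """Schedule each block's task on the least-loaded node that already stores it."""
    load = {node: 0 for node in nodes}
    assignment = {}
    for block_id, holders in block_map.items():
        chosen = min(holders, key=lambda node: load[node])
        assignment[block_id] = chosen
        load[chosen] += 1
    return assignment


nodes = ["node1", "node2", "node3", "node4"]
blocks = split_into_blocks(b"hello world goodbye world", block_size=8)
block_map = place_blocks(len(blocks), nodes, replication=2)
print(block_map)
print(schedule_local(block_map, nodes))
```

Because every task is assigned only to a node listed in the block map, no task has to pull its input block over the network, and losing any single node still leaves one copy of each block.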

MapReduce on the Cluster

Data are distributed across the nodes (scatter/split) when loaded into the file system

Slide17

Check “Clusters” off the List

Virtual Clusters Supporting MapReduce in the Cloud

Slide18

Virtual…and…the Cloud

Let’s start with Virtual...

A Virtual Machine (VM): a “guest” virtual computer running on a “host” physical computer
A machine image (MI) is instantiated into a running VM
MI = snapshot of the operating system (OS) and any software

Slide19

Virtual…and…the Cloud

The Cloud...
Virtualization + Internet

Introduction of the Cloud
Scalability
Elasticity
Utility computing: not a capital expenditure

Three levels of service
Software (SaaS): e.g., Salesforce.com, Web-based email
Platform (PaaS): e.g., Google App Engine
Infrastructure (IaaS): e.g., Amazon EC2

Slide20

Why is the Cloud Interesting?

In Industry
Scalability: access scale not available in internal data centers
Elasticity: change scale as capacity demands change
Utility computing: no capital investment

Example use cases:
High Performance/Throughput Computing
Online game development
Scalable web development

Slide21
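Elasticity can be illustrated with a toy scaling rule: size the cluster to the current backlog of tasks, within a minimum and a maximum. The function nodes_needed, its parameters, and the backlog values are hypothetical; real cloud autoscalers apply their own, richer policies:

```python
import math


def nodes_needed(pending_tasks: int, tasks_per_node: int,
                 min_nodes: int = 1, max_nodes: int = 100) -> int:
    """Toy elasticity rule: enough nodes to run every pending task at once,
    clamped to a floor (the cluster never vanishes) and a ceiling (a budget cap)."""
    wanted = math.ceil(pending_tasks / tasks_per_node)
    return max(min_nodes, min(max_nodes, wanted))


for backlog in (0, 30, 250, 5000):
    print(backlog, "->", nodes_needed(backlog, tasks_per_node=8))
# 0 -> 1, 30 -> 4, 250 -> 32, 5000 -> 100
```

Under utility computing, the cluster size returned by a rule like this translates directly into what is paid for, rather than into hardware bought up front.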

Why is the Cloud Interesting?

In Academia
Reproducibility: reuse MIs between researchers
Educational Opportunities
Virtual environment: variety of uses and configurations
Learn about foundational system components
Collaborate within the same environment

Slide22

Covered “Virtual” and “the Cloud”

Virtual Clusters Supporting MapReduce in the Cloud

Let’s put it all together...

Slide23

MapReduce Virtual Clusters in the Cloud

Create virtual clusters running MapReduce
Test algorithms
Test infrastructure and other system attributes

Slide24
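Putting the pieces together, a virtual cluster is several VMs instantiated from the same machine image, one acting as a head node (where, in this sketch, the resource manager and the file system's block map would run) and the rest as workers. The classes, the launch_cluster function, and the image fields are a hypothetical model only; nothing is actually started, and a real deployment would call a cloud provider's API at that point:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class MachineImage:
    """Snapshot of an OS plus installed software (illustrative fields)."""
    name: str
    os: str
    software: Tuple[str, ...]


@dataclass
class VirtualMachine:
    hostname: str
    image: MachineImage
    role: str  # "head" or "worker" in this sketch


@dataclass
class VirtualCluster:
    head: VirtualMachine
    workers: List[VirtualMachine] = field(default_factory=list)


def launch_cluster(image: MachineImage, n_workers: int, prefix: str = "vc") -> VirtualCluster:
    """Instantiate one head node and n_workers workers from the same machine image."""
    head = VirtualMachine(f"{prefix}-head", image, "head")
    workers = [VirtualMachine(f"{prefix}-worker{i}", image, "worker")
               for i in range(1, n_workers + 1)]
    return VirtualCluster(head, workers)


mi = MachineImage("mapreduce-mi", "Linux", ("mapreduce-runtime",))
cluster = launch_cluster(mi, n_workers=3)
print(cluster.head.hostname, [w.hostname for w in cluster.workers])
# vc-head ['vc-worker1', 'vc-worker2', 'vc-worker3']
```

Because every VM comes from one image, sharing that image is enough for another researcher to rebuild an identical cluster, which is the reproducibility point made for academia.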

MapReduce Virtual Clusters in the Cloud

Research Areas
Bioinformatics: e.g., Genomic Alignments
Data/Text Mining and Processing
Large-scale Graph Algorithms

Slide26

From Virtual Clusters to a Local Sandbox

Use a local sandbox to cover MapReduce topics