Reproducible Environment for Scientific


Uploaded by myesha-ticknor on 2015-10-03




Presentation Transcript


Reproducible Environment for Scientific Applications (Lab session)

Tak-Lon (Stephen) Wu

Overview
- Introduction
- VirtualBox prepackaged image
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount
- Exercises
  - Sandbox Hadoop/Twister Kmeans
  - Cloud Hadoop/Twister Kmeans

Motivations
- Background knowledge
- Environment setting
- Different cloud infrastructure tools
- Software dependencies
- Long learning path

Can these complicated steps be automated?

Solution: Salsa Dynamic Provisioning Infrastructure (SalsaDPI), a batch-like program.

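What is SalsaDPI? (Sandbox)

The diagram shows a single machine: the user configuration is passed to the SalsaDPI jar, which uses chef-solo on top of the OS to install the software (S/W) that the applications need.

1. Read the configuration file and execute its software run-list.
2. Install the software.
3. Run the applications.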
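Steps 1 and 2 above can be sketched as follows. This is a minimal illustration, not SalsaDPI's own code: it assumes SalsaDPI hands the conf file's softwareRecipes to chef-solo as a run_list, and the node.json file name is hypothetical. The chef-solo options used (-c for the solo.rb config, -j for a JSON attributes file, -r for a cookbook tarball URL) are standard chef-solo flags; the conf keys come from the sandbox templates shown later.

```python
import json
import os
import tempfile


def write_node_json(conf: dict, directory: str) -> str:
    """Write a chef-solo node file whose run_list is the conf's softwareRecipes.

    The file name node.json is a hypothetical choice for this sketch.
    """
    node = {"run_list": list(conf["softwareRecipes"])}
    path = os.path.join(directory, "node.json")
    with open(path, "w") as f:
        json.dump(node, f, indent=2)
    return path


def chef_solo_command(conf: dict, node_json_path: str) -> list:
    """Build a chef-solo invocation from the conf's 'chef' block (illustrative only)."""
    chef = conf["chef"]
    return [
        "chef-solo",
        "-c", chef["chefSoloConfFilePath"],  # solo.rb configuration file
        "-j", node_json_path,                # JSON attributes carrying the run_list
        "-r", chef["chefSoloRecipeUrls"],    # tarball of cookbooks to download
    ]


if __name__ == "__main__":
    conf = {
        "mode": "sandbox",
        "chef": {
            "chefSoloRecipeUrls": "http://129.79.49.248/chef-solo.tar.gz",
            "chefSoloConfFilePath": "/root/salsaDPI/solo.rb",
        },
        "softwareRecipes": ["recipe[hadoopSandbox]"],
    }
    node_path = write_node_json(conf, tempfile.mkdtemp())
    print(" ".join(chef_solo_command(conf, node_path)))
```

Step 3, running the application itself, is driven by the applicationParameters block and is not modeled here.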

What is SalsaDPI? (Cloud)

The diagram shows the user's machine (OS, Chef client, SalsaDPI jar), a Chef server, and several VMs, each running an OS, Chef, the installed software (S/W), and the applications. The user supplies the conf. file.

1. Bootstrap VMs with a conf. file.
2. Retrieve conf. info and request authentication and authorization.
3. Authenticated and authorized to execute the software run-list.
4. VM(s) information.
5. Submit application commands.
6. Obtain results.

* Chef architecture: http://wiki.opscode.com/display/chef/Architecture+Introduction

What is SalsaDPI? (Cont.)

Chef features:
- On-demand software installation when starting VMs
- Monitoring of software installation progress
- Easy to use

SalsaDPI features:
- Provides a configurable interface
- Automates Hadoop/Twister/other binary execution

* Chef official website: http://www.opscode.com/chef/

Hands-on Session

Online tutorial page: http://salsahpc.indiana.edu/ScienceCloud/reproduce-intro.html

Prerequisites
- Install VirtualBox on your laptop, then download and import the prepackaged image.
- Set up the FutureGrid Eucalyptus environment.
- Make sure the shared folder between the host and guest machines is set up correctly.

# log in to the FutureGrid India headnode (i136)
$ ssh -i ~/fg_private_key.pem johnny@india.futuregrid.org

Replace ~/fg_private_key.pem with the file name of your own private key.

About the Pre-packaged Image

The following software is installed and configured under /root/software/:
- Java JDK
- Chef
- Hadoop
- Twister and ActiveMQ
- HBase
- Pig

SalsaDPI is installed under /root/salsaDPI/.

Important Notes

If activemq.log and kahadb exist in the directory /root/software/apache-activemq-5.4.2/, remove them; otherwise they will cause errors when running sandbox Twister applications.

$ cd /root/software/apache-activemq-5.4.2/
$ ls
activemq.log kahadb
$ rm -rf activemq.log kahadb

Examples
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount

Goals
- Learn to read and modify the SalsaDPI JSON configuration file.
- Execute the SalsaDPI Java executable, passing it the configuration file.

Tutorial: http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html
* JSON metadata format example: http://json.org/example.html

Step 1. Open the Conf. File

Locate and open the configuration file:
/root/salsaDPI/sandbox/templates/sandbox_hadoopTemplate.json
/root/salsaDPI/sandbox/templates/sandbox_twisterTemplate.json

Step 2. Modify Conf. File

'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
}

Detailed descriptions: http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html

applicationParameters: A JSON object that contains the user-defined application's information.
applicationType: Type of the user-defined application; options are Hadoop or Twister.
localPathOfProgramBinary: Full path of the user-defined, compiled Hadoop or Twister jar executable on the working machine.
localPathOfProgramInput: Full path of the user-defined input file on the working machine, normally a plain-text or *.tar.gz file.
localPathOfBinaryDependency: Full path of the user-defined program's dependency file on the working machine, such as the Twister Kmeans initial cluster file.
programExecuteLocation: Path to the Twister program execution script within the Twister package, such as samples/wordcount/bin or samples/kmeans/bin.
twisterInputFilesPreFix: Prefix of the Twister input files. In the provided package, the prefix is wc_data for Twister WordCount and km_data for Twister Kmeans.
programArgs: User-defined program execution command.
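The field descriptions above imply a few consistency rules that are easy to check before submitting a job. The sketch below encodes them as a hypothetical helper, check_application_parameters; it is not part of SalsaDPI. The rules are inferred from the descriptions and templates: applicationType must be Hadoop or Twister, Twister jobs need an input file, an execution location and an input-file prefix, and the binary should be a compiled jar.

```python
REQUIRED = ("applicationType", "localPathOfProgramBinary", "programArgs")
TWISTER_REQUIRED = ("localPathOfProgramInput", "programExecuteLocation",
                    "twisterInputFilesPreFix")


def check_application_parameters(params: dict) -> list:
    """Return human-readable problems; an empty list means no rule was violated."""
    problems = []
    for key in REQUIRED:
        if not params.get(key):
            problems.append(f"missing or empty '{key}'")

    app_type = params.get("applicationType")
    if app_type not in ("Hadoop", "Twister"):
        problems.append(f"applicationType must be 'Hadoop' or 'Twister', got {app_type!r}")
    elif app_type == "Twister":
        # Twister jobs locate their run script and input partitions through these fields.
        for key in TWISTER_REQUIRED:
            if not params.get(key):
                problems.append(f"Twister jobs need a non-empty '{key}'")

    binary = params.get("localPathOfProgramBinary", "")
    if binary and not binary.endswith(".jar"):
        problems.append("localPathOfProgramBinary should be a compiled .jar file")
    return problems
```

Run against the Hadoop WordCount block from Step 2, it returns an empty list. An unfilled template value such as '#_Path_HadoopKmeans_Jar_#' is reported because it does not end in .jar.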

Sandbox Hadoop WordCount

{ // Useful general variables of programArgs for the applicationParameters object:
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'sandbox',
  // chef-solo related parameters
  'chef':{'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
          'chefSoloConfFilePath':'/root/salsaDPI/solo.rb'},
  // passwordless ssh parameters
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/.ssh/id_rsa'},
  // runtime software, such as recipe[hadoopSandbox] or recipe[twisterSandbox]
  'softwareRecipes':['recipe[hadoopSandbox]'], // please don't change this line
  // user-defined application parameters
  'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
}
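The comment at the top of this template lists the #_NAME_# variables that SalsaDPI fills into programArgs at run time. The slides do not show how that substitution is implemented. The following is a minimal sketch of the idea, using the variable names from the comment and made-up values. The helper expand_program_args is hypothetical. It rejects names outside the documented list, which catches typos such as #_JARR_#.

```python
import re

# Runtime variables listed in the comments of the SalsaDPI templates.
RUNTIME_VARIABLES = {
    "JAR", "JOB_ID",
    "HDFS_INPUTDIR", "HDFS_OUTPUTDIR",
    "TWISTER_INPUTDIR", "TWISTER_OUTPUTDIR", "TWISTER_PARTITION_FILE",
    "BINARY_DEPENDENCY",
}

# Matches #_NAME_# and captures NAME; '#' is outside the character class,
# so each marker is matched on its own.
PLACEHOLDER = re.compile(r"#_([A-Za-z0-9_]+)_#")


def expand_program_args(program_args: str, values: dict) -> str:
    """Replace every #_NAME_# marker in program_args with values[NAME]."""
    def substitute(match):
        name = match.group(1)
        if name not in RUNTIME_VARIABLES:
            raise ValueError(f"unknown runtime variable #_{name}_#")
        if name not in values:
            raise KeyError(f"no value supplied for #_{name}_#")
        return values[name]

    return PLACEHOLDER.sub(substitute, program_args)


if __name__ == "__main__":
    args = "./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1"
    print(expand_program_args(args, {
        "TWISTER_PARTITION_FILE": "/tmp/partition.pf",  # hypothetical values
        "TWISTER_OUTPUTDIR": "/tmp/job-output",
    }))
    # prints ./run_wc.sh /tmp/partition.pf /tmp/job-output/wc.out 4 1
```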

Step 3. Execute SalsaDPI with Conf.

Execute SalsaDPI with the command:

$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <path_to_conf_file>

The output will be stored at <workingDir>/salsaDPI_output/<job_uuid>/output/*.
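When several jobs have been run, it helps to pick out the newest result directory. The sketch below wraps the documented command in subprocess and then lists the files of the most recent job. Assumptions: SalsaDPI lives in /root/salsaDPI as on the pre-packaged image, and the caller knows <workingDir>. Because the <job_uuid> names carry no ordering, "newest" is taken to mean the job directory with the latest modification time. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def run_salsadpi(conf_path: str, salsadpi_dir: str = "/root/salsaDPI") -> None:
    """Run the driver as on the slide: java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <conf>."""
    subprocess.run(
        ["java", "-cp", "salsaDPI.jar", "cgl.salsa.salsadpi.Driver", conf_path],
        cwd=salsadpi_dir,
        check=True,  # raise CalledProcessError if the driver exits non-zero
    )


def latest_job_outputs(working_dir: str) -> list:
    """Return the files in <working_dir>/salsaDPI_output/<job_uuid>/output/
    for the most recently modified job directory, or [] if there is none."""
    root = Path(working_dir) / "salsaDPI_output"
    if not root.is_dir():
        return []
    jobs = [p for p in root.iterdir() if p.is_dir()]
    if not jobs:
        return []
    # Job directories are named by UUID, which carries no ordering, so use mtime.
    latest = max(jobs, key=lambda p: p.stat().st_mtime)
    output_dir = latest / "output"
    if not output_dir.is_dir():
        return []
    return sorted(p for p in output_dir.iterdir() if p.is_file())
```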

Demo

Video of hands-on 1: Sandbox Hadoop WordCount (YouTube link, 1080p)

Examples
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount

Goals
- Make sure FutureGrid Eucalyptus is set up and the required files are downloaded correctly.
- Learn to read and modify the SalsaDPI JSON configuration file.
- Execute the SalsaDPI Java executable, passing it the configuration file.

Tutorial: http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html
For live testing, please make sure your name is here.
* JSON metadata format example: http://json.org/example.html

Step 1. Open the Conf. File

Locate and open the configuration file:
/root/salsaDPI/cloud/templates/cloud_hadoopTemplate.json
/root/salsaDPI/cloud/templates/cloud_twisterTemplate.json

Step 2. Modify Conf. File

'eucaInfo':{
    'eucarcFilePath':'#_FullPath_to_eucarc_File_#',
    'eucaImageEmi':'emi-A8F63C29',
    'eucaSSHPublicKey':'#_Euca_Keypair_PublicKeyName_#',
    'eucaVmType':'m1.small',
    'amountOfInstances':2},

Step 2. Modify Conf. File (Cont.)

'ssh':{
    'SSHLoginUsername':'root',
    'SSHPrivateKeyPath':'/root/#_yourPrivatekey_FileName_#'},

Step 2. Modify Conf. File (Cont.)

'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
}

Detailed descriptions: http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html

eucaInfo: A JSON object that contains the cloud-mode Eucalyptus information: 'eucarcFilePath', 'eucaImageEmi', 'eucaSSHPublicKey', 'eucaVmType', and 'amountOfInstances'.
eucarcFilePath: Full path to the downloaded eucarc file.
eucaImageEmi: Eucalyptus VM image registered on FutureGrid, e.g. emi-52C93AC2.
eucaSSHPublicKey: Eucalyptus public key name (the one you set up during the FutureGrid Eucalyptus setup).
eucaVmType: Eucalyptus VM type, e.g. c1.medium.
amountOfInstances: Number of instances for this job, e.g. 2.
ssh: A JSON object that contains the ssh information: SSHLoginUsername and SSHPrivateKeyPath.
SSHLoginUsername: ssh login username; in cloud mode it must be root.
SSHPrivateKeyPath: Full path to the ssh private key used to log in to the VMs.
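To see how the eucaInfo fields line up with Eucalyptus, the sketch below maps them onto a euca2ools euca-run-instances call (-n instance count, -k keypair name, -t VM type, image id last). It also applies the cloud-mode ssh rule stated above. SalsaDPI's own provisioning code is not shown in the slides, so this is an illustration under that assumption, not a description of what SalsaDPI runs. The helper names are hypothetical.

```python
import shlex


def euca_run_instances_command(euca_info: dict) -> list:
    """Map the eucaInfo block onto a euca2ools euca-run-instances call (illustrative)."""
    count = int(euca_info["amountOfInstances"])
    if count < 1:
        raise ValueError("amountOfInstances must be at least 1")
    return [
        "euca-run-instances",
        "-n", str(count),                     # number of instances
        "-k", euca_info["eucaSSHPublicKey"],  # keypair name registered with Eucalyptus
        "-t", euca_info["eucaVmType"],        # VM type, e.g. m1.small
        euca_info["eucaImageEmi"],            # image id, e.g. emi-A8F63C29
    ]


def check_cloud_ssh(ssh: dict) -> list:
    """Apply the cloud-mode ssh rules from the field descriptions."""
    problems = []
    if ssh.get("SSHLoginUsername") != "root":
        problems.append("cloud mode requires SSHLoginUsername 'root'")
    if not ssh.get("SSHPrivateKeyPath"):
        problems.append("SSHPrivateKeyPath must point to the private key file")
    return problems


if __name__ == "__main__":
    euca_info = {"eucarcFilePath": "/root/eucarc", "eucaImageEmi": "emi-A8F63C29",
                 "eucaSSHPublicKey": "stephen", "eucaVmType": "m1.small",
                 "amountOfInstances": 2}
    # eucarc holds the credentials; source it in the shell before running the command.
    print(shlex.join(euca_run_instances_command(euca_info)))
```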

Step 3. Execute SalsaDPI with Conf.

Execute SalsaDPI with the command:

$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <path_to_conf_file>

The output will be stored at <workingDir>/salsaDPI_output/<job_uuid>/output/*.

Cloud Twister WordCount

{ // Useful general variables of programArgs for the applicationParameters object:
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{'eucarcFilePath':'/root/eucarc',
              'eucaImageEmi':'emi-A8F63C29',
              'eucaSSHPublicKey':'stephen',
              'eucaVmType':'m1.small',
              'amountOfInstances':2},
  // passwordless ssh parameters
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/stephen.pem'},
  // runtime software, such as recipe[hadoopSandbox], recipe[twisterSandbox],
  // recipe[hadoopCloud], and recipe[twisterCloud]
  'softwareRecipes':['recipe[twisterCloud]'],
  // user-defined application parameters
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
}
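The templates on these slides use single-quoted strings and // comments, which a strict JSON parser such as Python's json module rejects. To sanity-check an edited conf file locally, a small normalizer can turn that notation into standard JSON first. This is a sketch for checking edits only. It assumes strings contain no backslash escapes and no copy of their own quote character. load_salsadpi_conf is a hypothetical helper, not part of SalsaDPI.

```python
import json


def load_salsadpi_conf(text: str) -> dict:
    """Parse the notation used on these slides: single- or double-quoted strings
    and // comments that run to the end of the line.

    Assumes strings contain no backslash escapes and no copy of their own quote.
    Raises ValueError (json.JSONDecodeError is a subclass) on malformed input.
    """
    pieces = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in ("'", '"'):
            end = text.find(ch, i + 1)
            if end == -1:
                raise ValueError(f"unterminated string starting at offset {i}")
            pieces.append(json.dumps(text[i + 1:end]))  # re-emit as a JSON string
            i = end + 1
        elif text.startswith("//", i):
            # Only reached outside strings, so URLs such as http://... are untouched.
            newline = text.find("\n", i)
            i = n if newline == -1 else newline  # drop the comment, keep the newline
        else:
            pieces.append(ch)
            i += 1
    return json.loads("".join(pieces))
```

For example, reading /root/salsaDPI/cloud/templates/cloud_twisterTemplate.json into a string and passing it to load_salsadpi_conf should return a dict whose 'softwareRecipes' entry is ['recipe[twisterCloud]'] after the edits above. This holds only if the file follows the notation shown on the slides.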

Demo

Video of hands-on 2: Cloud Twister WordCount (YouTube link, 1080p)

Twister Kmeans

Modify the sandbox/cloud conf. file for Twister Kmeans. Below are hints for the Twister Kmeans conf. file.

'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'#_FullPath_To_TwisterKmeans_JAR_#',
    'localPathOfProgramInput':'#_FullPath_To_TwisterKmeans_Inputs_GZ_File_#',
    'localPathOfBinaryDependency':'#_FullPath_To_TwisterKmeans_InitClusterFile_#',
    'programExecuteLocation':'samples/kmeans/bin',
    'twisterInputFilesPreFix':'km_data',
    'programArgs':'./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 #_TWISTER_PARTITION_FILE_# > #_TWISTER_OUTPUTDIR_#/#_JOB_ID_#.txt'
}
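The hints use the same #_..._# marker syntax for two different things. Some markers are fields you must fill in yourself, such as #_FullPath_To_TwisterKmeans_JAR_#. Others are runtime variables, such as #_JOB_ID_#, that SalsaDPI replaces when the job runs. A forgotten fill-in is easy to miss. The hypothetical helper below walks a parsed conf and reports every marker whose name is not in the runtime-variable list from the template comments. It is a checking aid, not part of SalsaDPI.

```python
import re

# Runtime variables listed in the comments of the SalsaDPI templates.
RUNTIME_VARIABLES = {
    "JAR", "JOB_ID",
    "HDFS_INPUTDIR", "HDFS_OUTPUTDIR",
    "TWISTER_INPUTDIR", "TWISTER_OUTPUTDIR", "TWISTER_PARTITION_FILE",
    "BINARY_DEPENDENCY",
}
PLACEHOLDER = re.compile(r"#_([A-Za-z0-9_]+)_#")


def unfilled_fields(node, path: str = "") -> list:
    """Return (path, marker_name) pairs for #_..._# markers that are not runtime variables."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found.extend(unfilled_fields(value, f"{path}.{key}" if path else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(unfilled_fields(value, f"{path}[{index}]"))
    elif isinstance(node, str):
        for name in PLACEHOLDER.findall(node):
            if name not in RUNTIME_VARIABLES:
                found.append((path, name))
    return found


if __name__ == "__main__":
    conf = {"applicationParameters": {
        "localPathOfProgramBinary": "#_FullPath_To_TwisterKmeans_JAR_#",
        "programArgs": "./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 #_TWISTER_PARTITION_FILE_#",
    }}
    for field, marker in unfilled_fields(conf):
        print(f"{field}: still contains #_{marker}_#")
```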

Hadoop Kmeans

Modify a sandbox/cloud conf. file for Hadoop Kmeans. The snapshot below provides hints for the Kmeans programArgs.

'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'#_Path_HadoopKmeans_Jar_#',
    'localPathOfProgramInput':'',
    'localPathOfProgramDB':'',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# 500 10 8 3 #_JOB_ID_# > ~/#_JOB_ID_#/#_JOB_ID_#.txt'
}

Thank you

Cloud Hadoop WordCount

{ // mode = 'cloud'
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{'eucarcFilePath':'/root/eucarc',
              'eucaImageEmi':'emi-A8F63C29',
              'eucaSSHPublicKey':'stephen', // replace stephen with your public key name
              'eucaVmType':'m1.small',
              'amountOfInstances':2},
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/stephen.pem'}, // replace stephen.pem with your private key
  'softwareRecipes':['recipe[hadoopCloud]'],
  'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'}
}

Demo: Cloud Hadoop WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/cloudHadoopWordCount.wmv

Sandbox Twister WordCount

{ // mode = 'sandbox'
  'mode':'sandbox',
  // chef solo parameters
  'chef':{'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
          'chefSoloConfFilePath':'/root/solo.rb'},
  'ssh':{'SSHLoginUsername':'root',
         'SSHPrivateKeyPath':'/root/.ssh/id_rsa'},
  'softwareRecipes':['recipe[twisterSandbox]'],
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'}
}

Demo: Sandbox Twister WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/sandBoxTwisterWordCount.wmv