Reproducible Environment for Scientific Applications (Lab session)
Tak-Lon (Stephen) Wu
Overview
- Introduction
- VirtualBox Prepackaged Image
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount
- Exercises
  - Sandbox Hadoop/Twister Kmeans
  - Cloud Hadoop/Twister Kmeans
Motivations
- Background knowledge
  - Environment setting
  - Different cloud infrastructure tools
  - Software dependencies
  - Long learning path
- Can these complicated steps be automated?
- Solution: Salsa Dynamic Provisioning Infrastructure (SalsaDPI), a batch-like program.
What is SalsaDPI? (Sandbox)
[Diagram components: User Configuration, SalsaDPI Jar, chef-solo, OS, S/W, Applications]
1. Read a conf. file and execute the software run-list.
2. Install software.
3. Run apps.
What is SalsaDPI? (Cloud)
[Diagram components: User with Conf.; SalsaDPI Jar, Chef Client, and OS; Chef Server; three VMs, each with OS, Chef, S/W, and Apps]
1. Bootstrap VMs with a conf. file.
2. Retrieve conf. info. and request authentication and authorization.
3. Authenticated and authorized to execute the software run-list.
4. VM(s) information.
5. Submit application commands.
6. Obtain results.
* Chef architecture: http://wiki.opscode.com/display/chef/Architecture+Introduction
What is SalsaDPI? (Cont.)
- Chef features
  - Installs software on demand when VMs start
  - Monitors software installation progress
  - Easy to use
- SalsaDPI features
  - Provides a configurable interface
  - Automates Hadoop/Twister/other binary execution
* Chef official website: http://www.opscode.com/chef/
Hands-on Session
Online tutorial page: http://salsahpc.indiana.edu/ScienceCloud/reproduce-intro.html
Prerequisites
- Install VirtualBox on your laptop, then download and import the prepackaged image.
- Set up the FutureGrid Eucalyptus environment.
- Make sure you set up the shared folder between the host and guest machines correctly.
# log in to the FutureGrid India head node (i136)
$ ssh -i ~/fg_private_key.pem johnny@india.futuregrid.org
Replace ~/fg_private_key.pem with your own private key file name.
About the Pre-packaged Image
It has the following software installed and configured under /root/software/:
- Java JDK
- Chef
- Hadoop
- Twister and ActiveMQ
- HBase
- Pig
- salsaDPI (/root/salsaDPI/)
Important Notes
If activemq.log and kahadb exist in the directory /root/software/apache-activemq-5.4.2/, please remove them. Otherwise, they will cause errors when running sandbox Twister applications.
$ cd /root/software/apache-activemq-5.4.2/
$ ls
activemq.log kahadb
$ rm -rf activemq.log kahadb
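If you reset the image often, the same cleanup can be scripted. The Python sketch below removes activemq.log and kahadb only when they are present; the function name `clean_activemq_leftovers` is hypothetical and is not part of the image or of salsaDPI.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Directory named in the note above.
ACTIVEMQ_DIR = Path("/root/software/apache-activemq-5.4.2")


def clean_activemq_leftovers(activemq_dir: Path = ACTIVEMQ_DIR) -> list[str]:
    """Remove activemq.log and kahadb from activemq_dir if they exist.

    Mirrors `rm -rf activemq.log kahadb`; returns the names actually removed.
    """
    removed = []
    for name in ("activemq.log", "kahadb"):
        target = activemq_dir / name
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)  # delete a directory and its contents
        elif target.exists() or target.is_symlink():
            target.unlink()  # delete a plain file or a symlink
        else:
            continue  # nothing to remove for this name
        removed.append(name)
    return removed
```

Calling `clean_activemq_leftovers()` as root before launching a sandbox Twister job has the same effect as the `rm -rf` command above, and is safe to repeat when the files are already gone.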
Examples
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount
Goals
- Learn and modify the SalsaDPI JSON configuration file.
- Execute the SalsaDPI Java executable, passing it the configuration file.
http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html
* JSON metadata format example: http://json.org/example.html
Step 1. Open the Conf. File
Locate and open the configuration file:
/root/salsaDPI/sandbox/templates/sandbox_hadoopTemplate.json
/root/salsaDPI/sandbox/templates/sandbox_twisterTemplate.json
Step 2. Modify Conf. File
'applicationParameters':{
  'applicationType':'Hadoop',
  'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
  'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'',
  'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
}
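The salsaDPI examples write their configuration with single-quoted strings, and the full templates also contain // comments, so Python's strict `json` module cannot read them directly. If you would rather edit a template from a script than by hand, the sketch below normalizes that syntax first. It is a hypothetical helper, not part of salsaDPI: it assumes strings contain no backslash escapes, and writing the result back as standard double-quoted JSON assumes salsaDPI's parser accepts that form too, which the tutorial does not state.

```python
import json


def load_salsadpi_conf(text: str) -> dict:
    """Parse salsaDPI-style text: single quotes allowed, // comments dropped."""
    out = []
    quote = None  # quote character of the string we are inside, if any
    i = 0
    while i < len(text):
        ch = text[i]
        if quote is not None:
            if ch == quote:
                out.append('"')  # close the string with a JSON double quote
                quote = None
            elif ch == '"':
                out.append('\\"')  # literal " inside a single-quoted string
            else:
                out.append(ch)  # '//' inside a string (e.g. a URL) is kept
            i += 1
        elif ch in "'\"":
            quote = ch
            out.append('"')
            i += 1
        elif text.startswith("//", i):
            # Comment outside any string: skip to the end of the line.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return json.loads("".join(out))


def save_conf(conf: dict, path: str) -> None:
    """Write the configuration out as standard, indented JSON."""
    with open(path, "w") as f:
        json.dump(conf, f, indent=2)
```

For example, read sandbox_hadoopTemplate.json into a string, load it, change `conf["applicationParameters"]["localPathOfProgramInput"]`, then call `save_conf` with a new file name of your choice and pass that file to the Driver in Step 3.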
Detailed descriptions can be found here: http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html
applicationParameters: A JSON object that contains the user-defined application's information.
applicationType: Type of the user-defined application; options: Hadoop or Twister.
localPathOfProgramBinary: Full path of the user-defined Hadoop or Twister compiled jar executable on the working machine.
localPathOfProgramInput: Full path of the user-defined input file on the working machine, normally a plain-text or *.tar.gz file.
localPathOfBinaryDependency: Full path of the user-defined program dependency file on the working machine, such as the Twister Kmeans initial cluster file.
programExecuteLocation: Path to the Twister program execution script, relative to the Twister package, such as samples/wordcount/bin or samples/kmeans/bin.
twisterInputFilesPreFix: Twister input file prefix. Refer to the provided package: for Twister WordCount the file prefix is wc_data; for Twister Kmeans it is km_data.
programArgs: User-defined program execution command.
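To catch mistakes before launching a job, the field descriptions above can be turned into a quick check. The rules in this Python sketch are a hypothetical reading of those descriptions (every job needs a type, a jar, and a command; Twister jobs also need an execution location and an input prefix), not salsaDPI's own validation.

```python
from typing import Dict, List

# Hypothetical rules read from the field descriptions above.
REQUIRED_FOR_ALL = ("applicationType", "localPathOfProgramBinary", "programArgs")
REQUIRED_FOR_TWISTER = ("programExecuteLocation", "twisterInputFilesPreFix")
APPLICATION_TYPES = ("Hadoop", "Twister")


def check_application_parameters(params: Dict[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for key in REQUIRED_FOR_ALL:
        if not params.get(key):
            problems.append(f"missing or empty: {key}")
    app_type = params.get("applicationType")
    if app_type and app_type not in APPLICATION_TYPES:
        problems.append(f"applicationType must be Hadoop or Twister, got {app_type!r}")
    if app_type == "Twister":
        # Twister runs a script from the package and needs the input prefix.
        for key in REQUIRED_FOR_TWISTER:
            if not params.get(key):
                problems.append(f"Twister job needs: {key}")
    return problems
```

The Hadoop WordCount parameters from Step 2 pass this check unchanged, while the same object relabeled as Twister is flagged for its empty programExecuteLocation and missing twisterInputFilesPreFix.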
Sandbox Hadoop WordCount
{
  // Useful general variables of programArgs for the applicationParameters object:
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'sandbox',
  // chef-solo related parameters
  'chef':{
    'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
    'chefSoloConfFilePath':'/root/salsaDPI/solo.rb'
  },
  // passwordless ssh related parameters
  'ssh':{
    'SSHLoginUsername':'root',
    'SSHPrivateKeyPath':'/root/.ssh/id_rsa'
  },
  // runtime software such as recipe[hadoopSandbox] or recipe[twisterSandbox]
  'softwareRecipes':['recipe[hadoopSandbox]'], // please don't change this line
  // user-defined application parameters
  'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
  }
}
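The #_..._# variables in programArgs are placeholders that salsaDPI fills in for each job. The Python sketch below only illustrates that idea with a regular-expression substitution; `expand_program_args` and the sample values are hypothetical, because the real jar path and HDFS directories are chosen by salsaDPI at run time.

```python
import re
from typing import Dict

# A runtime variable looks like #_NAME_#, e.g. #_JAR_# or #_HDFS_INPUTDIR_#.
VARIABLE = re.compile(r"#_([A-Z_]+?)_#")


def expand_program_args(program_args: str, values: Dict[str, str]) -> str:
    """Replace every #_NAME_# with values[NAME]; raise KeyError if one is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for #_{name}_#")
        return values[name]

    return VARIABLE.sub(substitute, program_args)
```

With the hypothetical values JAR="hadoopWordCount.jar", HDFS_INPUTDIR="input", and HDFS_OUTPUTDIR="output", the programArgs above expands to `bin/hadoop jar hadoopWordCount.jar input output`. Raising on an unknown name, rather than leaving it in place, makes a misspelled variable visible immediately.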
Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with the command:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <path_to_conf_file>
The output will be stored at <workingDir>/salsaDPI_output/<job_uuid>/output/*.
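The same step can be driven from Python. This sketch builds the exact command shown above and lists the files under the documented output location. The helper names are hypothetical, and it assumes <workingDir> is the directory from which you run the Driver (for example ~/salsaDPI) and that you look up <job_uuid> yourself under salsaDPI_output/.

```python
import glob
import os
from typing import List

# Jar and main class exactly as in the Step 3 command above.
SALSADPI_JAR = "salsaDPI.jar"
DRIVER_CLASS = "cgl.salsa.salsadpi.Driver"


def salsadpi_command(conf_path: str) -> List[str]:
    """Argument list for `java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <conf>`."""
    return ["java", "-cp", SALSADPI_JAR, DRIVER_CLASS, conf_path]


def job_outputs(working_dir: str, job_uuid: str) -> List[str]:
    """Sorted paths matching <workingDir>/salsaDPI_output/<job_uuid>/output/*."""
    pattern = os.path.join(working_dir, "salsaDPI_output", job_uuid, "output", "*")
    return sorted(glob.glob(pattern))
```

To run a job, pass the list to `subprocess.run(salsadpi_command(conf_path), cwd=os.path.expanduser("~/salsaDPI"), check=True)`; passing a list instead of a shell string avoids quoting problems with paths.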
Demo
Demo video: Hands-on 1, Sandbox Hadoop WordCount
YouTube link (1080p)
Examples
- Example 1: Sandbox Hadoop WordCount
- Example 2: Cloud Twister WordCount
Goals
- Make sure the FutureGrid Eucalyptus setup is correct and the required files are downloaded.
- Learn and modify the SalsaDPI JSON configuration file.
- Execute the SalsaDPI Java executable, passing it the configuration file.
http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html
For live testing, please make sure your name is here.
* JSON metadata format example: http://json.org/example.html
Step 1. Open the Conf. File
Locate and open the configuration file:
/root/salsaDPI/cloud/templates/cloud_hadoopTemplate.json
/root/salsaDPI/cloud/templates/cloud_twisterTemplate.json
Step 2. Modify Conf. File
'eucaInfo':{
  'eucarcFilePath':'#_FullPath_to_eucarc_File_#',
  'eucaImageEmi':'emi-A8F63C29',
  'eucaSSHPublicKey':'#_Euca_Keypair_PublicKeyName_#',
  'eucaVmType':'m1.small',
  'amountOfInstances':2
},
Step 2. Modify Conf. File (Cont.)
'ssh':{
  'SSHLoginUsername':'root',
  'SSHPrivateKeyPath':'/root/#_yourPrivatekey_FileName_#'
},
Step 2. Modify Conf. File (Cont.)
'applicationParameters':{
  'applicationType':'Twister',
  'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
  'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'samples/wordcount/bin',
  'twisterInputFilesPreFix':'wc_data',
  'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
}
Detailed descriptions can be found here: http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html
eucaInfo: A JSON object that contains cloud-mode Eucalyptus-related information: 'eucarcFilePath', 'eucaImageEmi', 'eucaSSHPublicKey', 'eucaVmType', and 'amountOfInstances'.
eucarcFilePath: Full path to the downloaded eucarc file.
eucaImageEmi: Eucalyptus VM image registered on FutureGrid, e.g. emi-52C93AC2.
eucaSSHPublicKey: Eucalyptus public key name (which you set up during the FutureGrid Eucalyptus setup).
eucaVmType: Eucalyptus VM type, e.g. c1.medium.
amountOfInstances: Number of instances for this job, e.g. 2.
ssh: A JSON object that contains SSH information: SSHLoginUsername and SSHPrivateKeyPath.
SSHLoginUsername: SSH login username; for cloud mode, it must be root.
SSHPrivateKeyPath: Full path to the SSH private key used to log in to the VM.
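Before starting VMs, it is worth confirming that every template placeholder in eucaInfo and ssh has been replaced. The Python sketch below checks the points stated above (placeholders filled in, a positive instance count, root as the cloud-mode login). `check_cloud_settings` and `is_placeholder` are hypothetical helpers; they only inspect the configuration and do not contact Eucalyptus.

```python
from typing import Dict, List


def is_placeholder(value: object) -> bool:
    """True for unfilled template values such as '#_FullPath_to_eucarc_File_#'."""
    return isinstance(value, str) and "#_" in value and "_#" in value


def check_cloud_settings(conf: Dict) -> List[str]:
    """List problems in the eucaInfo and ssh objects of a cloud-mode conf."""
    problems = []
    euca = conf.get("eucaInfo", {})
    for key in ("eucarcFilePath", "eucaImageEmi", "eucaSSHPublicKey", "eucaVmType"):
        value = euca.get(key)
        if not value or is_placeholder(value):
            problems.append(f"eucaInfo.{key} is not set")
    count = euca.get("amountOfInstances")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 1:
        problems.append("eucaInfo.amountOfInstances must be a positive integer")
    ssh = conf.get("ssh", {})
    if ssh.get("SSHLoginUsername") != "root":
        problems.append("ssh.SSHLoginUsername must be root in cloud mode")
    key_path = ssh.get("SSHPrivateKeyPath")
    if not key_path or is_placeholder(key_path):
        problems.append("ssh.SSHPrivateKeyPath is not set")
    return problems
```

Run against the unedited Step 2 values, this reports the eucarc path, the key pair name, and the private key path as still unset; run against a filled-in conf it returns an empty list.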
Step 3. Execute SalsaDPI with Conf.
Execute SalsaDPI with the command:
$ cd ~/salsaDPI
$ java -cp salsaDPI.jar cgl.salsa.salsadpi.Driver <path_to_conf_file>
The output will be stored at <workingDir>/salsaDPI_output/<job_uuid>/output/*.
Cloud Twister WordCount
{
  // Useful general variables of programArgs for the applicationParameters object:
  // #_JAR_#, #_JOB_ID_#,
  // #_HDFS_INPUTDIR_#, #_HDFS_OUTPUTDIR_#,
  // #_TWISTER_INPUTDIR_#, #_TWISTER_OUTPUTDIR_#, #_TWISTER_PARTITION_FILE_#, #_BINARY_DEPENDENCY_#
  // 'mode':'sandbox', | 'mode':'cloud',
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{
    'eucarcFilePath':'/root/eucarc',
    'eucaImageEmi':'emi-A8F63C29',
    'eucaSSHPublicKey':'stephen',
    'eucaVmType':'m1.small',
    'amountOfInstances':2
  },
  // passwordless ssh related parameters
  'ssh':{
    'SSHLoginUsername':'root',
    'SSHPrivateKeyPath':'/root/stephen.pem'
  },
  // runtime software such as recipe[hadoopSandbox], recipe[twisterSandbox],
  // recipe[hadoopCloud], and recipe[twisterCloud]
  'softwareRecipes':['recipe[twisterCloud]'],
  // user-defined application parameters
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
  }
}
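Comparing this cloud configuration with the sandbox examples shows what changes between the two modes: 'mode', the 'chef' block (present only in the sandbox examples), the 'eucaInfo' and 'ssh' values, and the recipe names. The Python sketch below applies those differences to an already-parsed sandbox configuration. `sandbox_to_cloud` is hypothetical, and its recipe mapping is taken from the recipe list in the template comment above.

```python
import copy
from typing import Dict

# Sandbox/cloud recipe pairs named in the template comment.
CLOUD_RECIPE = {
    "recipe[hadoopSandbox]": "recipe[hadoopCloud]",
    "recipe[twisterSandbox]": "recipe[twisterCloud]",
}


def sandbox_to_cloud(sandbox_conf: Dict, euca_info: Dict, ssh: Dict) -> Dict:
    """Derive a cloud-mode conf from a sandbox-mode one.

    applicationParameters are kept as-is; the input dict is not modified.
    """
    conf = copy.deepcopy(sandbox_conf)
    conf["mode"] = "cloud"
    conf.pop("chef", None)  # the cloud examples carry no chef-solo block
    conf["eucaInfo"] = dict(euca_info)
    conf["ssh"] = dict(ssh)
    conf["softwareRecipes"] = [
        CLOUD_RECIPE.get(recipe, recipe) for recipe in conf.get("softwareRecipes", [])
    ]
    return conf
```

Working on a deep copy keeps the original sandbox configuration usable for local testing while the derived one is sent to the cloud.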
Demo
Demo video: Hands-on 2, Cloud Twister WordCount
YouTube link (1080p)
Twister Kmeans
Modify the Sandbox/Cloud conf. file for Twister Kmeans. Below are hints for the Twister Kmeans conf. file.
'applicationParameters':{
  'applicationType':'Twister',
  'localPathOfProgramBinary':'#_FullPath_To_TwisterKmeans_JAR_#',
  'localPathOfProgramInput':'#_FullPath_To_TwisterKmeans_Inputs_GZ_File_#',
  'localPathOfBinaryDependency':'#_FullPath_To_TwisterKmeans_InitClusterFile_#',
  'programExecuteLocation':'samples/kmeans/bin',
  'twisterInputFilesPreFix':'km_data',
  'programArgs':'./run_kmeans.sh #_BINARY_DEPENDENCY_# 80 #_TWISTER_PARTITION_FILE_# > #_TWISTER_OUTPUTDIR_#/#_JOB_ID_#.txt'
}
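The hint values written as #_FullPath_To_..._# must be replaced with your own paths, while runtime variables such as #_JOB_ID_# must be left for salsaDPI. The Python sketch below separates the two; `unfilled_hints` is hypothetical, and its set of runtime variables is the list given in the template comments of the WordCount examples.

```python
import re
from typing import Dict, List

# Runtime variables that salsaDPI fills in itself, as listed in the template
# comments; these must stay in the conf file unchanged.
RUNTIME_VARIABLES = {
    "#_JAR_#", "#_JOB_ID_#",
    "#_HDFS_INPUTDIR_#", "#_HDFS_OUTPUTDIR_#",
    "#_TWISTER_INPUTDIR_#", "#_TWISTER_OUTPUTDIR_#",
    "#_TWISTER_PARTITION_FILE_#", "#_BINARY_DEPENDENCY_#",
}
TOKEN = re.compile(r"#_\w+?_#")


def unfilled_hints(app_params: Dict[str, str]) -> List[str]:
    """Return hint tokens (not runtime variables) that still need real values."""
    found = []
    for value in app_params.values():
        for token in TOKEN.findall(str(value)):
            if token not in RUNTIME_VARIABLES and token not in found:
                found.append(token)
    return found
```

For the hint values above, this returns the three #_FullPath_To_TwisterKmeans_..._# tokens, and it returns an empty list once those paths are filled in, even though programArgs still contains runtime variables.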
Hadoop Kmeans
Modify a Sandbox/Cloud conf. file for Hadoop Kmeans. The snapshot below provides hints for the Kmeans programArgs.
'applicationParameters':{
  'applicationType':'Hadoop',
  'localPathOfProgramBinary':'#_Path_HadoopKmeans_Jar_#',
  'localPathOfProgramInput':'',
  'localPathOfProgramDB':'',
  'localPathOfBinaryDependency':'',
  'programExecuteLocation':'',
  'programArgs':'bin/hadoop jar #_JAR_# 500 10 8 3 #_JOB_ID_# > ~/#_JOB_ID_#/#_JOB_ID_#.txt'
}
Thank you
Cloud Hadoop WordCount
{
  // mode = 'cloud'
  'mode':'cloud',
  // euca cloud parameters
  'eucaInfo':{
    'eucarcFilePath':'/root/eucarc',
    'eucaImageEmi':'emi-A8F63C29',
    'eucaSSHPublicKey':'stephen', // replace stephen with your public key name
    'eucaVmType':'m1.small',
    'amountOfInstances':2
  },
  'ssh':{
    'SSHLoginUsername':'root',
    'SSHPrivateKeyPath':'/root/stephen.pem'
  }, // replace stephen.pem with your private key
  'softwareRecipes':['recipe[hadoopCloud]'],
  'applicationParameters':{
    'applicationType':'Hadoop',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/hadoopWordCount.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/hadoopWordCountInput.txt',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'',
    'programArgs':'bin/hadoop jar #_JAR_# #_HDFS_INPUTDIR_# #_HDFS_OUTPUTDIR_#'
  }
}
Demo: Cloud Hadoop WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/cloudHadoopWordCount.wmv
Sandbox Twister WordCount
{
  // mode = 'sandbox'
  'mode':'sandbox',
  // chef solo parameters
  'chef':{
    'chefSoloRecipeUrls':'http://129.79.49.248/chef-solo.tar.gz',
    'chefSoloConfFilePath':'/root/solo.rb'
  },
  'ssh':{
    'SSHLoginUsername':'root',
    'SSHPrivateKeyPath':'/root/.ssh/id_rsa'
  },
  'softwareRecipes':['recipe[twisterSandbox]'],
  'applicationParameters':{
    'applicationType':'Twister',
    'localPathOfProgramBinary':'/root/salsaDPI/apps/Twister-WordCount-0.9.jar',
    'localPathOfProgramInput':'/root/salsaDPI/input/twisterWordCountInput.tar.gz',
    'localPathOfBinaryDependency':'',
    'localPathOfProgramDB':'',
    'programExecuteLocation':'samples/wordcount/bin',
    'twisterInputFilesPreFix':'wc_data',
    'programArgs':'./run_wc.sh #_TWISTER_PARTITION_FILE_# #_TWISTER_OUTPUTDIR_#/wc.out 4 1'
  }
}
Demo: Sandbox Twister WordCount
http://salsahpc.indiana.edu/ScienceCloud/video/salsaDPI/sandBoxTwisterWordCount.wmv