Intermediate SCC Usage
Research Computing Services


Presentation Transcript

Intermediate SCC Usage

Research Computing Services

Katia Oleinik (koleinik@bu.edu)

Shared Computing Cluster

Shared - a transparent multi-user, multi-tasking environment.
Computing - a heterogeneous environment:
  interactive jobs
  single-processor and parallel jobs
  graphics jobs
Cluster - a set of computers connected via a fast local area network; a job scheduler coordinates the workload on each node.

Shared Computing Cluster

[Photo: rear view of the cluster - compute nodes with Infiniband and Ethernet connections]

SCC resources

Processors: Intel and AMD
CPU architecture: nehalem, sandybridge, ivybridge, bulldozer, haswell, broadwell
Ethernet connection: 1 or 10 Gbps
Infiniband: EDR, FDR, QDR (or none)
GPUs: NVIDIA Tesla P100, K40m, M2070 and M2050
Number of cores: 8, 12, 16, 20, 28, 36, 64
Memory (RAM): 24 GB - 1 TB
Scratch disk: 244 GB - 886 GB

Technical Summary: http://www.bu.edu/tech/support/research/computing-resources/tech-summary/

SCC organization

Around 900 nodes with ~12,000 CPUs and ~200 GPUs
~3.4 PB of file storage
Login nodes (SCC1, SCC2, GEO, SCC4) sit on the public network; the compute nodes and file storage sit on the private network.

SCC general limits

All login nodes are limited to 15 min. of CPU time.
Default wall-clock time limit for all jobs: 12 hours.
Maximum number of processors: 1000.

SCC general limits

1-processor job (batch or interactive): 720 hours
omp job (16 processors or less): 720 hours
mpi job (multi-node job): 120 hours
gpu job: 48 hours
Interactive graphics job (virtual GL): 48 hours
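These limits are not applied automatically beyond the 12-hour default; longer runs have to be requested through the scheduler. A minimal sketch (the project name and program are placeholders) of a single-processor batch script asking for 100 hours, well within the 720-hour limit:

```shell
#!/bin/bash -l
# Hypothetical long-running single-processor job
# (1-processor jobs may request up to 720 hours)
#$ -l h_rt=100:00:00
#$ -P myproject

./my_long_simulation
```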

SCC login nodes

Login nodes are designed for light work:
- text editing
- light debugging
- program compilation
- file transfer

Service models - shared and buy-in

Shared: paid for by BU and university-wide grants; free to the entire BU Research Computing community.

Buy-in: purchased by individual faculty or research groups through the Buy-In program, with priority access for the purchaser.

The split between the two models is roughly 55/45.

SCC compute nodes

Buy-in nodes:
All buy-in nodes have a hard limit of 12 hours for non-member jobs; the time limit for group-member jobs is set by the PI of the group.
Currently, more than 50% of all nodes are buy-in nodes. Setting a time limit larger than 12 hours for a job automatically excludes all buy-in nodes from the available resources.
Nodes in a buy-in queue do not accept new non-member jobs while a member of the owning project has a job submitted or running anywhere on the cluster.

SCC: running jobs

Types of jobs:
Interactive job: an interactive shell for running GUI applications, code debugging, and benchmarking serial and parallel code performance.
Interactive graphics job: for running interactive software with advanced graphics.
Batch job: execution of the program without manual intervention.

SCC: interactive jobs

There are two ways to start an interactive session: qsh and qlogin/qrsh.
- qsh requires X-forwarding and opens the session in a separate window, which allows a program to open a graphics window.
- qlogin/qrsh run in the current terminal, and the current environment variables can be passed to the session.
- In both cases the batch-system environment variables ($NSLOTS, etc.) are set.

SCC: running interactive jobs (qrsh)

"qrsh" - request from the queue (q) a remote (r) shell (sh):

[koleinik@scc2 ~]$ qrsh -P myproject
[koleinik@scc-pi4 ~]$

The interactive shell can be used for GUI applications, code debugging, and benchmarking.

SCC: running interactive jobs

Request appropriate resources for the interactive job:
- Some software (like MATLAB or STATA-MP) may use multiple cores.
- Make sure to request enough resources if the program needs more than 8 GB of memory or runs longer than 12 hours.
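For example (a sketch; the project name and resource values are placeholders), an interactive session with four slots and a 24-hour limit could be requested as:

```shell
# Interactive session with 4 omp slots and a 24-hour wall-clock limit
qrsh -P myproject -pe omp 4 -l h_rt=24:00:00
```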

SCC: interactive graphics jobs (qvgl)

The majority of graphical applications perform well under VNC. A qvgl job is required for applications that use OpenGL for 3D hardware acceleration:
- fMRI and similar applications (freesurfer, freeview, SPM, MNE, ...)
- molecular modeling (gview, VMD, Pymol, maestro, ...)
This job type combines dedicated GPU resources with VNC.

SCC: submitting batch jobs

Using the -b y option (submit a binary directly):
scc1 % qsub -b y cal -y

Using a script:
scc1 % qsub <script_name>

SCC: batch jobs

Script organization:

#!/bin/bash -l          <- script interpreter; the login shell (-l) is needed
                           for proper interpretation of the module commands

# Scheduler directives:

# Time limit
#$ -l h_rt=12:00:00
# Project name
#$ -P krcs
# Send an email report at the end of the job
#$ -m e
# Job name
#$ -N myjob

# Commands to execute:

# Load modules:
module load R/R-3.2.3
# Run the program
Rscript my_R_program.R
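Only the lines beginning with "#$" are read by the scheduler; everything else runs as an ordinary bash script on the compute node. A quick local check of which directives a script defines (recreating the example script first):

```shell
# Recreate the example job script, then list the lines the scheduler
# will treat as directives: qsub scans for lines beginning with "#$".
cat > myjob.qsub <<'EOF'
#!/bin/bash -l
#$ -l h_rt=12:00:00
#$ -P krcs
#$ -m e
#$ -N myjob
module load R/R-3.2.3
Rscript my_R_program.R
EOF

grep '^#\$' myjob.qsub
```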

SCC: requesting resources (job options)

General directives:

-l h_rt=hh:mm:ss     Hard run-time limit in hh:mm:ss format. The default is 12 hours.
-P project_name      Project to which this job is assigned. This directive is mandatory for all users associated with any Med. Campus project.
-N job_name          Specifies the job name. The default is the script or command name.
-o outputfile        File name for the stdout output of the job.
-e errfile           File name for the stderr output of the job.
-j y                 Merge the error and output streams into a single file.
-m b|e|a|s|n         Controls when the batch system sends email to you: when the job begins (b), ends (e), is aborted (a), is suspended (s), or never (n) - the default.
-M user_email        Overrides the default email address used to send the job report.
-V                   Export all current environment variables to the batch job.
-v env=value         Set the runtime environment variable env to value.
-hold_jid job_list   Set up a job dependency list. job_list is a comma-separated list of job IDs and/or job names that must complete before this job can run. See Advanced Batch System Usage for more information.
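Putting a few of the general directives together (a sketch: the job name, log file name, email address, and program are placeholders):

```shell
#!/bin/bash -l
# 4-hour job with a custom name, merged output, and an email report
#$ -l h_rt=04:00:00
#$ -N analysis
#$ -j y
#$ -o analysis.log
#$ -m e
#$ -M user@bu.edu

./run_analysis
```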

SCC: requesting resources (job options)

Directives to request SCC resources:

-l h_rt=hh:mm:ss     Hard run-time limit in hh:mm:ss format. The default is 12 hours.
-l mem_total=#G      Request a node with at least this amount of memory. Current choices include 94G, 125G, and 252G (504G for Med. Campus users only).
-l mem_per_core=#G   Request a node with at least this amount of memory per core.
-l cpu_arch=ARCH     Select a processor architecture (sandybridge, nehalem, ...). See the Technical Summary for all available choices.
-l cpu_type=TYPE     Select a processor type (E5-2670, E5-2680, X5570, X5650, X5670, X5675). See the Technical Summary for all available choices.
-l gpus=G/C          Request a node with GPUs. G/C is the number of GPUs per CPU requested, expressed as a decimal number. See Advanced Batch System Usage for more information.
-l gpu_type=GPUMODEL Current choices for GPUMODEL are M2050, M2070 and K40m.
-pe omp N            Request multiple slots for shared-memory applications (OpenMP, pthreads). This option can also be used to reserve a larger amount of memory for the application. N can vary from 1 to 16.
-pe mpi_#_tasks_per_node N   Select multiple nodes for an MPI job. The number of tasks per node can be 4, 8, 12 or 16, and N must be a multiple of this value. See Advanced Batch System Usage for more information.
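To illustrate the decimal G/C notation (a sketch, not a site-verified recipe): a job that wants one K40m GPU alongside four CPU slots requests 1/4 = 0.25 GPUs per CPU:

```shell
#!/bin/bash -l
# 4 CPU slots sharing 1 GPU: gpus = 1 GPU / 4 CPUs = 0.25
#$ -pe omp 4
#$ -l gpus=0.25
#$ -l gpu_type=K40m

./my_gpu_program
```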

SCC: requesting resources (job options)

Directives to request SCC resources (continued):

-l eth_speed=1       Ethernet speed (1 or 10 Gbps).
-l mem_free=#G       Request a node with at least this amount of free memory. Note that the amount of free memory changes over time!
-l scratch_free=#G   Request a node with at least this amount of available disk space in scratch.

List the various resources that can be requested:
scc1 % qconf -sc
scc1 % man qstat

SCC: tracking the jobs

Check the status of a batch job:
scc1 % qstat -u <userID>

List only running jobs:
scc1 % qstat -u <userID> -s r

Get job information:
scc1 % qstat -j <jobID>

Display the resources requested by a job:
scc1 % qstat -u <userID> -r

SCC: tracking the jobs

scc1 % qstat -j 596557        <- job ID

job_number:          596557
exec_file:           job_scripts/596557
submission_time:     Mon Sep 11 10:11:04 2017
owner:               koleinik
sge_o_home:          /usr1/scv/koleinik
sge_o_log_name:      koleinik
sge_o_path:          /usr/java/default/jre/bin:/usr/java/default/bin:/usr/lib64/...
sge_o_shell:         /bin/bash
sge_o_workdir:       /projectnb/krcs/projects/
sge_o_host:          scc4
account:             sge
cwd:                 /projectnb/krcs/projects/chamongrp
merge:               y
hard resource_list:  no_gpu=TRUE,h_rt=172800
soft resource_list:  buyin=TRUE
mail_options:        ae
mail_list:           koleinik@scc4.bu.edu
notify:              FALSE
job_name:            sim
jobshare:            0
env_list:            PATH=/usr/java/default/jre/bin:/usr/java/default/bin
script_file:         job.qsub
parallel environment: omp16 range: 16
project:             krcs
usage    1:          cpu=00:13:38, mem=813.90147 GBs, io=0.01024, vmem=1.013G, maxvmem=1.013G
scheduling info:     (Collecting of scheduler job information is turned off)

SCC: tracking the jobs

1. Log in to the compute node:
scc1 % ssh scc-ca1

2. Run the top command:
scc1 % top -u <userID>

The top command lists your running processes along with their memory and CPU usage.

3. Exit from the compute node:
scc1 % exit

SCC: completed jobs report (qacct)

qacct - query the accounting system.

Query a job by ID:
scc1 % qacct -j 596557

Query jobs by time of execution (-d gives the number of days to look back, -o the job owner):
scc1 % qacct -j -d 3 -o koleinik


SCC: completed jobs report (qacct)

qname        p100
hostname     scc-c11.scc.bu.edu
group        scv
owner        koleinik
project      krcs
jobname      myjob
jobnumber    551947
qsub_time    Wed Sep  6 20:08:56 2017
start_time   Wed Sep  6 20:09:37 2017
end_time     Wed Sep  6 23:32:29 2017
granted_pe   NONE
slots        1
failed       0
exit_status  0
cpu          11232.780
mem          611514.460
io           14.138
iow          0.000
maxvmem      71.494G
arid         undefined

SCC: node architecture

Login nodes: Broadwell architecture, 28 cores.
Many compute nodes have an older architecture. As a result, programs compiled with the Intel or PGI compilers with certain optimization options on a login node may fail when run on a compute node with a different architecture.

http://www.bu.edu/tech/support/research/software-and-programming/programming/compilers/intel-compiler-flags/
http://www.bu.edu/tech/support/research/software-and-programming/programming/compilers/pgi-compiler-flags/

My job failed… WHY?

SCC: job analysis

If the job ran with the "-m e" flag, an email report is sent at the end of the job:

Job 7883980 (smooth_spline) Complete
 User           = koleinik
 Queue          = p-int@scc-pi2.scc.bu.edu
 Host           = scc-pi2.scc.bu.edu
 Start Time     = 08/29/2015 13:18:02
 End Time       = 08/29/2015 13:58:59
 User Time      = 01:05:07
 System Time    = 00:03:24
 Wallclock Time = 00:40:57
 CPU            = 01:08:31
 Max vmem       = 6.692G
 Exit Status    = 0

SCC: job analysis

The default time limit for interactive and non-interactive jobs on the SCC is 12 hours. Make sure you request enough time for your application to complete:

Job 9022506 (myJob) Aborted
 Exit Status    = 137
 Signal         = KILL
 User           = koleinik
 Queue          = b@scc-bc3.scc.bu.edu
 Host           = scc-bc3.scc.bu.edu
 Start Time     = 08/18/2014 15:58:55
 End Time       = 08/19/2014 03:58:56
 CPU            = 11:58:33
 Max vmem       = 4.324G
failed assumedly after job because:
job 9022506.1 died through signal KILL (9)

SCC: job analysis

The memory (RAM) varies from node to node: some nodes have only 3 GB of memory per slot, while others have up to 28 GB. It is important to know how much memory the program needs and to request appropriate resources.

Job 1864070 (myBigJob) Complete
 User           = koleinik
 Queue          = linga@scc-kb8.scc.bu.edu
 Host           = scc-kb8.scc.bu.edu
 Start Time     = 10/19/2014 15:17:22
 End Time       = 10/19/2014 15:46:14
 User Time      = 00:14:51
 System Time    = 00:06:59
 Wallclock Time = 00:28:52
 CPU            = 00:27:43
 Max vmem       = 207.393G
 Exit Status    = 137

Show the RAM of a node:
scc1 % qhost -h scc-kb8

SCC: job analysis

Currently, on the SCC there are nodes with:

16 cores & 128 GB  =  8 GB per slot
20 cores & 128 GB  ~  6 GB per slot
16 cores & 256 GB  = 16 GB per slot
20 cores & 256 GB  ~ 12 GB per slot
12 cores &  48 GB  =  4 GB per slot
28 cores & 256 GB  ~  9 GB per slot
 8 cores &  24 GB  =  3 GB per slot
28 cores & 512 GB  ~ 18 GB per slot
 8 cores &  96 GB  = 12 GB per slot
36 cores &   1 TB  ~ 28 GB per slot
64 cores & 256 GB  =  4 GB per slot
64 cores & 512 GB  =  8 GB per slot

Available only to Med. Campus users
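The "=" rows above are exact integer divisions (total RAM divided by core count); a quick shell check of a few of them:

```shell
# mem_per_slot <cores> <total_GB>: integer GB of RAM per slot
mem_per_slot() {
  echo $(( $2 / $1 ))
}

mem_per_slot 16 128   # 8 GB/slot
mem_per_slot 12 48    # 4 GB/slot
mem_per_slot 64 512   # 8 GB/slot
```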

SCC: job analysis

Example: a single-processor job needs 20 GB of memory.
-----------------------------------------------------------
# Request a node with enough memory per core
#$ -l mem_per_core=8G
# Request enough slots
#$ -pe omp 3

http://www.bu.edu/tech/support/research/system-usage/running-jobs/batch-script-examples/#MEMORY
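The slot count is simply the required memory divided by the memory per core, rounded up to a whole number of slots; in shell arithmetic:

```shell
# Slots needed so that slots * mem_per_core >= required memory
need_gb=20
per_core_gb=8
slots=$(( (need_gb + per_core_gb - 1) / per_core_gb ))   # ceiling division
echo "$slots"   # 3
```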

SCC: job analysis

Example: a single-processor job needs 200 GB of memory.
-----------------------------------------------------------
# Request a node with enough memory per core
#$ -l mem_per_core=16G
# Request enough slots
#$ -pe omp 16

http://www.bu.edu/tech/support/research/system-usage/running-jobs/batch-script-examples/#LARGEMEMORY

SCC: job analysis

Job 1864070 (myParJob) Complete
 User           = koleinik
 Queue          = budge@scc-hb2.scc.bu.edu
 Host           = scc-hb2.scc.bu.edu
 Start Time     = 11/29/2014 00:48:27
 End Time       = 11/29/2014 01:33:35
 User Time      = 02:24:13
 System Time    = 00:09:07
 Wallclock Time = 00:45:08
 CPU            = 02:38:59
 Max vmem       = 78.527G
 Exit Status    = 137

Some applications try to detect the number of cores and parallelize if possible; one common example is MATLAB. Always read the documentation and the options available for your application, and either disable parallelization or request additional cores. If the program does not let you control the number of cores it uses, request the whole node.

SCC: job analysis

Example: by default, MATLAB will use all available cores.
-----------------------------------------------------------
# Start MATLAB using the single-thread option:
matlab -nodisplay -singleCompThread -r "n=4, rand(n), exit"

SCC: job analysis

Example: running the MATLAB Parallel Computing Toolbox.
-----------------------------------------------------------
# Request 4 cores:
#$ -pe omp 4

matlab -nodisplay -r "matlabpool open 4, s=0; parfor i=1:n, s=s+i; end, matlabpool close, s, exit"

SCC: job analysis

Information about past jobs can be retrieved using the qacct command.

Information about all the jobs that ran in the past 3 days:
scc1 % qacct -o <userID> -d <number of days> -j

Information about a particular job:
scc1 % qacct -j <jobID>

SCC: quota and project quotas

My job used to run fine and now it fails… Why?

Check your disk usage in the home directory:
scc1 % quota -s

Check the disk usage of your project:
scc1 % pquota -u <project name>

SCC: SU usage

Use acctool to get information about SU (service unit) usage.

My project(s)' total usage on all hosts yesterday (short form):
scc1 % acctool y

My project(s)' total usage on shared nodes for the past month:
scc1 % acctool -host shared -b 1/01/15 y

My balance for the project scv:
scc1 % acctool -p scv -balance -b 1/01/15 y

My balance for all the projects I belong to:
scc1 % acctool -b y

My job is too slow… How can I speed it up?

SCC: optimization

Before you look into parallelizing your code, optimize it! Parallelized inefficient code is still inefficient.

There are a number of well-known optimization techniques in every language, and there are also some specifics to running code on the cluster. There are a few different versions of compilers on the SCC:
GCC (4.8.1, 4.9.2, 5.1.0, 5.3.0)
PGI (13.5, 16.5)
Intel (2015, 2016)

SCC: optimization - I/O

Reduce the number of I/O operations to the home directory/project space (if possible).
Group smaller I/O statements into larger ones where possible.
Utilize the local /scratch space.
Optimize the seek pattern to reduce the time spent waiting for disk seeks.
If possible, read and write numerical data in a binary format.
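A sketch of the /scratch pattern (paths, sizes, and the program name are illustrative; $JOB_ID and $SGE_O_WORKDIR are set by the batch system): stage data onto the node-local disk, compute there, and copy the results back once at the end.

```shell
#!/bin/bash -l
#$ -l h_rt=12:00:00
#$ -l scratch_free=50G

# Do the heavy I/O on node-local scratch, not on the shared filesystem
WORKDIR=/scratch/$USER/$JOB_ID
mkdir -p "$WORKDIR"
cp my_input.dat "$WORKDIR"
cd "$WORKDIR"
./my_program my_input.dat > results.out

# One copy back to the submission directory at the end
cp results.out "$SGE_O_WORKDIR"/
```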

SCC: optimization

Many languages allow operations on whole vectors/matrices; use them.
Pre-allocate arrays before accessing them within loops.
Reuse variables when possible and delete those that are no longer needed.
Access array elements according to the storage order of your language (FORTRAN, MATLAB, R: by column; C, C++: by row).

Email SCC (help@scc.bu.edu): the members of our group will be happy to assist you with tips on improving the performance of your code for your specific language/application.

SCC: code development and debugging

Integrated Development Environments (IDEs): codeblocks, geany, eclipse
Debuggers: gdb, ddd, TotalView, OpenSpeedShop

SCC: parallelization

Running multiple jobs (tasks) simultaneously
OpenMP/multithreaded jobs (use some or all of the cores on one node)
MPI (uses multiple cores, possibly across a number of nodes)
GPU parallelization

SCC tutorials: there are a number of tutorials that cover various parallelization techniques in R, MATLAB, C and FORTRAN.

SCC: parallelization

Copy the simple examples. The examples can also be found online:
http://www.bu.edu/tech/support/research/system-usage/running-jobs/advanced-batch/
http://scv.bu.edu/examples/SCC/

Copy the examples to the current directory:
scc1 % cp /project/scv/examples/SCC/depend .
scc1 % cp /project/scv/examples/SCC/many .
scc1 % cp /project/scv/examples/SCC/par .

SCC: array jobs

An array job executes independent copies of the same job script. The number of tasks to be executed is set using the -t option to the qsub command, e.g.:

scc1 % qsub -t 1-10 <my_script>

The above command submits an array job consisting of 10 tasks, numbered from 1 to 10. The batch system sets the SGE_TASK_ID environment variable, which can be used inside the script to pass the task ID to the program:

#!/bin/bash -l
Rscript my_R_program.R $SGE_TASK_ID
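The mechanics can be previewed without the batch system. The loop below is only a local simulation (on the cluster, SGE launches each task independently, possibly on different nodes); it runs the same script once per task with a different SGE_TASK_ID:

```shell
# Each array task sees its own value of SGE_TASK_ID
cat > task.sh <<'EOF'
#!/bin/bash
echo "processing chunk $SGE_TASK_ID"
EOF
chmod +x task.sh

# Mimic "qsub -t 1-3": prints "processing chunk 1" through "processing chunk 3"
for id in 1 2 3; do
  SGE_TASK_ID=$id ./task.sh
done
```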

SCC: job dependency

Some jobs may be required to run in a specific order. For this application, the job dependency can be controlled using the "-hold_jid" option:

scc1 % qsub -N job1 script1
scc1 % qsub -N job2 -hold_jid job1 script2
scc1 % qsub -N job3 -hold_jid job2 script3

A job might need to wait until all remaining jobs in the group have completed (e.g. post-processing). In this example, lastJob won't start until job1, job2, and job3 have completed:

scc1 % qsub -N job1 script1
scc1 % qsub -N job2 script2
scc1 % qsub -N job3 script3
scc1 % qsub -N lastJob -hold_jid "job*" script4

SCC: links

Research Computing website: http://www.bu.edu/tech/support/research/
RCS software: http://sccsvc.bu.edu/software/
RCS examples: http://rcs.bu.edu/examples/
RCS tutorial evaluation: http://scv.bu.edu/survey/tutorial_evaluation.html

Please contact us at help@scc.bu.edu if you have any problems or questions.

SCC: Appendix - qstat

qstat -u user-id             All current jobs submitted by the user user-id
qstat -s r                   List of running jobs
qstat -s p                   List of pending jobs (hw, hqw, Eqw, ...)
qstat -u user-id -r          Display the resources requested by the jobs
qstat -u user-id -s r -t     Display info about sub-tasks of parallel jobs
qstat -explain c -j job-id   Display job status
qstat -g c                   Display the list of queues and load information
qstat -q queue               Display jobs running in a particular queue

SCC: Appendix - qselect

qselect -pe omp 16           List all nodes that can execute a 16-processor job
qselect -l mem_total=252G    List all large-memory nodes
qselect -pe mpi16            List all nodes that can run 16-slot MPI jobs
qselect -l gpus=1            List all nodes with GPUs

SCC: Appendix - qdel

qdel -j job-id     Delete job job-id
qdel -u user-id    Delete all the jobs submitted by the user

Delete all the jobs submitted by the userSlide54

SCC: Apendix

qhost

qhost

-q

Display queues hosted by host

qhost

-j

Display all the jobs hosted by host

qhost

-F

Display

info

about each node