

Parallel computing with MATLAB

Slide 2: Going Beyond Serial MATLAB Applications

Diagram: the MATLAB desktop (client) hands work to a pool of six MATLAB workers.

Slide 3: Programming Parallel Applications (CPU)

Built-in support with toolboxes

(Diagram axis: from ease of use to greater control)

Slide 4: Example: Optimizing Cell Tower Position

Built-in parallel support

With Parallel Computing Toolbox, use the built-in parallel algorithms in Optimization Toolbox:
- Run the optimization in parallel
- Use a pool of MATLAB workers
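As a minimal sketch of switching on this built-in support, the snippet below asks `fmincon` to evaluate its finite-difference gradients on the worker pool through the `UseParallel` option. The user locations and the squared-distance objective are hypothetical stand-ins, not the original cell tower model, and the `optimoptions(..., 'UseParallel', true)` syntax assumes a recent release (older releases used `optimset` with `'UseParallel','always'`).

```matlab
% Hypothetical user locations (x, y) in km; not from the original example.
users = [0 0; 4 0; 0 3; 4 3];

% Toy objective: total squared distance from a tower at p = [x y] to every user.
towerCost = @(p) sum(sum(bsxfun(@minus, users, p).^2, 2));

% Ask fmincon to estimate its gradients in parallel on the pool of workers.
opts = optimoptions('fmincon', 'UseParallel', true, 'Display', 'off');

p0 = [1 1];          % initial guess
lb = [0 0];          % keep the tower inside a 4 km x 3 km area
ub = [4 3];
bestPos = fmincon(towerCost, p0, [], [], [], [], lb, ub, [], opts);
disp(bestPos)        % close to the users' centroid for this toy objective
```

For this convex toy problem the optimum is the centroid of the users; the parallel option pays off when each objective evaluation is expensive, as with a real coverage model.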

Slide 5: Tools Providing Parallel Computing Support

- Optimization Toolbox
- Global Optimization Toolbox
- Statistics Toolbox
- Signal Processing Toolbox
- Neural Network Toolbox
- Image Processing Toolbox
- …

Toolboxes and blocksets directly leverage functions in Parallel Computing Toolbox.

www.mathworks.com/builtin-parallel-support
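Other toolboxes expose the same switch through their own options structures. A sketch, assuming Statistics and Machine Learning Toolbox (the current name of Statistics Toolbox) and a recent release where `statset('UseParallel', true)` replaces the older `'always'` value; the data are random placeholders.

```matlab
rng(0);                         % reproducible placeholder data
data = randn(100, 1);

% Request parallel evaluation of the bootstrap replicates.
opts = statset('UseParallel', true);

% 500 bootstrap estimates of the mean, spread over the pool if one is available.
bootMeans = bootstrp(500, @mean, data, 'Options', opts);
fprintf('Bootstrap standard error of the mean: %.4f\n', std(bootMeans));
```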

Slide 6: Agenda

- Task parallel applications
- GPU acceleration
- Data parallel applications
- Using clusters and grids

Slide 7: Independent Tasks or Iterations

- Ideal problem for parallel computing
- No dependencies or communications between tasks
- Examples: parameter sweeps, Monte Carlo simulations

Diagram: the same tasks run one after another versus side by side, compared along a time axis.
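The distinction can be sketched with two small loops (the variable names are illustrative). The first loop has independent iterations and is a valid `parfor`; the second is a recurrence where each iteration needs earlier results, so it must remain a serial `for` loop. Without Parallel Computing Toolbox, `parfor` simply runs serially.

```matlab
n = 8;

% Independent iterations: each result depends only on k, so parfor is valid.
squares = zeros(1, n);
parfor k = 1:n
    squares(k) = k^2;
end

% Dependent iterations: fib(idx) needs fib(idx-1) and fib(idx-2) from earlier
% iterations, so this loop cannot be converted to parfor.
fib = zeros(1, n);
fib(1:2) = 1;
for idx = 3:n
    fib(idx) = fib(idx-1) + fib(idx-2);
end
```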

Slide 8: Example: Parameter Sweep of ODEs

Parallel for-loops

- Parameter sweep of an ODE system: a damped spring oscillator
- Sweep through different values of damping and stiffness
- Record the peak value for each simulation
- Convert for to parfor
- Use a pool of MATLAB workers
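A minimal sketch of such a sweep, assuming the oscillator m*x'' + b*x' + k*x = 0. The mass, time span, initial state, and parameter ranges are hypothetical, since the slide does not give the demo's values. Each (b, k) pair is an independent simulation, so the loop converts directly to `parfor`.

```matlab
% Hypothetical sweep ranges; the original demo's values are not given.
bVals = linspace(0.1, 5, 10);      % damping coefficients
kVals = linspace(1, 5, 8);         % stiffness values
[B, K] = meshgrid(bVals, kVals);   % every (b, k) combination

m = 5;                  % assumed mass
tspan = [0 25];         % assumed simulation window (s)
x0 = [0 1];             % assumed start: zero displacement, unit velocity

peakVals = zeros(size(B));
parfor idx = 1:numel(B)
    b = B(idx);
    k = K(idx);
    % Damped spring written as a first-order system in s = [x; x']
    rhs = @(t, s) [s(2); (-b*s(2) - k*s(1)) / m];
    [~, state] = ode45(rhs, tspan, x0);
    peakVals(idx) = max(state(:, 1));   % record the peak displacement
end
```

Replacing `parfor` with `for` gives the serial version with an unchanged loop body, which is what makes this conversion so cheap.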

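Slide 9: The Mechanics of parfor Loops

```matlab
a = zeros(10, 1)
parfor i = 1:10
    a(i) = i;
end
a
```

Diagram: the client sends the loop iterations to a pool of four MATLAB workers; each worker executes a(i) = i; for its share of i = 1, ..., 10, and the results are returned so that a holds the values 1 through 10 on the client.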
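Besides sliced outputs such as `a(i)`, `parfor` also recognizes reduction variables, which every iteration updates with an associative operation; the partial results computed on each worker are combined on the client. A minimal sketch (the variable `total` is illustrative):

```matlab
total = 0;
a = zeros(10, 1);
parfor i = 1:10
    a(i) = i;             % sliced output: each iteration writes its own element
    total = total + i;    % reduction variable: partial sums are combined at the end
end
disp(total)   % 55
```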

Slide 10: Agenda

- Task parallel applications
- GPU acceleration
- Data parallel applications
- Using clusters and grids

Slide 11: What is a Graphics Processing Unit (GPU)?

Originally for graphics acceleration, now also used for scientific calculations

- Massively parallel array of integer and floating-point processors
- Typically hundreds of processors per card
- GPU cores complement CPU cores
- Dedicated high-speed memory

* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or higher, including NVIDIA Tesla 20-series products. See a complete listing at www.nvidia.com/object/cuda_gpus.html

Slide 12: Performance Gain with More Hardware

Diagram: using more cores (CPUs), shown as four cores sharing a cache, compared with using GPUs, shown as many GPU cores with their own device memory.

Slide 13: Example: Mandelbrot set

- The color of each pixel is the result of hundreds or thousands of iterations
- Each pixel is independent of the other pixels
- Hundreds of thousands of pixels
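Because every pixel is independent, the whole image can be updated with element-wise array operations, and moving the input grids to the GPU with `gpuArray` makes those same operations run there. A sketch, assuming R2019b or later for `canUseGPU`; the image size and iteration count are illustrative, and without a supported GPU the code runs on the CPU.

```matlab
maxIter = 500;                 % iterations per pixel
n = 300;                       % image is n x n pixels
[X, Y] = meshgrid(linspace(-2, 0.5, n), linspace(-1.25, 1.25, n));
if canUseGPU()
    X = gpuArray(X);           % later operations on X and Y then run on the GPU
    Y = gpuArray(Y);
end
c = complex(X, Y);
z = c;
count = ones(size(c));
for iter = 1:maxIter
    % All pixels are updated at once: no pixel depends on any other.
    z = z.^2 + c;
    count = count + (abs(z) <= 2);   % count iterations before escape
end
if isa(count, 'gpuArray')
    count = gather(count);     % copy the result back to host memory
end
% To display: imagesc(log(count)); axis image
```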

Slide 14: Real-world performance increase

Solving a wave equation

Intel Xeon Processor X5650, NVIDIA Tesla C2050 GPU

| Grid size | CPU (s) | GPU (s) | Speedup |
|---|---|---|---|
| 64 x 64 | 0.1004 | 0.3553 | 0.28 |
| 128 x 128 | 0.1931 | 0.3368 | 0.57 |
| 256 x 256 | 0.5888 | 0.4217 | 1.4 |
| 512 x 512 | 2.8163 | 0.8243 | 3.4 |
| 1024 x 1024 | 13.4797 | 2.4979 | 5.4 |
| 2048 x 2048 | 74.9904 | 9.9567 | 7.5 |
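The speedup column is CPU time divided by GPU time. Recomputing it from the Slide 14 timings makes the crossover explicit: on the two smallest grids the GPU is slower, presumably because fixed GPU overheads outweigh the small amount of work.

```matlab
% Timings from Slide 14 (wave equation, Xeon X5650 vs Tesla C2050).
gridN  = [64 128 256 512 1024 2048];                       % grid is N x N
cpuSec = [0.1004 0.1931 0.5888 2.8163 13.4797 74.9904];
gpuSec = [0.3553 0.3368 0.4217 0.8243 2.4979 9.9567];

speedup = cpuSec ./ gpuSec;                % > 1 means the GPU is faster
breakEven = gridN(find(speedup > 1, 1));   % smallest grid where the GPU wins
fprintf('%5d x %-5d speedup %.2f\n', [gridN; gridN; speedup]);
fprintf('GPU first wins at %d x %d\n', breakEven, breakEven);
```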

Slide 15: Programming Parallel Applications (GPU)

- Built-in support with toolboxes
- Simple programming constructs: gpuArray, gather
- Advanced programming constructs: arrayfun, spmd
- Interface for experts: CUDAKernel, MEX support

(Diagram axis: from ease of use to greater control)
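The simple and advanced constructs fit together as in this sketch: `gpuArray` moves data to device memory, built-in functions and `arrayfun` then execute on the GPU, and `gather` copies results back. It assumes R2019b or later for `canUseGPU` and falls back to ordinary CPU arrays when no supported GPU is present; the matrix size and the function passed to `arrayfun` are illustrative.

```matlab
useGPU = canUseGPU();              % false without a supported GPU and toolbox
A = rand(500, 'single');
if useGPU
    A = gpuArray(A);               % copy the data into GPU device memory
end
B = A * A';                        % built-in functions run on the GPU for gpuArray inputs
C = arrayfun(@(x) x.^2 + 1, A);    % for a gpuArray, compiled into one element-wise GPU kernel
if useGPU
    B = gather(B);                 % copy results back to host memory
    C = gather(C);
end
```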

Slide 16: Agenda

- Task parallel applications
- GPU acceleration
- Data parallel applications
- Using clusters and grids

Slide 17: Big data: Distributed Arrays

Diagram: a distributed array lives on the cluster and is manipulated remotely from the desktop (toolboxes and blocksets shown alongside).

Slide 18: Big Data: Distributed Arrays

```matlab
y = distributed(rand(10));
```

Diagram: a pool of four MATLAB workers, each holding a block of columns of y: columns 1:3, 4:6, 7:8, and 9:10.
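Once `y` is distributed, ordinary MATLAB functions operate on it across the workers, and `gather` brings a result back to the client. A sketch assuming Parallel Computing Toolbox with a pool available; the operations chosen are illustrative. For arrays too large to create on the client, `distributed.rand(n)` builds the data directly on the workers.

```matlab
y = distributed(rand(10));   % 10 x 10 array, columns spread over the pool's workers
z = y * y';                  % overloaded functions operate on the distributed data
colSums = sum(z, 1);         % the result is still a distributed array
result = gather(colSums);    % bring the 1 x 10 result back to the client
disp(result)
```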

Slide 19: Demo: Approximation of π
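The slides do not show the demo's code. One common approach, sketched here as an assumption, uses pi = integral from 0 to 1 of 4/(1 + x^2) dx: split [0, 1] into independent chunks, integrate each chunk with the midpoint rule, and combine the partial results with a `parfor` reduction. The chunk counts are illustrative.

```matlab
nChunks = 8;                      % independent pieces of work, one per parfor iteration
nPerChunk = 1e5;                  % midpoint-rule sub-intervals in each piece
h = 1 / (nChunks * nPerChunk);    % width of one sub-interval

piApprox = 0;
parfor c = 1:nChunks
    % Midpoints of the sub-intervals that make up chunk c of [0, 1]
    x = ((c - 1) * nPerChunk + (1:nPerChunk) - 0.5) * h;
    % Reduction: each chunk's partial integral is added to the total
    piApprox = piApprox + h * sum(4 ./ (1 + x.^2));
end
fprintf('pi ~ %.12f (error %.1e)\n', piApprox, abs(piApprox - pi));
```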

Slide 20: Programming Parallel Applications (CPU)

- Built-in support with toolboxes
- Simple programming constructs: parfor, batch, distributed
- Advanced programming constructs: createJob, labSend, spmd

(Diagram axis: from ease of use to greater control)
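Two of these constructs in a short sketch, assuming Parallel Computing Toolbox and a default cluster profile: `batch` runs a function asynchronously on a worker, and `spmd` runs the same block on every pool worker, each with its own index. The summed ranges are illustrative. In R2022b and later, `labindex`, `numlabs`, and `gplus` have the newer names `spmdIndex`, `spmdSize`, and `spmdPlus`.

```matlab
% batch: run a function on a worker, then collect its outputs.
job = batch(@sum, 1, {1:10});   % 1 output, input argument 1:10
wait(job);                      % block until the job finishes
out = fetchOutputs(job);        % cell array of outputs: out{1} is 55
delete(job);                    % remove the finished job

% spmd: every worker runs this block on its own part of the data.
spmd
    myPart = sum(labindex:numlabs:100);   % worker w sums w, w+numlabs, ...
    total = gplus(myPart);                % global sum, identical on all workers
end
disp(total{1})   % 5050
```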

Slide 21: Agenda

- Task parallel applications
- GPU acceleration
- Data parallel applications
- Using clusters and grids

Slide 22: Working on C3SE

Slide 23: Apply for a project with SNIC


