Introduction to parallel programming

Added: 2018-11-30

Slide1

Introduction to Parallel Programming Models

CS 5802

Monica Borra

Slide2

Overview

Types of parallel programming models

Shared memory Model

OpenMP

POSIX Threads

Cilk / Cilk Plus / Cilk++

Thread Building Blocks

Slide3

Types of Parallel Programming Models:

Shared Memory Model

Threads Model

Distributed Memory Model

Hybrid Models

Parallel Programming Model

A set of software technologies to express parallel algorithms and match applications with the underlying parallel systems.

“an abstraction above hardware and memory architectures”

Slide4

Programming models are not specific to a particular type of machine or memory architecture.

“Virtual Shared Memory”

Machine memory is physically distributed across networked machines, but appears to the user as a single shared global address space.

Every task has direct access to the global address space, yet message passing (e.g., with MPI) can also be implemented on top of it.

Slide5

Shared Memory

Common block of read/write memory among processes

[Diagram: processes 1–5 each hold a pointer to a shared memory segment spanning addresses 0 to MAX; one process creates the segment with a unique key, the others attach to it.]

The shared memory segment is created by the first process.

Other processes that know the key can access the shared memory segment, so they can attach to it and share data with one another.

int shmget(key_t key, size_t size, int shmflg);

Slide6

Thread Models

A program is a collection of threads of control; in some languages threads can be created dynamically, mid-execution.

Each thread has a set of private variables, e.g., local stack variables.

Each thread also has a set of shared variables, e.g., static variables, shared common blocks, or the global heap.

Threads communicate implicitly by writing and reading shared variables.

Data race problem: synchronization is required to ensure that no more than one thread is updating the same global address at any time.

Slide7

Several Thread Libraries/systems

PTHREADS: the POSIX standard threading interface

OpenMP: a standard for application-level parallel programming

TBB: Thread Building Blocks

CILK: a language of the C “ilk”

Java threads

Slide8

Distributed memory model

A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine and/or across an arbitrary number of machines.

Tasks exchange data through communications by sending and receiving messages.

Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.

Slide9

OpenMP (Open Multi-Processing)

A simple API that allows parallelism to be added to existing source code without significantly rewriting it.

Supports programming in C/C++/Fortran.

It is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications on platforms ranging from the desktop to the supercomputer.

It is composed of a set of compiler directives, library routines, and environment variables.

Easier to understand and maintain.

Slide10

Fork-Join Model.

Slide11

OpenMP

Since OpenMP is based on compiler directives, it requires a compiler that supports it.

The directives can be added incrementally – gradual parallelization.

(Note that launching more threads than the number of processing units available can actually slow down the whole program.)

Slide12

OpenMP

Example:

#include <iostream>
#include <omp.h>
using namespace std;

/********************************************************************
 Sample OpenMP program which at stage 1 has 4 threads and at
 stage 2 has 2 threads
********************************************************************/

int main()
{
    #pragma omp parallel num_threads(4)  // create 4 threads; the region inside is executed by all threads
    {
        #pragma omp critical  // allow one thread at a time to execute the statement below
        cout << " Thread Id  in OpenMP stage 1=  " << omp_get_thread_num() << endl;
    }
    // here all threads get merged into one thread
    cout << "I am alone" << endl;

    #pragma omp parallel num_threads(2)  // create two threads
    {
        cout << " Thread Id  in OpenMP stage 2=  " << omp_get_thread_num() << endl;
    }
}

Command to run the executable named a.out on Linux:

./a.out

Output (thread order may vary between runs):

 Thread Id  in OpenMP stage 1= 2
 Thread Id  in OpenMP stage 1= 0
 Thread Id  in OpenMP stage 1= 3
 Thread Id  in OpenMP stage 1= 1
I am alone
 Thread Id  in OpenMP stage 2= 1
 Thread Id  in OpenMP stage 2= 0

Slide13

OpenMP

Advantages

The programmer need not specify the processors (nodes).

No need for message passing, since it uses shared memory.

Its coding style fits both serial and parallel paradigms.

Able to deal with coarse-grained parallelism with shared memory.

Disadvantages

Runs efficiently only on shared-memory platforms.

Scalability is hindered by the shared-memory architecture.

No reliable error-handling mechanisms.

Synchronization between a subset of threads isn’t allowed.

Slide14

POSIX THREADS

POSIX: Portable Operating System Interface for UNIX - Interface to Operating System utilities

Pthreads: the POSIX threading interface.

Implementations of the API are available in C/C++ on many Unix-like operating systems. On Windows, however, we need third-party packages such as pthreads-w32, which implements Pthreads on top of the existing Windows API.

Pthreads defines a set of programming-language types, functions, and constants. It is implemented with a pthread.h header and a thread library.

There are around 100 Pthreads procedures, all prefixed "pthread_", and they can be categorized into four groups: thread management, mutexes, condition variables, and synchronization.

Slide15

Forking a POSIX Thread:

int pthread_create(pthread_t *, const pthread_attr_t *, void *(*)(void *), void *);

Example call:

errcode = pthread_create(&thread_id, &thread_attribute, &thread_fun, &fun_arg);

thread_id is the thread id or handle (used to halt the thread, etc.)

thread_attribute holds various attributes:
  a. Standard default values are obtained by passing a NULL pointer
  b. Sample attribute: minimum stack size

thread_fun is the function to be run (takes and returns void*)

fun_arg is an argument that can be passed to thread_fun when it starts

errcode will be set nonzero if the create operation fails

Slide16

Some other functions:

pthread_yield();
  Informs the scheduler that the thread is willing to yield its quantum; requires no arguments.

pthread_exit(void *value);
  Exits the thread and passes value to the joining thread (if one exists).

pthread_join(pthread_t thread, void **result);
  Waits for the specified thread to finish and places its exit value into *result.

pthread_t me; me = pthread_self();
  Allows a pthread to obtain its own identifier.

pthread_t thread; pthread_detach(thread);
  Informs the library that the thread's exit status will not be needed by subsequent pthread_join calls, resulting in better thread performance.

Slide17

Simple Example:

#include <pthread.h>
#include <stdio.h>

void* SayHello(void *foo) {
    printf("Hello, world!\n");
    return NULL;
}

int main() {
    pthread_t threads[16];
    int tn;
    for (tn = 0; tn < 16; tn++) {
        pthread_create(&threads[tn], NULL, SayHello, NULL);
    }
    for (tn = 0; tn < 16; tn++) {
        pthread_join(threads[tn], NULL);
    }
    return 0;
}

Compile using gcc -lpthread

Slide18

CILK/CILK PLUS/CILK++

Programming Languages which extend C and C++.

Initially developed at MIT, based on ANSI C; it now belongs to Intel.

The initial applications of Cilk were only in high-performance computing.

Intel Cilk Plus keywords:

cilk_spawn – Specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This expresses an opportunity for parallelism, not a command that mandates parallelism; the Intel Cilk Plus runtime chooses whether to run the function in parallel with its caller.

cilk_sync – Specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.

cilk_for – Allows iterations of the loop body to be executed in parallel.

Cilk Plus also introduces “reducers”, which provide a lock-free mechanism that allows parallel code to use private “views” of a variable that are merged at the next sync.

Slide19

Example of Cilk Plus

int fib(int n)
{
    if (n < 2) return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}

Uses the header file <cilk/cilk.h>

cilk_for (int i = 0; i < 8; ++i) {
    do_work(i);
}

for (int i = 0; i < 8; ++i) {
    cilk_spawn do_work(i);
}
cilk_sync;

Slide20

Thread Building Blocks (TBB)

A C++ template library developed by Intel for parallel programming on multi-core processors.

TBB enables you to specify tasks instead of threads.

TBB is compatible with other threading packages.

A TBB program creates, synchronizes, and destroys graphs of dependent tasks according to algorithms, i.e., high-level parallel programming paradigms (algorithmic skeletons).

TBB emphasizes scalable, data-parallel programming.

It optimizes core utilization, but may incur scheduling overhead.

Slide21

TBB COMPONENTS

Basic algorithms: parallel_for, parallel_reduce, parallel_scan

Advanced algorithms: parallel_while, parallel_do, parallel_pipeline, parallel_sort

Containers: concurrent_queue, concurrent_priority_queue, concurrent_vector, concurrent_hash_map

Memory allocation: scalable_malloc, scalable_free, scalable_realloc, scalable_calloc, scalable_allocator, cache_aligned_allocator

Mutual exclusion: mutex, spin_mutex, queuing_mutex, spin_rw_mutex, queuing_rw_mutex, recursive_mutex

Atomic operations: fetch_and_add, fetch_and_increment, fetch_and_decrement, compare_and_swap, fetch_and_store

TBB relies on generic programming; it is similar to the Standard Template Library (STL).

Slide22

Slide23

High Performance Fortran

Extension of Fortran 90 with constructs that support parallel computing

Allows efficient implementation on both SIMD and MIMD style architectures

Implicit parallelism (mapping, distribution, communication, synchronization)

High productivity

Slide24

Parallel Virtual Machine (PVM)

Enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource.

Supports software execution on each machine in a user-configurable pool.

Enables heterogeneous applications that can exploit the specific strengths of individual machines on a network.

Provides dynamic resource management and powerful process-control functions.

Fault tolerant (applications can survive host or task failures) and portable.

Slide25

THANK YOU!
