Slide 1: Parallel Architectures & Performance Analysis

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

Slide 2: Parallel Computers

Parallel computer: a multiple-processor system supporting parallel programming. Three principal types of architecture:
- Vector computers, in particular processor arrays
- Shared memory multiprocessors: specially designed and manufactured systems
- Distributed memory multicomputers: message-passing systems readily formed from a cluster of workstations

Slide 3: Type 1: Vector Computers

Vector computer: instruction set includes operations on vectors as well as scalars. Two ways to implement vector computers:
- Pipelined vector processor (e.g. Cray): streams data through pipelined arithmetic units
- Processor array: many identical, synchronized arithmetic processing elements
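To illustrate the model, the element-wise loop below is the kind of operation a vector instruction set captures in a single instruction (or streams through a pipelined arithmetic unit) rather than as n scalar operations. A minimal C sketch; the function and array names are illustrative only.

    #include <stddef.h>

    /* Element-wise vector addition c = a + b. On a vector computer the
     * whole loop corresponds to one vector instruction; on a scalar
     * processor it executes as n separate scalar additions. */
    void vector_add(const double *a, const double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }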

Slide 4: Type 2: Shared Memory Multiprocessor Systems

A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules such that each processor can access any memory module, the so-called shared memory configuration.

(Diagram: multiple processors connected to multiple memory modules through an interconnection network.)
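A minimal sketch of programming this configuration, assuming a C compiler with OpenMP support (e.g. gcc -fopenmp); all names are illustrative. Because every processor can reach every memory module, the threads simply share the array and the accumulator.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000

    int main(void)
    {
        static double data[N];   /* one array, visible to every thread */
        double sum = 0.0;
        int i;

        /* Threads divide the iterations; each writes part of the shared array. */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            data[i] = 0.5 * i;

        /* reduction(+:sum) avoids a race on the shared accumulator. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++)
            sum += data[i];

        printf("sum = %.1f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }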

Slide 5: Example: Quad Pentium Shared Memory Multiprocessor

(Diagram: four Pentium processors sharing access to a common memory over a system bus.)

Slide 6: Fundamental Types of Shared Memory Multiprocessor

Type 1: Centralized multiprocessor
- All primary memory in one place, so every processor sees the same access time
- Also called a uniform memory access (UMA) or symmetrical multiprocessor (SMP)

Type 2: Distributed multiprocessor
- Distribute primary memory among processors
- Increase aggregate memory bandwidth and lower average memory access time
- Allow greater number of processors
- Also called non-uniform memory access (NUMA) multiprocessor

Slide 7: Distributed Multiprocessor

(Diagram: processors, each with an attached block of primary memory, connected through an interconnection network; any processor can still access any memory.)

Slide 8: Type 3: Message-Passing Multicomputers

Complete computers connected through an interconnection network.

Slide 9: Multicomputers

Distributed memory multiple-CPU computer:
- The same address on different processors refers to different physical memory locations
- Processors interact through message passing (see the sketch below)
- Commercial multicomputers
- Commodity clusters
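A minimal message-passing sketch in C, assuming an MPI implementation (built with mpicc, launched with mpirun -np 2 or more); the payload and tag are illustrative. No memory is shared: the value exists only on processor 0 until it is explicitly sent.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;   /* exists only in rank 0's local memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }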

Slide 10: Asymmetrical Multicomputer

Slide 11: Symmetrical Multicomputer

Slide 12: ParPar Cluster: A Mixed Model

Slide 13: Alternate System: Flynn’s Taxonomy

Michael Flynn (1966) created a classification for computer architectures based upon a variety of characteristics, specifically instruction streams and data streams. Also important are the number of processors, the number of programs which can be executed, and the memory structure.
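Crossing the two stream counts gives the four categories (all four reappear in the Topic 1 summary):

                                  Single data stream    Multiple data streams
    Single instruction stream     SISD                  SIMD
    Multiple instruction streams  MISD                  MIMD

In the terms used above: SISD is the conventional uniprocessor, SIMD covers processor arrays, MISD corresponds to pipelined architectures, and MIMD covers multiprocessors and multicomputers.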

Slide 14: Flynn’s Taxonomy: SISD (cont.)

(Diagram: a single control unit sends control signals to one arithmetic processor; a single instruction stream and a single data stream pass between memory and the processor, which writes back results.)

Slide 15: Flynn’s Taxonomy: SIMD (cont.)

(Diagram: one control unit broadcasts a single control signal to processing elements PE 1 through PE n; each PE operates on its own data stream, Data Stream 1 through Data Stream n.)

Slide 16: Flynn’s Taxonomy: MISD (cont.)

(Diagram: control units 1 through n issue separate instruction streams, Instruction Stream 1 through Instruction Stream n, to processing elements 1 through n, all operating on a single shared data stream.)

Slide 17: MISD Architectures (cont.)

Serial execution of two processes with four stages each. Time to execute T = 8t, where t is the time to execute one stage:

    Time:      1   2   3   4   5   6   7   8
    Process 1: S1  S2  S3  S4
    Process 2:                 S1  S2  S3  S4

Pipelined execution of the same two processes, T = 5t; the second process enters each stage as soon as the first vacates it:

    Time:      1   2   3   4   5
    Process 1: S1  S2  S3  S4
    Process 2:     S1  S2  S3  S4
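In general, assuming every stage takes the same time t, n processes through a k-stage pipeline complete in (k + n - 1) stage-times rather than nk. As a math sketch:

    T_{\text{serial}} = n k t, \qquad T_{\text{pipelined}} = (k + n - 1) t

With k = 4 and n = 2 this gives 8t and 5t, matching the diagrams above.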

Slide 18: Flynn’s Taxonomy: MIMD (cont.)

(Diagram: control units 1 through n issue separate instruction streams, Instruction Stream 1 through Instruction Stream n, to processing elements 1 through n, each operating on its own data stream, Data Stream 1 through Data Stream n.)

Slide 19: Two MIMD Structures: MPMD

Multiple Program Multiple Data (MPMD) structure: within the MIMD classification, which is the one we are concerned with, each processor has its own program to execute.

Slide 20: Two MIMD Structures: SPMD

Single Program Multiple Data (SPMD) structure:
- A single source program is written, and each processor executes its personal copy of this program, although independently and not in synchronism.
- The source program can be constructed so that parts of the program are executed by certain computers and not others, depending upon the identity of the computer (see the sketch below).
- The software equivalent of SIMD; can perform SIMD calculations on MIMD hardware.
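A minimal SPMD sketch, again assuming MPI and illustrative names. Every computer runs this same executable; the branch on rank, the process's identity, decides which part each one executes.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* identity of this copy */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)                  /* only computer 0 runs this part */
            printf("coordinator among %d processes\n", size);
        else                            /* every other computer runs this */
            printf("worker %d\n", rank);

        MPI_Finalize();
        return 0;
    }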

Slide 21: Topic 1 Summary

Architectures:
- Vector computers
- Shared memory multiprocessors: tightly coupled
  - Centralized/symmetrical multiprocessor (SMP): UMA
  - Distributed multiprocessor: NUMA
- Distributed memory/message-passing multicomputers: loosely coupled
  - Asymmetrical vs. symmetrical

Flynn’s Taxonomy:
- SISD, SIMD, MISD, MIMD (MPMD, SPMD)

Slide 22: Topic 2: Performance Measures and Analysis

A sequential algorithm can be evaluated in terms of its execution time, which can be expressed as a function of the size of its input. The execution time of a parallel algorithm depends not only on the input size of the problem but also on the architecture of a parallel computer and the number of available processing elements.

Slide 23: Speedup Factor

The speedup factor is a measure that captures the relative benefit of solving a computational problem in parallel. The speedup factor of a parallel computation utilizing p processors is defined as the ratio

    S(p) = t_s / t_p

where t_s is the execution time using one processor and t_p is the execution time using p processors. In other words, S(p) is defined as the ratio of the sequential processing time to the parallel processing time.

Slide 24: Speedup Factor (cont.)

The speedup factor can also be cast in terms of computational steps:

    S(p) = (number of computational steps using one processor) / (number of parallel computational steps with p processors)

Maximum speedup is (usually) p with p processors (linear speedup).

Slide 25: Execution Time Components

Given a problem of size n on p processors, let
- σ(n) = time spent performing inherently sequential computations
- φ(n) = time spent performing potentially parallel computations
- κ(n,p) = time spent performing communication operations

Then:

    S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p))
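A small C helper for evaluating this bound; the numbers in main are purely illustrative stand-ins for what an analysis of a real algorithm would produce.

    #include <stdio.h>

    /* Upper bound on speedup from the three execution time components. */
    double speedup_bound(double sigma, double phi, double kappa, int p)
    {
        return (sigma + phi) / (sigma + phi / p + kappa);
    }

    int main(void)
    {
        /* Assumed values: 10 s sequential, 90 s parallelizable,
         * 2 s communication, p = 8 processors. */
        double s = speedup_bound(10.0, 90.0, 2.0, 8);
        printf("speedup    <= %.2f\n", s);      /* about 4.30 */
        printf("efficiency <= %.2f\n", s / 8);  /* about 0.54 */
        return 0;
    }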

Slide 26: Speedup Plot

(Plot: speedup versus number of processors; the curve rises at first, then bends away from linear, “elbowing out”, as overhead grows with additional processors.)

Slide 27: Efficiency

The efficiency of a parallel computation is defined as the ratio between the speedup factor and the number of processing elements in a parallel system:

    E = S(p) / p

Efficiency is a measure of the fraction of time for which a processing element is usefully employed in a computation.

Slide 28: Analysis of Efficiency

Since E = S(p)/p, by what we did earlier

    E ≤ (σ(n) + φ(n)) / (p·σ(n) + φ(n) + p·κ(n,p))

Since all terms are positive, E > 0. Furthermore, since the denominator is larger than the numerator, E < 1.

Slide 29: Maximum Speedup: Amdahl’s Law

Slide 30: Amdahl’s Law (cont.)

As before,

    S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p + κ(n,p)) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)

since the communication time must be non-trivial. Let f = σ(n) / (σ(n) + φ(n)) represent the inherently sequential portion of the computation; then

    S(p) ≤ 1 / (f + (1 - f)/p)
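A worked instance with assumed numbers, f = 0.1 and p = 8:

    S(8) \le \frac{1}{0.1 + (1 - 0.1)/8} = \frac{1}{0.2125} \approx 4.7

However many processors are added, the speedup can never exceed 1/f = 10.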

Slide 31: Amdahl’s Law (cont.)

Limitations:
- Ignores communication time
- Overestimates the speedup achievable

The Amdahl Effect:
- Typically κ(n,p) has lower complexity than φ(n)/p
- So as n increases, φ(n)/p dominates κ(n,p)
- Thus as n increases, speedup increases

Slide 32: Gustafson-Barsis’ Law

As before,

    S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p)

Let s represent the fraction of time spent in the parallel computation performing inherently sequential operations:

    s = σ(n) / (σ(n) + φ(n)/p)

Slide 33: Gustafson-Barsis’ Law (cont.)

Then, since σ(n) = s·(σ(n) + φ(n)/p) and φ(n)/p = (1 - s)·(σ(n) + φ(n)/p),

    S(p) ≤ (σ(n) + φ(n)) / (σ(n) + φ(n)/p) = s + (1 - s)·p = p + (1 - p)·s
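A worked instance with assumed numbers, s = 0.1 and p = 8:

    S(8) \le 8 + (1 - 8)(0.1) = 8 - 0.7 = 7.3

Note the change of perspective from Amdahl’s Law: the problem is scaled so that the parallel execution time stays fixed, and the law predicts how much more work the p processors accomplish.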

Slide 34: Gustafson-Barsis’ Law (cont.)

- Begin with parallel execution time instead of sequential time
- Estimate the sequential execution time to solve the same problem
- Problem size is an increasing function of p
- Predicts scaled speedup

Slide 35: Limitations

- Both Amdahl’s Law and Gustafson-Barsis’ Law ignore communication time
- Both overestimate the speedup or scaled speedup achievable

(Photos: Gene Amdahl and John L. Gustafson.)

Slide 36: Topic 2 Summary

- Performance terms: speedup, efficiency
- Model of speedup: serial, parallel and communication components
- What prevents linear speedup?
  - Serial and communication operations
  - Process start-up
  - Imbalanced workloads
  - Architectural limitations
- Analyzing parallel performance
  - Amdahl’s Law
  - Gustafson-Barsis’ Law

Slide 37: End Credits

Based on original material from:
- The University of Akron: Tim O’Neil, Kathy Liszka
- Hiram College: Irena Lomonosov
- The University of North Carolina at Charlotte: Barry Wilkinson, Michael Allen
- Oregon State University: Michael Quinn

Revision history: last updated 7/28/2011.

