Presentation Transcript

Slide1

Lectures on Parallel and Distributed Computing

Dr. Wei Chen (陈慰), Professor
Tennessee State University
June 2017, at Hosei University (法政大学)

Slide2

Outline

Lecture 1: Introduction to parallel computing
Lecture 2: Parallel computational models
Lecture 3: Parallel algorithm design and analysis
Lecture 4: Distributed-memory programming with PVM/MPI
Lecture 5: Shared-memory programming with OpenMP
Lecture 6: Shared-memory programming with GPU
Lecture 7: Introduction to distributed systems
Lecture 8: Synchronous network algorithms
Lecture 9: Asynchronous shared-memory/network algorithms
Lecture 10: Application I
Lecture 11: Application II

Slide3

References:
(1) Lectures 1-3: Joseph JaJa, "An Introduction to Parallel Algorithms," Addison-Wesley, 1992.
(2) Lectures 4-6: Peter S. Pacheco, "An Introduction to Parallel Programming," Morgan Kaufmann Publishers, 2011.
(3) Lectures 7-8: Nancy A. Lynch, "Distributed Algorithms," Morgan Kaufmann Publishers, 1996.

Slide4

Lecture 1: Introduction to Parallel Computing

Slide5

Why Parallel Computing

Problems with large computing complexity
- Computing hard problems (NP-complete problems) require exponential computing time.

Problems with a large-scale input size
- Quantum chemistry, statistical mechanics, relativistic physics, astrophysics, fluid mechanics, biology, genetic engineering, ...
- For example, it takes about 1 hour on a current computer to simulate 1 second of the reaction between a protein molecule and water molecules.

Slide6

Why Parallel Computing

Physical limitation of CPU computational power
- Over the past 50 years, CPU speed doubled about every 2.5 years, but there is a physical limitation: the speed of light is about 3 x 10^8 m/s, so during one cycle of a 10 GHz clock a signal can travel only about 3 cm. The CPU clock rate is therefore expected to be limited to roughly 10 GHz.

To solve computing hard problems:
- Parallel processing
- DNA computer
- Quantum computer
- ...

Slide7

What Is Parallel Computing

- Using a number of processors to process one task
- Speeding up the processing by distributing it to the processors

(Figure: one problem divided among multiple processors.)
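A minimal sketch of this idea in C with OpenMP (an illustration, not from the slides; the array size and loop body are invented for the example): one task, summing an array, is sped up by distributing the loop iterations to the processors.

/* Sum an array by distributing iterations across processors.
   Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp sum.c */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;  /* the one task: sum N numbers */

    /* Each thread (processor) handles a chunk of the iterations;
       the reduction combines the partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}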

Slide8

Classification of Parallel Computers

Two kinds of classification:

1. Flynn's classification
- SISD (Single Instruction stream, Single Data stream)
- MISD (Multiple Instruction stream, Single Data stream)
- SIMD (Single Instruction stream, Multiple Data stream)
- MIMD (Multiple Instruction stream, Multiple Data stream)

2. Classification by memory type
- Shared memory
- Distributed memory

Slide9

Flynn's Classification (1)

SISD (Single Instruction, Single Data) computer
- von Neumann's one-processor computer

(Figure: a Control unit sends the instruction stream to a single Processor, which exchanges a data stream with Memory.)

Slide10

Flynn's Classification (2)

MISD (Multiple Instruction, Single Data) computer
- All processors share a common memory, have their own control devices, and execute their own instructions on the same data.

(Figure: several Control units each feed an instruction stream to their own Processor; all processors operate on a single data stream from the shared Memory.)

Slide11

Flynn's Classification (3)

SIMD (Single Instruction, Multiple Data) computer
- Processors execute the same instruction on different data.
- Operations of the processors are synchronized by a global clock.

(Figure: one Control unit broadcasts a single instruction stream to all Processors; each processor exchanges its own data stream with a shared memory or interconnection network.)
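The SIMD style can be sketched in C (an illustration, not from the slides; the pragma merely asks the compiler to vectorize the loop, which is the same one-instruction-many-data idea in miniature, and is ignored harmlessly without OpenMP):

/* SIMD in miniature: one add instruction applied to many data lanes. */
#include <stdio.h>

#define N 8

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* Single instruction (vector add), multiple data elements. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++) printf("%.0f ", c[i]);  /* prints 9 eight times */
    printf("\n");
    return 0;
}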

Slide12

Flynn's Classification (4)

MIMD (Multiple Instruction, Multiple Data) computer
- Processors have their own control devices, and execute different instructions on different data.
- Operations of the processors are executed asynchronously most of the time.
- It is also called a distributed computing system.

(Figure: each Processor has its own Control unit and instruction stream, and exchanges its own data stream with a shared memory or interconnection network.)
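A minimal MIMD-flavored sketch with POSIX threads (an illustration; the two task functions are invented for the example): each thread runs its own instruction stream on its own data, asynchronously.

/* MIMD in miniature: two threads execute different code on different data. */
#include <stdio.h>
#include <pthread.h>

void *sum_task(void *arg) {                  /* instruction stream 1 */
    int *v = arg, s = 0;
    for (int i = 0; i < 4; i++) s += v[i];
    printf("sum = %d\n", s);
    return NULL;
}

void *max_task(void *arg) {                  /* instruction stream 2 */
    double *v = arg, m = v[0];
    for (int i = 1; i < 4; i++) if (v[i] > m) m = v[i];
    printf("max = %.1f\n", m);
    return NULL;
}

int main(void) {
    int    xs[4] = {1, 2, 3, 4};             /* data stream 1 */
    double ys[4] = {2.5, 9.5, 4.0, 7.0};     /* data stream 2 */
    pthread_t t1, t2;

    pthread_create(&t1, NULL, sum_task, xs);
    pthread_create(&t2, NULL, max_task, ys);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}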

Slide13

Classification by Memory Type (1)

1. Parallel computers with a shared common memory
- Communication is based on the shared memory. For example, consider the case where processor i sends some data to processor j: first, processor i writes the data to the shared memory, then processor j reads the data from the same address of the shared memory.

(Figure: several Processors all connected to one Shared Memory.)
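A minimal sketch of this write-then-read handoff with POSIX threads (the cell and flag names are invented for the example; the condition variable stands in for whatever mechanism tells processor j that the data is ready):

/* Shared-memory communication: "processor i" writes, "processor j" reads. */
#include <stdio.h>
#include <pthread.h>

int shared_cell;                       /* the shared memory address */
int ready = 0;
pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

void *proc_i(void *arg) {              /* writer */
    pthread_mutex_lock(&m);
    shared_cell = 42;                  /* write the data to shared memory */
    ready = 1;
    pthread_cond_signal(&cv);          /* tell processor j it may read */
    pthread_mutex_unlock(&m);
    return NULL;
}

void *proc_j(void *arg) {              /* reader */
    pthread_mutex_lock(&m);
    while (!ready)                     /* wait until the data is written */
        pthread_cond_wait(&cv, &m);
    printf("processor j read %d\n", shared_cell);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {
    pthread_t ti, tj;
    pthread_create(&tj, NULL, proc_j, NULL);
    pthread_create(&ti, NULL, proc_i, NULL);
    pthread_join(ti, NULL);
    pthread_join(tj, NULL);
    return 0;
}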

Slide14

Classification by Memory Type (2)

Features of parallel computers with shared common memory
- Programming is easy.
- Exclusive control is necessary for access to the same memory cell (see the sketch below).
- Realization is difficult when the number of processors is large.
  Reason: the number of processors that can be connected to a shared memory is limited by physical factors such as the size and voltage of the units, and by the latency of memory accesses.
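As a sketch of why exclusive control matters, the classic shared-counter example with POSIX threads (invented for illustration): without the mutex, concurrent read-modify-write updates to the same memory cell can be lost.

/* Exclusive control: a mutex serializes access to one shared memory cell. */
#include <stdio.h>
#include <pthread.h>

long counter = 0;                          /* shared memory cell */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *increment(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);         /* enter the critical section */
        counter++;                         /* read-modify-write, now atomic */
        pthread_mutex_unlock(&lock);       /* leave the critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, increment, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld (expected 400000)\n", counter);
    return 0;
}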

Slide15

Classification by Memory Type (3)

2. Parallel computers with distributed memory
- Communication is one-to-one, based on an interconnection network. For example, consider the case where processor i sends data to processor j: first, processor i issues a send command such as "send xxx to processor j", then processor j gets the data with a receive command.
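This send/receive pairing is the point-to-point pattern of MPI, covered in Lecture 4; a minimal sketch (the message value and tag are invented for the example):

/* Distributed memory: rank 0 ("processor i") sends, rank 1 ("processor j") receives.
   Build and run with e.g.: mpicc send.c && mpiexec -n 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, data;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 42;
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* send to processor 1 */
    } else if (rank == 1) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                          /* receive from processor 0 */
        printf("processor 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}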

Slide16

Classification by Memory Type (4)

Features of parallel computers using distributed memory
- There are various architectures of interconnection networks (generally, the degree of connectivity is not large).
- Programming is more difficult than with a shared common memory, since communication is one-to-one.
- It is easy to increase the number of processors.

Slide17

Types of Parallel Computers with Distributed Memory

Complete connection type
- Any two processors are connected.
- Features: strong communication ability, but not practical (each processor has to be connected to many processors).

Mesh connection type
- Processors are connected as a two-dimensional lattice.
- Features: each processor is connected to only a few processors, so it is easy to increase the number of processors; but the distance between processors is large, so communication bottlenecks can arise.

Slide18

Types of Parallel Computers with Distributed Memory

Hypercube connection type
- Processors are connected as a hypercube: each processor has a binary number, and two processors are connected if and only if their numbers differ in exactly one bit.
- Features: small distance between processors (log n); balanced communication load because of its symmetric structure; easy to increase the number of processors.
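A small sketch of the hypercube wiring rule (the node labels and the choice of 3 dimensions are invented for the example): the neighbors of node p are found by flipping each address bit, and the distance between two nodes is the number of bits in which they differ, at most log n.

/* Hypercube connectivity: neighbors differ in exactly one address bit. */
#include <stdio.h>

/* Hamming distance = number of hops between nodes a and b. */
int distance(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    int d = 0;
    while (x) { d += x & 1u; x >>= 1; }
    return d;
}

int main(void) {
    const int dim = 3;                      /* 2^3 = 8 processors */
    unsigned p = 5;                         /* node 101 in binary */

    printf("neighbors of node %u:", p);
    for (int bit = 0; bit < dim; bit++)
        printf(" %u", p ^ (1u << bit));     /* flip one bit -> one neighbor */
    printf("\n");

    printf("distance(0, 7) = %d hops (log n in the worst case)\n",
           distance(0, 7));
    return 0;
}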

Slide19

Types of Parallel Computers with Distributed Memory

Other connection types
- Tree connection type, butterfly connection type, bus connection type.

Criteria for selecting an interconnection network
- Small diameter (the largest distance between processors), for small communication delay.
- Symmetric structure, for easily increasing the number of processors.
- The choice of interconnection network depends on the application, the ability of the processors, the upper bound on the number of processors, and other factors.
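To make the diameter criterion concrete, a sketch comparing the standard diameter formulas for n processors: 1 for the complete connection, 2(sqrt(n) - 1) for a sqrt(n) x sqrt(n) mesh, and log2 n for the hypercube.

/* Diameter (worst-case hops) of three interconnection networks.
   Compile with: gcc diam.c -lm */
#include <stdio.h>
#include <math.h>

int main(void) {
    for (int n = 16; n <= 1024; n *= 4) {   /* perfect squares, powers of two */
        int side = (int)sqrt((double)n);
        printf("n = %4d   complete: 1   mesh: %3d   hypercube: %2d\n",
               n, 2 * (side - 1), (int)round(log2((double)n)));
    }
    return 0;
}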

Slide20

Real Parallel Processing Systems (1)

Early-days parallel computer (ILLIAC IV)
- Built in 1972.
- SIMD type with distributed memory, consisting of 64 processors.
- A transformed mesh connection type, equipped with a common data bus, a common control bus, and one control unit.

Slide21

Real Parallel Processing Systems (2)

Parallel computers of the 1990s

Shared common memory type
- Workstations, SGI Origin2000, and others, with 2-8 processors.

Distributed memory type:

  Name     Maker       Processor num   Processing type   Network type
  CM-2     TM          65536           SIMD              hypercube
  CM-5     TM          1024            MIMD              fat tree
  nCUBE2   NCUBE       8192            MIMD              hypercube
  iWarp    CMU, Intel  64              MIMD              2D torus
  Paragon  Intel       4096            MIMD              2D torus
  SP-2     IBM         512             MIMD              HP switch
  AP1000   Fujitsu     1024            MIMD              2D torus
  SR2201   Hitachi     1024            MIMD              crossbar

Slide22

Real Parallel Processing Systems (3)

Deep Blue
- Developed by IBM for the chess game only.
- Defeated the chess champion.
- Based on the general-purpose parallel computer SP-2.

(Figure: 32 RS/6000 nodes joined by an interconnection network (generalized hypercube); each node holds an RS/6000 processor and memory on a Microchannel Bus, with a bus interface and 8 Deep Blue VLSI chess-processor chips.)

Slide23

K (京) Computer - Fujitsu

Architecture: 88,128 SPARC64 VIIIfx 2.0 GHz 8-core processors; 864 cabinets, each with 96 computing nodes and 6 I/O nodes; 6-dimensional Tofu torus interconnect; Linux-based enhanced operating system; open-source Open MPI library; 12.6 MW; 10.51 petaflops; ranked #1 in 2011.

Slide24

Titan - Oak Ridge National Laboratory

Architecture: 18,688 AMD Opteron 6274 16-core CPUs; Cray Linux; 8.2 MW; 17.59 petaflops; GPU-based; torus topology; ranked #1 in 2012.

Slide25

Tianhe-1A (天河一号) - National Supercomputing Center, Tianjin

Architecture: 14,336 Xeon X5670 processors; 7,168 Nvidia Tesla M2050 GPUs; 2,048 FeiTeng-1000 SPARC-based processors; 4.7 petaflops; 112 computer cabinets and 8 I/O cabinets; 11-D hypercube topology with InfiniBand QDR/DDR; Linux; ranked #2 in 2011.

Slide26

Exercises
1. Give more details on how Deep Blue works (1 page).
2. Compare the top three supercomputers in the world in terms of the number of processors, architecture, speed, and any other qualities you can think of (1-2 pages).