The Manycore Shift: Making Parallel Computing Mainstream

Presentation Transcript

Slide1

Slide2

The Manycore Shift: Making Parallel Computing Mainstream

Bart J.F. De Smet (bartde@microsoft.com)
http://blogs.bartdesmet.net/bart
Software Development Engineer, Microsoft Corporation
Session Code: DTL206

Wishful thinking?

Slide3

Agenda

The concurrency landscape

Language headaches

.NET 4.0 facilities

Task Parallel Library

PLINQ

Coordination Data Structures

Asynchronous programming

Incubation projects

Summary

Slide4

Moore’s law

The number of transistors incorporated in a chip will approximately double every 24 months. Gordon Moore – Intel – 1965

Let’s sell processors

Slide5

Moore’s law today

It can't continue forever. The nature of exponentials is that you push them out and eventually disaster happens. Gordon Moore – Intel – 2005

Let’s sell even more processors

Slide6

Hardware Paradigm Shift

“… we see a very significant shift in what architectures will look like in the future … fundamentally the way we've begun to look at doing that is to move from instruction level concurrency to … multiple cores per die. But we're going to continue to go beyond there. And that just won't be in our server lines in the future; this will permeate every architecture that we build. All will have massively multicore implementations.”

Pat Gelsinger, Chief Technology Officer, Senior Vice President, Intel Corporation
Intel Developer Forum, Spring 2004 (February 19, 2004)

[Chart: power density (W/cm2) of Intel processors from the 4004 through the 8008, 8080, 8085, 8086, 286, 386, 486, and Pentium, 1970–2010, on track to pass that of a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Source: Intel Developer Forum, Spring 2004, Pat Gelsinger]

[Chart: projected GOPS, 2004–2015: many-core peak parallel GOPs (up to 32,768 GOPS) versus single-threaded performance growing roughly 10% per year, an estimated 80X parallelism opportunity. Callouts: "To Grow, To Keep Up, We Must Embrace Parallel Computing" and "Today's Architecture: Heat becoming an unmanageable problem!"]

Slide7

Problem statement

Shared mutable state

Needs synchronization primitives

Locks are problematic

Risk for contention

Poor discoverability (SyncRoot anyone?)

Not composable

Difficult to get right (deadlocks, etc.)

Coarse-grained concurrency

Threads well-suited for large units of work

Expensive context switching

Asynchronous programming

Slide8

What can go wrong?

Races
Deadlocks (see the sketch below)
Livelocks
Lock convoys
Cache coherency overheads
Lost event notifications
Broken serializability
Priority inversion
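A minimal sketch of the classic lock-ordering deadlock hinted at above (the lock objects and thread bodies are illustrative, not from the slides): two threads take the same two locks in opposite order and, with the right interleaving, each ends up waiting on the other forever, so this program is expected to hang.

using System.Threading;

class DeadlockSketch
{
    static readonly object lockA = new object();
    static readonly object lockB = new object();

    static void Thread1()
    {
        lock (lockA)               // takes A first
        {
            Thread.Sleep(100);     // widen the window for the bad interleaving
            lock (lockB) { }       // waits for B, which Thread2 is holding
        }
    }

    static void Thread2()
    {
        lock (lockB)               // takes B first
        {
            Thread.Sleep(100);
            lock (lockA) { }       // waits for A, which Thread1 is holding: deadlock
        }
    }

    static void Main()
    {
        var t1 = new Thread(Thread1);
        var t2 = new Thread(Thread2);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();      // never returns once the deadlock hits
    }
}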

Slide9

Microsoft Parallel Computing Initiative

VB, C#, F#

Constructing Parallel Applications

Executing fine-grain Parallel Applications

Coordinating system resources/services

Slide10

Agenda

The concurrency landscape

Language headaches

.NET 4.0 facilities

Task Parallel Library

PLINQ

Coordination Data Structures

Asynchronous programming

Incubation projects

Summary

Slide11

Languages: two extremes

LISP heritage (Haskell, ML): no mutable state ("fundamentalist" functional programming)

Fortran heritage (C, C++, C#, VB): mutable state

F#

Slide12

Mutability

Mutable by default (C# et al): synchronization required

int x = 5;
// Share out x
x++;

Immutable by default (F# et al): no locking required

let x = 5
// Share out x
// Can’t mutate x

Explicit opt-in to mutation:

let mutable x = 5
// Share out x
x <- x + 1

Slide13

Side-effects will kill you

Elimination of common sub-expressions?
Runtime out of control
Can’t optimize code
Types don’t reveal side-effects
Haskell concept of IO monad
Did you know? LINQ is a monad!

Source: www.cse.chalmers.se

let now = DateTime.Now
in (now, now)

is not the same as

(DateTime.Now, DateTime.Now)

because the property being read has a side effect:

static DateTime Now { get; }
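The same point in C# terms, as a minimal sketch: because DateTime.Now is side-effecting, two reads cannot be merged into one common sub-expression; binding the value once (like the F# let above) is an observable change. The sleep duration is arbitrary, just to make the two reads land on different clock ticks.

using System;
using System.Threading;

class SideEffects
{
    static void Main()
    {
        // Two separate reads: the values may differ.
        var first = DateTime.Now;
        Thread.Sleep(20);
        var second = DateTime.Now;

        // One read shared out: the two components are always equal.
        var now = DateTime.Now;
        var pair = Tuple.Create(now, now);

        Console.WriteLine(first == second);            // usually False
        Console.WriteLine(pair.Item1 == pair.Item2);   // always True
    }
}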

Slide14

Monads for dummies

Promote (Return): T -> IO<T>

Slide15

Monads for dummies

Combine (Bind): IO<T> -> (T -> IO<R>) -> IO<R>

Compare: IEnumerable<R> SelectMany(IEnumerable<T>, Func<T, IEnumerable<R>>) (see the sketch below)

Source: www.arcanux.org
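A minimal C# sketch of the correspondence above: SelectMany plays the role of Bind for IEnumerable<T>, and a query with multiple from clauses is translated into it by the compiler. The data and names (xs, pairs) are illustrative only.

using System;
using System.Collections.Generic;
using System.Linq;

class MonadSketch
{
    static void Main()
    {
        IEnumerable<int> xs = new[] { 1, 2, 3 };

        // Bind: IEnumerable<T> -> (T -> IEnumerable<R>) -> IEnumerable<R>
        IEnumerable<string> pairs =
            xs.SelectMany(x => new[] { 'a', 'b' },
                          (x, c) => x + ":" + c);

        // The same query in query syntax; the two 'from' clauses
        // compile down to SelectMany.
        var pairs2 = from x in xs
                     from c in new[] { 'a', 'b' }
                     select x + ":" + c;

        Console.WriteLine(string.Join(", ", pairs));    // 1:a, 1:b, 2:a, ...
        Console.WriteLine(string.Join(", ", pairs2));
    }
}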

Slide16

Languages: two roadmaps?

Making C# better
Add safety nets?
Immutability
Purity constructs
Linear types
Software Transactional Memory
Kamikaze-style of concurrency
Simplify common patterns

Making Haskell mainstream
Just right? Too academic?
Not a smooth upgrade path?

[Diagram: C# and Haskell converging on "Nirvana" from opposite ends]

Slide17

Taming side-effects in F#

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide18

Agenda

The concurrency landscape

Language headaches

.NET 4.0 facilities

Task Parallel Library

PLINQ

Coordination Data Structures

Asynchronous programming

Incubation projects

Summary

Slide19

Parallel Extensions Architecture

[Architecture diagram: the C#, VB, C++, F#, and other .NET compilers produce IL for a .NET program, which targets PLINQ, the TPL, or the CDS. The PLINQ execution engine layers declarative queries, query analysis, data partitioning (chunk, range, hash, striped, repartitioning), parallel algorithms and operator types (map, scan, build, search, reduction), and merging (async/pipelined, synchronous, order-preserving, sorting, ForAll) on top of the Task Parallel Library (task APIs, task parallelism, futures, scheduling) and the Coordination Data Structures (thread-safe collections, synchronization types, coordination types). Everything runs on the OS scheduling primitives (also UMS in Windows 7 and up) across processors 1 through p.]

Slide20

Task Parallel Library – Tasks

System.Threading.Tasks
Task
  Parent-child relationships
  Explicit grouping
  Waiting and cancellation
Task<T>
  Tasks that produce values
  Also known as futures
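A minimal sketch of that API surface (Task, Task<T>, parent-child attachment, waiting, cancellation); the work done inside the tasks is a placeholder.

using System;
using System.Threading;
using System.Threading.Tasks;

class TaskSketch
{
    static void Main()
    {
        var cts = new CancellationTokenSource();

        // A task with an attached child: waiting on the parent
        // also waits for the child to finish.
        Task parent = Task.Factory.StartNew(() =>
        {
            Console.WriteLine("parent");
            Task.Factory.StartNew(() => Console.WriteLine("child"),
                                  TaskCreationOptions.AttachedToParent);
        }, cts.Token);

        // Task<T>: a task that produces a value (a future).
        Task<int> future = Task.Factory.StartNew(() => 21 * 2, cts.Token);

        Task.WaitAll(parent, future);      // block until both complete
        Console.WriteLine(future.Result);  // 42

        // Cancellation is cooperative: cancel the token source and
        // tasks observing cts.Token can stop early.
        cts.Cancel();
    }
}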

Slide21

Work Stealing

Internally, the runtime uses
  Work stealing techniques
  Lock-free concurrent task queues
Work stealing has provably
  Good locality
  Work distribution properties

[Diagram: per-worker task queues (p1, p2, p3) holding tasks 1–4, with an idle worker stealing a task from another worker's queue]

Slide22

Example code to parallelize

void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
{
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            result[i, j] = 0;
            for (int k = 0; k < size; k++) {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    }
}

Slide23

Solution today

int N = size;
int P = 2 * Environment.ProcessorCount;
int Chunk = N / P;                                        // size of a work chunk
ManualResetEvent signal = new ManualResetEvent(false);
int counter = P;                                          // counter limits kernel transitions
for (int c = 0; c < P; c++) {                             // for each chunk
    ThreadPool.QueueUserWorkItem(o => {
        int lc = (int)o;
        for (int i = lc * Chunk;                          // process one chunk
             i < (lc + 1 == P ? N : (lc + 1) * Chunk);    // respect upper bound
             i++) {
            // original loop body
            for (int j = 0; j < size; j++) {
                result[i, j] = 0;
                for (int k = 0; k < size; k++) {
                    result[i, j] += m1[i, k] * m2[k, j];
                }
            }
        }
        if (Interlocked.Decrement(ref counter) == 0) {    // efficient interlocked ops
            signal.Set();                                 // and kernel transition only when done
        }
    }, c);
}
signal.WaitOne();

Error Prone

High Overhead

Tricks

Static Work Distribution

Knowledge of Synchronization Primitives

Heavy Synchronization

Lack of Thread Reuse

Slide24

Solution with Parallel Extensions

void MultiplyMatrices(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, i => {
        for (int j = 0; j < size; j++) {
            result[i, j] = 0;
            for (int k = 0; k < size; k++) {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}

Structured parallelism

Slide25

Task Parallel Library – Loops

Common source of work in programs
System.Threading.Parallel class
Parallelism when iterations are independent
  Body doesn't depend on mutable state, e.g. static variables or writing to local variables used in subsequent iterations (see the sketch after this slide)
Synchronous
  All iterations finish, regularly or exceptionally

for (int i = 0; i < n; i++) work(i);
…
foreach (T e in data) work(e);

Parallel.For(0, n, i => work(i));
…
Parallel.ForEach(data, e => work(e));

Why immutability gains attention
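A minimal sketch of that caveat (array contents and sizes are arbitrary): the first loop is safe for Parallel.For because its iterations are independent; the second carries a dependency through sum, so running its body under Parallel.For as-is would race and needs a reduction or interlocking instead.

using System;
using System.Threading.Tasks;

class LoopSketch
{
    static void Main()
    {
        var data = new double[1000];

        // Independent iterations: each one writes only its own slot.
        Parallel.For(0, data.Length, i => data[i] = Math.Sqrt(i));

        // Dependent iterations: every iteration reads and writes 'sum',
        // so this body must stay sequential (or become a reduction).
        double sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];

        Console.WriteLine(sum);
    }
}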

Slide26

Task Parallel Library

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide27

Amdahl’s law

Maximum speedup: S = 1 / Σ_k (P_k / S_k)
  S_k – speed-up factor for portion k
  P_k – percentage of instructions in part k that can be parallelized
Simplified: S = 1 / ((1 − P) + P / N)
  P – percentage of instructions that can be parallelized
  N – number of processors

Sky is not the limit

Slide28

Amdahl’s law by example

Theoretical maximum speedup determined by amount of linear code
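A worked instance of the simplified formula, with numbers chosen purely for illustration: if P = 95% of the work can be parallelized and N = 8 processors are available,

S = 1 / ((1 − 0.95) + 0.95 / 8) = 1 / (0.05 + 0.11875) = 1 / 0.16875 ≈ 5.9

and even with infinitely many processors the speedup can never exceed 1 / 0.05 = 20, because the 5% of linear code always runs sequentially.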

Slide29

Performance Tips

Compute intensive and/or large data sets

Work done should be at least 1,000s of cycles

Do not be gratuitous in task creation

Lightweight, but still requires object allocation, etc.

Parallelize only outer loops where possible

Unless N is insufficiently large to offer enough parallelism

Prefer isolation & immutability over synchronization
  Synchronization == !Scalable
  Try to avoid shared data

Have realistic expectations
  Amdahl's Law: speedup will be fundamentally limited by the amount of sequential computation
  Gustafson's Law: but what if you add more data, thus increasing the parallelizable percentage of the application?

Slide30

Enable LINQ developers to leverage parallel hardware
  Fully supports all .NET Standard Query Operators
  Abstracts away the hard work of using parallelism
  Partitions and merges data intelligently (classic data parallelism)
Minimal impact to existing LINQ programming model
  AsParallel extension method
  Optional preservation of input ordering (AsOrdered; see the sketch below)
Query syntax enables runtime to auto-parallelize
  Automatic way to generate more Tasks, like Parallel
  Graph analysis determines how to do it
  Very little synchronization internally: highly efficient

Parallel LINQ (PLINQ)

var q = from p in people.AsParallel()
        where p.Name == queryInfo.Name && p.State == queryInfo.State &&
              p.Year >= yearStart && p.Year <= yearEnd
        orderby p.Year ascending
        select p;
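A minimal sketch of the knobs mentioned above, on invented data: AsOrdered preserves the input ordering in the merged output, and ForAll merges by running an action on the worker threads rather than streaming results back one by one.

using System;
using System.Linq;

class PlinqSketch
{
    static void Main()
    {
        var numbers = Enumerable.Range(0, 1000);

        // Parallel query with input order preserved in the output.
        var squares = numbers.AsParallel()
                             .AsOrdered()
                             .Where(n => n % 2 == 0)
                             .Select(n => n * n);

        Console.WriteLine(squares.First());   // 0, despite parallel execution

        // ForAll: push-based merge; the action runs on the worker threads.
        numbers.AsParallel()
               .Where(n => n % 100 == 0)
               .ForAll(n => Console.WriteLine(n));   // output order not guaranteed
    }
}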

Slide31

PLINQ

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide32

Coordination Data Structures

New synchronization primitives (System.Threading)
Barrier
  Multi-phased algorithms
  Tasks signal and wait for phases
CountdownEvent
  Has an initial counter value
  Gets signaled when count reaches zero
LazyInitializer
  Lazy initialization routines
  Reference type variable gets initialized lazily
SemaphoreSlim
  Slim brother to Semaphore (which goes kernel mode)
SpinLock, SpinWait
  Loop-based wait ("spinning")
  Avoids context switch or kernel mode transition
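A minimal sketch of two of these primitives; the counts and phase structure are arbitrary. A CountdownEvent gets signaled once every worker has called Signal, and a Barrier makes a fixed set of participants meet at the end of each phase.

using System;
using System.Threading;
using System.Threading.Tasks;

class CdsSketch
{
    static void Main()
    {
        // CountdownEvent: initial count of 3, signaled when it reaches zero.
        using (var done = new CountdownEvent(3))
        {
            for (int i = 0; i < 3; i++)
            {
                int id = i;
                Task.Factory.StartNew(() =>
                {
                    Console.WriteLine("worker {0} finished", id);
                    done.Signal();
                });
            }
            done.Wait();   // blocks until the count hits zero
        }

        // Barrier: 2 participants signal and wait at the end of each phase.
        using (var barrier = new Barrier(2, b =>
            Console.WriteLine("phase {0} done", b.CurrentPhaseNumber)))
        {
            Action worker = () =>
            {
                barrier.SignalAndWait();   // end of phase 0
                barrier.SignalAndWait();   // end of phase 1
            };
            Parallel.Invoke(worker, worker);
        }
    }
}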

Slide33

Coordination Data Structures

Concurrent collections (System.Collections.Concurrent)
BlockingCollection<T>
  Producer/consumer scenarios
  Blocks when no data is available (consumer)
  Blocks when no space is available (producer)
ConcurrentBag<T>
ConcurrentDictionary<TKey, TElement>
ConcurrentQueue<T>, ConcurrentStack<T>
  Thread-safe and scalable collections
  As lock-free as possible
Partitioner<T>
  Facilities to partition data in chunks
  E.g. PLINQ partitioning problems
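A minimal producer/consumer sketch with BlockingCollection<T>, the scenario called out above; the item counts and bound are arbitrary. The bounded capacity makes the producer block when the collection is full, and the consumer blocks when it is empty.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BlockingSketch
{
    static void Main()
    {
        // Bounded to 10 items: Add blocks when full, Take blocks when empty.
        using (var queue = new BlockingCollection<int>(boundedCapacity: 10))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 100; i++)
                    queue.Add(i);
                queue.CompleteAdding();   // signal: no more items will arrive
            });

            var consumer = Task.Factory.StartNew(() =>
            {
                // Enumerates items as they arrive; ends after CompleteAdding.
                foreach (int item in queue.GetConsumingEnumerable())
                    Console.WriteLine(item);
            });

            Task.WaitAll(producer, consumer);
        }
    }
}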

Slide34

Coordination Data Structures

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide35

Asynchronous workflows in F#

Language feature unique to F#
Based on theory of monads
  But much more exhaustive compared to LINQ…
  Overloadable meaning for specific keywords
Continuation passing style
  Not: 'a -> 'b
  But: 'a -> ('b -> unit) -> unit
  In C# style: Action<T, Action<R>>
Core concept: async { /* code */ }
  Syntactic sugar for keywords inside the block
  E.g. let!, do!, use!

Function takes computation result
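A minimal C# rendering of that continuation-passing shape ('a -> ('b -> unit) -> unit, i.e. Action<T, Action<R>>). ReadLengthCps and the file name are hypothetical, purely to show the style: instead of returning a result, the function accepts a continuation that receives the result later.

using System;
using System.IO;
using System.Threading.Tasks;

class CpsSketch
{
    // Continuation-passing style: the caller supplies the "rest of the
    // program" as a callback instead of waiting for a return value.
    static void ReadLengthCps(string path, Action<long> continuation)
    {
        Task.Factory.StartNew(() =>
        {
            using (var stream = File.OpenRead(path))
                continuation(stream.Length);
        });
    }

    static void Main()
    {
        ReadLengthCps("Image1.tmp", length =>
            Console.WriteLine("read {0} bytes", length));

        Console.ReadLine();   // keep the process alive for the callback
    }
}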

Slide36

Asynchronous workflows in F#

let processAsync i = async {
    use stream = File.OpenRead(sprintf "Image%d.tmp" i)
    let! pixels = stream.AsyncRead(numPixels)
    let pixels' = transform pixels i
    use out = File.OpenWrite(sprintf "Image%d.done" i)
    do! out.AsyncWrite(pixels') }

let processAsyncDemo =
    printfn "async demo..."
    let tasks = [ for i in 1 .. numImages -> processAsync i ]
    Async.RunSynchronously (Async.Parallel tasks) |> ignore
    printfn "Done!"

Run tasks in parallel

(Conceptually, let! turns the remainder of the block into a callback:)

stream.Read(numPixels, pixels ->
    let pixels' = transform pixels i
    use out = File.OpenWrite(sprintf "Image%d.done" i)
    do! out.AsyncWrite(pixels'))

Slide37

Asynchronous workflows in F#

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide38

Reactive Fx

First-class events in .NET

Dualism of the IEnumerable<T> interface: IObservable<T>
Pull versus push
  Pull (active): IEnumerable<T> and foreach
  Push (passive): raise events and event handlers
Events based on functions
  Composition at its best
Definition of operators: LINQ to Events
Realization of the continuation monad

Slide39

IObservable<T> and IObserver<T>

// Dual of IEnumerable<out T>
public interface IObservable<out T>
{
    IDisposable Subscribe(IObserver<T> observer);
}

// Dual of IEnumerator<out T>
public interface IObserver<in T>
{
    // IEnumerator<T>.MoveNext return value
    void OnCompleted();

    // IEnumerator<T>.MoveNext exceptional return
    void OnError(Exception error);

    // IEnumerator<T>.Current property
    void OnNext(T value);
}

Way to unsubscribe

Signaling the last event

Virtually two return types

Contra-variance

Co-variance
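A minimal sketch of the duality in use: a hand-rolled observable that pushes a few values at a subscriber through the interfaces above. The Numbers and Printer types are illustrative; Rx's own operators and subjects are not shown here.

using System;

class Numbers : IObservable<int>
{
    public IDisposable Subscribe(IObserver<int> observer)
    {
        // Push values to the observer instead of having it pull them.
        for (int i = 0; i < 3; i++)
            observer.OnNext(i);
        observer.OnCompleted();
        return new Unsubscriber();
    }

    private class Unsubscriber : IDisposable
    {
        public void Dispose() { /* nothing to tear down in this sketch */ }
    }
}

class Printer : IObserver<int>
{
    public void OnNext(int value) { Console.WriteLine("got {0}", value); }
    public void OnError(Exception error) { Console.WriteLine(error); }
    public void OnCompleted() { Console.WriteLine("done"); }
}

class Program
{
    static void Main()
    {
        new Numbers().Subscribe(new Printer());
    }
}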

Slide40

ReactiveFx

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Visit channel9.msdn.com for info

Slide41

Agenda

The concurrency landscape

Language headaches

.NET 4.0 facilities

Task Parallel Library

PLINQ

Coordination Data Structures

Asynchronous programming

Incubation projects

Summary

Slide42

Axum: a DevLabs project (previously "Maestro")
Coordination between components
  "Disciplined sharing"
Actor model
  Agents communicate via messages
  Channels to exchange data via ports
Language features (based on C#)
  Declarative data pipelines and protocols
  Side-effect-free functions
  Asynchronous methods
  Isolated methods
Also suitable in a distributed setting

Slide43

Channels for message exchange

agent Program : channel Microsoft.Axum.Application
{
    public Program()
    {
        string[] args = receive(PrimaryChannel::CommandLine);
        PrimaryChannel::ExitCode <-- 0;
    }
}

Slide44

Agents and channels

channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;
}

agent AdderAgent : channel Adder
{
    public AdderAgent()
    {
        int result = receive(PrimaryChannel::Num1) +
                     receive(PrimaryChannel::Num2);
        PrimaryChannel::Sum <-- result;
    }
}

Send / receive primitives

Slide45

Protocols

channel Adder
{
    input int Num1;
    input int Num2;
    output int Sum;

    Start:   { Num1 -> GotNum1; }
    GotNum1: { Num2 -> GotNum2; }
    GotNum2: { Sum  -> End; }
}

State transition diagram

Slide46

Use of pipelines

agent MainAgent : channel Microsoft.Axum.Application
{
    function int Fibonacci(int n)
    {
        if (n <= 1) return n;
        return Fibonacci(n - 1) + Fibonacci(n - 2);
    }

    int c = 10;

    void ProcessResult(int n)
    {
        Console.WriteLine(n);
        if (--c == 0) PrimaryChannel::ExitCode <-- 0;
    }

    public MainAgent()
    {
        var nums = new OrderedInteractionPoint<int>();
        nums ==> Fibonacci ==> ProcessResult;
        for (int i = 0; i < c; i++) nums <-- 42 - i;
    }
}

Description of data flow

Mathematical function

Slide47

Domains

domain Chatroom
{
    private string m_Topic;
    private int m_UserCount;

    reader agent User : channel UserCommunication
    {
        // ...
    }

    writer agent Administrator : channel AdminCommunication
    {
        // ...
    }
}

Unit of sharing between agents

Slide48

Asynchronous methods

private asynchronous void ReadFile(string path)
{
    Stream stream = new Stream(...);
    int numRead = stream.Read(...);
    while (numRead > 0)
    {
        ...
        numRead = stream.Read(...);
    }
}

Blocking operations inside

Slide49

Axum in a nutshell

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Slide50

Another DevLabs project (STM.NET)
  Cutting edge, released 7/28
  Specialized fork from .NET 4.0 Beta 1
  CLR modifications required
First-class transactions on memory
  As an alternative to locking
"Optimistic" concurrency methodology
  Make modifications
  Roll back changes on conflict
Core concept: atomic { /* code */ }

Slide51

Transactional memory

Subtle difference
Problems with locks:
  Potential for deadlocks…
  …and more ugliness
  Granularity matters a lot
  Don't compose well

atomic {
    m_x++;
    m_y--;
    throw new MyException();
}

lock (GlobalStmLock) {
    m_x++;
    m_y--;
    throw new MyException();
}

Slide52

Bank account sample

public static void Transfer(BankAccount from, BankAccount backup, BankAccount to, int amount)
{
    Atomic.Do(() =>
    {
        // Be optimistic, credit the beneficiary first
        to.ModifyBalance(amount);

        // Find the appropriate funds in source accounts
        try
        {
            from.ModifyBalance(-amount);
        }
        catch (OverdraftException)
        {
            backup.ModifyBalance(-amount);
        }
    });
}

Slide53

Atomic cell update

public class SingleCellQueue<T> where T : class
{
    T m_item;

    public T Get()
    {
        atomic
        {
            T temp = m_item;
            if (temp == null) retry;
            m_item = null;
            return temp;
        }
    }

    public void Put(T item)
    {
        atomic
        {
            if (m_item != null) retry;
            m_item = item;
        }
    }
}

Don’t forget retry

Slide54

The hard truth about STM

Great features
  ACID
  Optimistic concurrency
  Transparent rollback and re-execute
  System.Transactions (LTM) and DTC support
Implementation
  Instrumentation of shared state access
  JIT compiler modification
  No hardware support currently
Result: 2x to 7x serial slowdown (in alpha prototype), but improved parallel scalability

Slide55

STM.NET

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Visit msdn.microsoft.com/

devlabs

Slide56

DryadLINQ

Dryad
  Infrastructure for cluster computation
  Concept of a job
DryadLINQ
  LINQ over Dryad
  Decomposition of query
  Distribution over computation nodes
  Roughly similar to PLINQ
  A la "map-reduce"
  Declarative approach works

Slide57

DryadLINQ = LINQ + Dryad

[Diagram: the LINQ query written in C# is compiled into a query plan (a Dryad job); vertex code runs on the cluster nodes over the data collection and produces the results.]

Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { hash = Hash(c.key), c.value };

Slide58

DryadLINQ

Bart J.F. De Smet, Software Development Engineer, Microsoft Corporation

demo

Visit research.microsoft.com/dryad

Slide59

Agenda

The concurrency landscape

Language headaches

.NET 4.0 facilities

Task Parallel Library

PLINQ

Coordination Data Structures

Asynchronous programming

Incubation projects

Summary

Slide60

Summary

Parallel programming requires thinking

Avoid side-effects

Prefer immutability

Act 1 = Library approach in .NET 4.0

Task Parallel Library

Parallel LINQ

Coordination Data Structures

Asynchronous patterns (+ a bit of language sugar)

Act 2 = Different

approaches are lurking

Software Transactional Memory

Purification of

languages

Slide61

question & answer

Slide62

www.microsoft.com/teched

Sessions On-Demand & Community

http://microsoft.com/technet

Resources for IT Professionals

http://microsoft.com/msdn

Resources for Developers

www.microsoft.com/learning

Microsoft Certification & Training Resources

Resources

Slide63

Related Content

Breakout Sessions (session codes and titles)

Interactive Theater Sessions (session codes and titles)

Hands-on Labs (session codes and titles)


Slide64

Track Resources

Resource 1

Resource 2

Resource 3

Resource 4

Slide65

Complete an evaluation on CommNet and enter to win!


Slide66

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
