
Building High Performance Parallel Software - PowerPoint Presentation



Presentation Transcript


Building High Performance Parallel Software

Steve Teixeira

Product Unit Manager

Microsoft Corporation

Session Code: DEV401

My Assumptions About You
You have some knowledge of parallel development issues
You'd like to focus on the new stuff: Visual Studio 2010 and .NET 4.0
You're interested in both native and managed code
You'd like to spend some time in the VS Profiler's concurrency visualizer
You're game for covering a lot of ground in a short amount of time!

Last Year at TechEd Europe…
Steve
Blue Screen of Death
Audience finds this hilarious

Why We're All Here
Multicore is the new normal
Heat and power walls have caused sequential CPU performance to plateau
Expect steadily increasing core counts
This is both a disruption and an opportunity for software developers
A new burden on developers to parallelize, but new experiences are now possible
Parallel computing unlocks new performance benefits and new experiences
Parallel == performance (if you're doing it right)
Visual Studio 2010 includes loveable parallel computing

Agenda
Motivation
Parallelism and Visual Studio 2010
Embarrassingly Pleasantly parallel loops
Tasks
Optimizing PLINQ
Managing shared state
Thinking parallel
Avoiding shared state
Scaling out

Visual Studio 2010: Tools, Programming Models, Runtimes
[Architecture diagram] Programming models: Task Parallel Library and Parallel LINQ with supporting data structures on the managed side; Parallel Pattern Library and Agents Library with supporting data structures on the native side. Concurrency runtime: the managed ThreadPool and the native Concurrency Runtime, each providing a task scheduler and resource manager, running over threads and UMS threads in the operating system (Windows). Tooling: parallel debugger, MPI debugger, and profiler concurrency analysis in the Visual Studio IDE. Managed components ship in the .NET Framework 4; native components ship in Visual C++ 10.

Before You Parallelize…
For most problems, start sequential, but plan for parallelization
Establish performance goals
Measure baseline performance against goals
Measure against realistic scenarios and data
Identify performance bottlenecks: algorithms? I/O? Lack of asynchronicity? Data volume?
Know that parallelization creates complexity: shared state, non-determinism, more code, etc.

Embarrassingly Pleasantly Parallel Loops
Using threads, thread pool, parallel for, and parallel foreach
demo

Parallel loops
Threads: poor thread reuse (threads are expensive); no load balancing, so oversubscription is likely; scary code
Thread Pool: good thread reuse; no load balancing, static partitioning; scary code
Parallel For: good thread reuse; load balancing and dynamic thread management; loveable code, including exceptions, cancellation, and thread-local state
Parallel ForEach: adds custom partitioning and "custom" index variables
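The deck does not show the loop code itself here; as a minimal sketch, assuming a hypothetical ProcessItem workload, the Parallel For / Parallel ForEach rows above map to C# roughly like this:

using System.Collections.Generic;
using System.Threading.Tasks;

class LoopDemo
{
    static void ProcessItem(int item) { /* hypothetical per-item work */ }

    static void Main()
    {
        // Parallel.For handles partitioning, load balancing, and thread reuse.
        Parallel.For(0, 1000, i => ProcessItem(i));

        // Parallel.ForEach works over any IEnumerable<T>.
        var items = new List<int> { 1, 2, 3, 4, 5 };
        Parallel.ForEach(items, item => ProcessItem(item));
    }
}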

What’s Hot This Season: Tasks
Threads represent execution flow, not work
Hard-coded; little latent parallelism
Unproductive, un-integrated syntax
QueueUserWorkItem() is handy for fire-and-forget
But what about… waiting, canceling, continuing, composing, exceptions, dataflow, integration, debugging?

Tasks in .NET and C++
.NET 4.0:
Parallel.For(x, y, λ)
Parallel.ForEach(IEnumerable, λ)
Parallel.Invoke(λ, λ)
Task
Task.Factory.StartNew(λ)
ThreadPool-based

Visual C++ 10:
parallel_for(x, y, step, λ)
parallel_for_each(first, last, λ)
parallel_invoke(λ, λ)
task_group / task_handle
task_group::run(λ)
Native Concurrency Runtime
(and many overloads for the above)
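The deck lists the APIs without a usage example; the following is a minimal C# sketch (not from the slides) of creating a task, continuing it, wiring up cancellation, and surfacing exceptions. ComputeValue is a hypothetical workload.

using System;
using System.Threading;
using System.Threading.Tasks;

class TaskDemo
{
    static int ComputeValue() { return 42; }  // hypothetical workload

    static void Main()
    {
        var cts = new CancellationTokenSource();

        // Create a task and capture its future result.
        Task<int> t = Task.Factory.StartNew(() => ComputeValue(), cts.Token);

        // Chain another task to run when the first completes.
        Task continuation = t.ContinueWith(prev => Console.WriteLine(prev.Result));

        // Cancellation is cooperative: cts.Cancel() requests it and the task
        // observes the token. Exceptions surface as AggregateException on Wait().
        try { continuation.Wait(); }
        catch (AggregateException ex) { Console.WriteLine(ex.InnerException); }
    }
}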

Static Scheduling vs. Load-Balancing of Tasks
[Diagram: the same work distributed across CPU0–CPU3 under static scheduling and under dynamic scheduling]
Dynamic scheduling improves performance by distributing work efficiently at runtime.

Work-Stealing Scheduler
[Diagram: the program thread feeds tasks (Task 1–6) into a lock-free global queue; worker threads 1 through p each maintain their own local work-stealing queue]
Thread management: starvation detection, idle thread retirement, hill-climbing

Visualizing Concurrency
Parallel loops in the VS 2010 Profiler's Concurrency Visualizer
Optimizing PLINQ
demo
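The PLINQ optimization itself is only shown as a live demo; as a rough sketch under assumption (the prime-counting workload below is hypothetical and not from the deck), a PLINQ query and one of its tuning knobs look like this:

using System;
using System.Linq;

class PlinqDemo
{
    // Hypothetical predicate standing in for the demo's real workload.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    static void Main()
    {
        var numbers = Enumerable.Range(2, 1000000);

        // AsParallel() turns the query into a PLINQ query; knobs such as
        // WithDegreeOfParallelism control how many cores it uses.
        int primeCount = numbers.AsParallel()
                                .WithDegreeOfParallelism(Environment.ProcessorCount)
                                .Count(IsPrime);

        Console.WriteLine(primeCount);
    }
}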

Free Code!
Scenarios: http://code.msdn.microsoft.com/Scenario
Parallel Extensions Extras: http://code.msdn.microsoft.com/ParExtSamples

Partitioning: Algorithms
Several partitioning schemes built in:
Chunk: works with any IEnumerable<T>; a single enumerator is shared and chunks are handed out on demand
Range: works only with IList<T>; the input is divided into contiguous regions, one per partition
Stripe: works only with IList<T>; elements are handed out round-robin to each partition
Hash: works with any IEnumerable<T>; elements are assigned to a partition based on their hash code
Custom partitioning is available through Partitioner<T>
Partitioner.Create is available for tighter control over the built-in partitioning schemes (see the sketch below)
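As a minimal sketch of that last point, assuming a hypothetical numeric array, Partitioner.Create can hand Parallel.ForEach contiguous index ranges instead of single elements:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PartitionDemo
{
    static void Main()
    {
        double[] data = new double[1000000];  // hypothetical input

        // Range partitioning: each partition processes a contiguous [from, to)
        // slice, cutting per-element delegate overhead for cheap loop bodies.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                data[i] = Math.Sqrt(i);
        });
    }
}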

Performance Tips
Compute intensive and/or large data sets: work done should be at least 1,000s of cycles
Use the Visual Studio concurrency visualizer
Look for common anti-patterns: load imbalance, lock convoys, etc.
Parallelize fine-grained but not too fine-grained
e.g. parallelize the outer loop, unless N is insufficiently large to offer enough parallelism; consider parallelizing only the inner loop, or both, or unrolling (see the sketch below)
Do not be gratuitous in task creation: tasks are lightweight, but still require object allocation, etc.
Prefer isolation & immutability over synchronization
Synchronization => !Scalable; avoid shared state!
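To illustrate the outer-loop tip, here is a hedged C# sketch (not the session's demo code) of a naive matrix multiply in which only the outer loop is parallelized, so each task stays coarse-grained and computes a full row:

using System.Threading.Tasks;

class GrainDemo
{
    // Hypothetical example: multiply two n x n matrices.
    static double[,] Multiply(double[,] a, double[,] b, int n)
    {
        var c = new double[n, n];

        // Parallelizing only the outer loop keeps work per task large:
        // each iteration produces an entire row of the result.
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < n; j++)
            {
                double sum = 0;
                for (int k = 0; k < n; k++)
                    sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        });
        return c;
    }
}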

Locks: Your Frienemy
Includes critical sections, mutexes, Monitors, etc.
Necessary to safely manage access to shared state between threads
Contention can be a source of performance problems
Often a source of scalability problems
FACT: Locks do not contend with Chuck Norris
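One standard way to keep locks from becoming the scalability bottleneck, sketched here as an assumption rather than anything shown in the deck, is to aggregate into per-thread local state with Parallel.For's localInit/localFinally overload and synchronize only once per worker:

using System.Threading;
using System.Threading.Tasks;

class ContentionDemo
{
    static long SumOfSquares(int n)
    {
        long total = 0;

        // localInit/localFinally keep a private subtotal per worker, so the
        // only synchronization is one Interlocked.Add per thread at the end
        // instead of a lock taken on every iteration.
        Parallel.For(0, n,
            () => 0L,                                   // localInit
            (i, state, local) => local + (long)i * i,   // body
            local => Interlocked.Add(ref total, local)  // localFinally
        );
        return total;
    }
}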

Identifying lock contention
Matrix multiplication, Parallel Debugger
demo

Think Parallel
Common patterns to apply parallelism: tree walking, sorting, initialization, speculation, I/O

Think parallel: Walking a tree

static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    var t1 = Task.Factory.StartNew(() => action(root.Data),
        TaskCreationOptions.AttachedToParent);
    var t2 = Task.Factory.StartNew(() => Walk(root.Left, action),
        TaskCreationOptions.AttachedToParent);
    var t3 = Task.Factory.StartNew(() => Walk(root.Right, action),
        TaskCreationOptions.AttachedToParent);
    Task.WaitAll(t1, t2, t3);
}

Think parallel: Modifying QuickSort

static void QuickSort<T>(T[] data, int fromInclusive, int toExclusive)
    where T : IComparable<T>
{
    if (toExclusive - fromInclusive <= THRESHOLD)
    {
        InsertionSort(data, fromInclusive, toExclusive);
    }
    else
    {
        int pivotPos = Partition(data, fromInclusive, toExclusive);
        if (toExclusive - fromInclusive <= PARALLEL_THRESHOLD)
        {
            // NOTE: PARALLEL_THRESHOLD is chosen to be greater than THRESHOLD.
            QuickSort(data, fromInclusive, pivotPos);
            QuickSort(data, pivotPos, toExclusive);
        }
        else
            Parallel.Invoke(
                () => QuickSort(data, fromInclusive, pivotPos),
                () => QuickSort(data, pivotPos, toExclusive));
    }
}

Think parallel: Lazy Initialization

Lazy<T> data = new Lazy<T>(Compute);

Task<T> data = Task<T>.Factory.StartNew(Compute);

Lazy<Task<T>> data = new Lazy<Task<T>>(
    () => Task<T>.Factory.StartNew(Compute));

data.Value.ContinueWith(t =>
{
    T result = t.Result;
    UseResult(result);
});

Think Parallel: Speculation

public static T SpeculativeInvoke<T>(params Func<T>[] functions)
{
    return SpeculativeForEach(functions, function => function());
}

public static TResult SpeculativeForEach<TSource, TResult>(
    IEnumerable<TSource> source, Func<TSource, TResult> body)
{
    object result = null;
    Parallel.ForEach(source, (item, loopState) =>
    {
        result = body(item);
        loopState.Stop();
    });
    return (TResult)result;
}

Thinking parallel
Asynchronous network I/O and false sharing
demo

Dining philosophers

Avoiding Shared State with Asynchronous Agents
Asynchronous Agent: a coarse-grained application component designed for larger computing tasks
Sources and Targets: participants in message-passing which, when connected, propagate messages from source to target
Co-operative Send and Receive: utility functions in the Agents Library which facilitate message passing and leverage the co-operative Concurrency Runtime

Asynchronous Agents Library
Data Flow & Message Passing
Core message blocks: unbounded_buffer<T>, overwrite_buffer<T>, single_assignment<T>
send & receive: co-operatively send and receive messages
transform & call: execute a function asynchronously when work is received
choice & join: wait efficiently on a set of message blocks

Using agents to avoid locks and scale
demo

question & answerReminder: Drinks in Halls 3 and 4 tonight from 18:15 to 19:30!Slide32

Resources
Sessions On-Demand & Community: www.microsoft.com/teched
Resources for IT Professionals: http://microsoft.com/technet
Resources for Developers: http://microsoft.com/msdn
Microsoft Certification & Training Resources: www.microsoft.com/learning
Required Slide
Speakers: TechEd 2009 is not producing a DVD. Please announce that attendees can access session recordings at TechEd Online.

Related Content
DEV307 – Parallel Computing for Managed Developers: Tuesday, 09:00-10:15, New York 3 - Hall 7-1a
DEV401 – Building High-Performance Parallel Software: Thursday, 15:15-16:30, Berlin 1 - Hall 7-3a
DEV307 (Repeat) – Parallel Computing for Managed Developers: Friday, 09:00-10:15, Europa 1 - Hall 7-3b
Ask-the-Experts Lounge: on hand every day for live demos and stimulating conversation

Complete an evaluation on CommNet and enter to win an Xbox 360 Elite!

Please join us for the Community Drinks this evening in Halls 3 & 4 from 18:15 – 19:30

© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Required Slide