Building High Performance Parallel Software
Steve Teixeira
Product Unit Manager
Microsoft Corporation
Session Code: DEV401
My Assumptions About You
You have some knowledge of parallel development issues
You’d like to focus on the new stuff: Visual Studio 2010 and .NET 4.0
You’re interested in both native and managed code
You’d like to spend some time in the VS Profiler’s concurrency visualizer
You’re game for covering a lot of ground in a short amount of time!
Last Year at TechEd Europe…
Steve: Blue Screen of Death
Audience finds this hilarious
Why We're All Here
Multicore is the new normal: heat and power walls have caused sequential CPU performance to plateau; expect steadily increasing core counts
This is both a disruption and an opportunity for software developers
A new burden on developers to parallelize, but new experiences are now possible
Parallel computing unlocks new performance benefits and new experiences
Parallel == performance (if you’re doing it right)
Visual Studio 2010 includes loveable parallel computing
Agenda
Motivation
Parallelism and Visual Studio 2010
Embarrassingly Pleasantly parallel loops
Tasks
Optimizing PLINQ
Managing shared state
Thinking parallel
Avoiding shared state
Scaling out
Visual Studio 2010: Tools, Programming Models, Runtimes
[Architecture diagram]
Tooling: Visual Studio IDE, Parallel Debugger, MPI Debugger, Profiler Concurrency Analysis
Managed (.NET Framework 4): PLINQ and Task Parallel Library programming models, data structures, ThreadPool with Task Scheduler and Resource Manager
Native (Visual C++ 10): Parallel Pattern Library and Agents Library programming models, data structures, Concurrency Runtime with Task Scheduler and Resource Manager
Operating System (Windows): Threads, UMS Threads
Before You Parallelize…
For most problems, start sequential, but plan for parallelization
Establish performance goals
Measure baseline performance against goals
Measure against realistic scenarios and data
Identify performance bottlenecks: algorithms? I/O? Lack of asynchronicity? Data volume?
Know that parallelization creates complexity: shared state, non-determinism, more code, etc.
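The baseline-measurement step above can be sketched in C#. The workload, sizes, and names here are illustrative assumptions, not from the talk; the point is to warm up, time several runs, and record a sequential number for later parallel versions to beat.

```csharp
using System;
using System.Diagnostics;

class BaselineMeasurement
{
    // Illustrative sequential workload to baseline against.
    public static long SumSequential(int[] data)
    {
        long total = 0;
        foreach (int x in data) total += x;
        return total;
    }

    static void Main()
    {
        var data = new int[1000000];
        for (int i = 0; i < data.Length; i++) data[i] = i % 100;

        // Warm up once, then time several runs against realistic data.
        SumSequential(data);
        var sw = Stopwatch.StartNew();
        long result = 0;
        for (int run = 0; run < 5; run++) result = SumSequential(data);
        sw.Stop();

        Console.WriteLine("sum={0}, avg ms={1}", result, sw.ElapsedMilliseconds / 5.0);
    }
}
```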
Embarrassingly Pleasantly Parallel Loops
Using threads, thread pool, parallel for, and parallel foreach
demo
Parallel loops
Threads: poor thread reuse (threads are expensive); no load balancing, likely oversubscription; scary code
Thread Pool: good thread reuse; no load balancing, static partitioning; scary code
Parallel For: good thread reuse; load balancing and dynamic thread management; loveable code, including exceptions, cancellation, and thread-local state
Parallel ForEach: everything Parallel For offers, plus custom partitioning and “custom” index variables
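The comparison above can be sketched in C# (the array contents and method names are illustrative, not from the slides): Parallel.For takes over partitioning and load balancing, and the localInit overload of Parallel.ForEach keeps thread-local state so the shared total is touched only once per worker.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelLoopDemo
{
    public static long SumParallel(int[] data)
    {
        long total = 0;
        // Thread-local accumulator: each worker sums privately,
        // then merges once with Interlocked.Add.
        Parallel.ForEach(
            data,
            () => 0L,                           // per-thread initial state
            (x, loopState, local) => local + x, // per-item body
            local => Interlocked.Add(ref total, local)); // merge
        return total;
    }

    static void Main()
    {
        var data = new int[1000];
        for (int i = 0; i < data.Length; i++) data[i] = i;

        // Parallel.For: each iteration writes a distinct index, so
        // no locking is needed for the squares themselves.
        var squares = new int[data.Length];
        Parallel.For(0, data.Length, i => squares[i] = data[i] * data[i]);

        Console.WriteLine(SumParallel(data)); // 0 + 1 + ... + 999 = 499500
        Console.WriteLine(squares[10]);       // 100
    }
}
```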
What’s Hot This Season: Tasks
Threads represent execution flow, not work
Hard-coded; little latent parallelism
Unproductive, un-integrated syntax
QueueUserWorkItem() is handy for fire-and-forget
But what about waiting, canceling, continuing, composing, exceptions, dataflow, integration, and debugging?
Tasks in .NET and C++
.NET 4.0:
Parallel.For(x, y, λ)
Parallel.ForEach(IEnum, λ)
Parallel.Invoke(λ, λ)
Task; Task.Factory.StartNew(λ)
ThreadPool-based

Visual C++ 10:
parallel_for(x, y, step, λ)
parallel_for_each(first, last, λ)
parallel_invoke(λ, λ)
task_group / task_handle; task_group::run(λ)
Native Concurrency Runtime
(and many overloads for the above)
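A short C# sketch of the .NET 4.0 column above, covering the waiting, continuing, and exception behaviors the previous slide asked for (the method name and values are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class TaskDemo
{
    public static int DoubleThenAddOne(int x)
    {
        // StartNew queues work on the ThreadPool-based default scheduler.
        Task<int> doubled = Task<int>.Factory.StartNew(() => x * 2);

        // ContinueWith composes a second stage that runs when the first completes.
        Task<int> plusOne = doubled.ContinueWith(t => t.Result + 1);

        // Result blocks until the chain finishes; any exception thrown in the
        // chain is rethrown here wrapped in an AggregateException.
        return plusOne.Result;
    }

    static void Main()
    {
        Console.WriteLine(DoubleThenAddOne(20)); // 41
    }
}
```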
Load-Balancing of Tasks
[Diagram: the same work spread across CPU0–CPU3 under static scheduling vs. dynamic scheduling]
Dynamic scheduling improves performance by distributing work efficiently at runtime.
Work-Stealing Scheduler
[Diagram: the program thread pushes Tasks 1–6 into a lock-free global queue; worker threads 1 through p each drain a local work-stealing queue]
Thread management: starvation detection, idle thread retirement, hill-climbing
Visualizing Concurrency
Parallel loops in the VS 2010 Profiler’s Concurrency Visualizer
Optimizing PLINQ
demo
Free Code!
Scenarios: http://code.msdn.microsoft.com/Scenario
Parallel Extensions Extras: http://code.msdn.microsoft.com/ParExtSamples
Partitioning: Algorithms
Several partitioning schemes are built in:
Chunk: works with any IEnumerable<T>; a single enumerator is shared, and chunks are handed out on demand
Range: works only with IList<T>; input is divided into contiguous regions, one per partition
Stripe: works only with IList<T>; elements are handed out round-robin to each partition
Hash: works with any IEnumerable<T>; elements are assigned to a partition based on hash code
Custom partitioning is available through Partitioner<T>
Partitioner.Create is available for tighter control over built-in partitioning schemes
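Partitioner.Create can be sketched as follows (the workload is illustrative): range partitioning hands each worker a contiguous index range, so the inner loop stays a tight sequential loop with no per-element delegate overhead.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class PartitionerDemo
{
    public static long SumOfSquares(int n)
    {
        long total = 0;
        // Partitioner.Create(0, n) yields contiguous (fromInclusive, toExclusive)
        // ranges; each task processes its own region, then merges once.
        Parallel.ForEach(Partitioner.Create(0, n), range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
                local += (long)i * i;
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumOfSquares(100)); // 0^2 + ... + 99^2 = 328350
    }
}
```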
Performance Tips
Compute-intensive work and/or large data sets: work done should be at least 1,000s of cycles
Use the Visual Studio concurrency visualizer; look for common anti-patterns: load imbalance, lock convoys, etc.
Parallelize fine-grained, but not too fine-grained
e.g., parallelize the outer loop, unless N is insufficiently large to offer enough parallelism; then consider parallelizing only the inner loop, or both, or unrolling
Do not be gratuitous in task creation: tasks are lightweight, but still require object allocation, etc.
Prefer isolation and immutability over synchronization
Synchronization => !Scalable; avoid shared state!
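The outer-loop advice above can be sketched with a matrix multiply (an illustrative example, not from the deck): parallelizing only the outer loop keeps tasks coarse-grained, since each task computes a whole row of the result.

```csharp
using System;
using System.Threading.Tasks;

class MatMul
{
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int n = a.GetLength(0), m = b.GetLength(1), k = a.GetLength(1);
        var c = new double[n, m];
        // Outer loop parallelized; inner loops stay sequential and cheap.
        Parallel.For(0, n, i =>
        {
            for (int j = 0; j < m; j++)
            {
                double sum = 0;
                for (int x = 0; x < k; x++) sum += a[i, x] * b[x, j];
                c[i, j] = sum; // each task writes only its own row
            }
        });
        return c;
    }

    static void Main()
    {
        var id = new double[,] { { 1, 0 }, { 0, 1 } };
        var m  = new double[,] { { 2, 3 }, { 4, 5 } };
        Console.WriteLine(Multiply(id, m)[1, 0]); // 4
    }
}
```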
Locks: Your Frienemy
Includes critical sections, mutexes, Monitors, etc.
Necessary to safely manage access to shared state between threads
Contention can be a source of performance problems
Often a source of scalability problems
FACT: Locks do not contend with Chuck Norris
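A small C# sketch of why contention hurts (names and workload are illustrative): both methods compute the same histogram, but the first takes one shared lock per element while the second builds per-thread histograms and merges once per worker.

```csharp
using System;
using System.Threading.Tasks;

class ContentionDemo
{
    // Contended: every increment fights over a single lock.
    public static int[] HistogramLocked(int[] data, int buckets)
    {
        var hist = new int[buckets];
        var gate = new object();
        Parallel.ForEach(data, x =>
        {
            lock (gate) { hist[x % buckets]++; }
        });
        return hist;
    }

    // Low contention: thread-local histograms, merged once per worker.
    public static int[] HistogramLocal(int[] data, int buckets)
    {
        var hist = new int[buckets];
        var gate = new object();
        Parallel.ForEach(
            data,
            () => new int[buckets],
            (x, state, local) => { local[x % buckets]++; return local; },
            local =>
            {
                lock (gate)
                {
                    for (int i = 0; i < buckets; i++) hist[i] += local[i];
                }
            });
        return hist;
    }

    static void Main()
    {
        var data = new int[10000];
        for (int i = 0; i < data.Length; i++) data[i] = i;
        Console.WriteLine(HistogramLocal(data, 4)[0]); // 2500
    }
}
```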
Identifying lock contention
Matrix multiplication, Parallel Debugger
demo
Think Parallel
Common patterns to apply parallelism:
Tree walking
Sorting
Initialization
Speculation
I/O
Think parallel: Walking a tree
static void Walk<T>(Tree<T> root, Action<T> action)
{
    if (root == null) return;
    var t1 = Task.Factory.StartNew(() => action(root.Data),
        TaskCreationOptions.AttachedToParent);
    var t2 = Task.Factory.StartNew(() => Walk(root.Left, action),
        TaskCreationOptions.AttachedToParent);
    var t3 = Task.Factory.StartNew(() => Walk(root.Right, action),
        TaskCreationOptions.AttachedToParent);
    Task.WaitAll(t1, t2, t3);
}
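The slide omits the Tree<T> type; the minimal definition below is an assumption added to make the Walk helper runnable, with a small usage example. Because child tasks attach to their parent, waiting on t1..t3 waits for the whole subtree.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Tree<T>
{
    public T Data;
    public Tree<T> Left, Right;
}

class TreeWalkDemo
{
    // Same shape as the slide's Walk helper.
    public static void Walk<T>(Tree<T> root, Action<T> action)
    {
        if (root == null) return;
        var t1 = Task.Factory.StartNew(() => action(root.Data),
            TaskCreationOptions.AttachedToParent);
        var t2 = Task.Factory.StartNew(() => Walk(root.Left, action),
            TaskCreationOptions.AttachedToParent);
        var t3 = Task.Factory.StartNew(() => Walk(root.Right, action),
            TaskCreationOptions.AttachedToParent);
        Task.WaitAll(t1, t2, t3);
    }

    static void Main()
    {
        var root = new Tree<int>
        {
            Data = 1,
            Left = new Tree<int> { Data = 2 },
            Right = new Tree<int> { Data = 3 }
        };
        int sum = 0;
        // The action runs concurrently, so it must be thread-safe.
        Walk(root, x => Interlocked.Add(ref sum, x));
        Console.WriteLine(sum); // 6
    }
}
```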
Think parallel: Modifying QuickSort
static void QuickSort<T>(T[] data, int fromInclusive, int toExclusive)
    where T : IComparable<T>
{
    if (toExclusive - fromInclusive <= THRESHOLD)
    {
        InsertionSort(data, fromInclusive, toExclusive);
    }
    else
    {
        int pivotPos = Partition(data, fromInclusive, toExclusive);
        if (toExclusive - fromInclusive <= PARALLEL_THRESHOLD)
        {
            // NOTE: PARALLEL_THRESHOLD is chosen to be greater than THRESHOLD.
            QuickSort(data, fromInclusive, pivotPos);
            QuickSort(data, pivotPos, toExclusive);
        }
        else
        {
            Parallel.Invoke(
                () => QuickSort(data, fromInclusive, pivotPos),
                () => QuickSort(data, pivotPos, toExclusive));
        }
    }
}
Think parallel: Lazy Initialization
Lazy<T> data = new Lazy<T>(Compute);

Task<T> data = Task<T>.Factory.StartNew(Compute);

Lazy<Task<T>> data = new Lazy<Task<T>>(
    () => Task<T>.Factory.StartNew(Compute));

data.Value.ContinueWith(t =>
{
    T result = t.Result;
    UseResult(result);
});
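A small usage sketch for Lazy<T> (the counter and return value are illustrative): by default Lazy<T> uses ExecutionAndPublication thread safety, so the factory runs exactly once even when readers race on Value.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class LazyDemo
{
    static int _calls;

    public static int ParallelReads()
    {
        _calls = 0;
        // Compute runs at most once, on first access to Value.
        var data = new Lazy<int>(() =>
        {
            Interlocked.Increment(ref _calls);
            return 42;
        });

        int a = 0, b = 0;
        Parallel.Invoke(() => a = data.Value, () => b = data.Value);

        // Both readers see 42, but the factory ran only once.
        return a + b + _calls; // 42 + 42 + 1 = 85
    }

    static void Main()
    {
        Console.WriteLine(ParallelReads()); // 85
    }
}
```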
Think Parallel: Speculation
public static T SpeculativeInvoke<T>(params Func<T>[] functions)
{
    return SpeculativeForEach(functions, function => function());
}

public static TResult SpeculativeForEach<TSource, TResult>(
    IEnumerable<TSource> source, Func<TSource, TResult> body)
{
    object result = null;
    Parallel.ForEach(source, (item, loopState) =>
    {
        result = body(item);
        loopState.Stop();
    });
    return (TResult)result;
}
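A hedged usage sketch for the speculation pattern above: the helper is reproduced so the snippet is self-contained, and the two racing candidate functions are illustrative. The first candidate to finish stops the loop; here both produce the same value, so the result is deterministic either way.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class Speculative
{
    // Same shape as the slide's SpeculativeForEach: first body to finish wins.
    public static TResult ForEachFirst<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> body)
    {
        object result = null;
        Parallel.ForEach(source, (item, loopState) =>
        {
            result = body(item);
            loopState.Stop(); // stop launching further iterations
        });
        return (TResult)result;
    }
}

class SpeculationDemo
{
    static void Main()
    {
        // Two "algorithms" racing for the same answer; the slow one sleeps,
        // so the fast one normally wins, but either result is 42.
        Func<int>[] candidates =
        {
            () => { Thread.Sleep(200); return 42; },
            () => 42
        };
        Console.WriteLine(Speculative.ForEachFirst(candidates, f => f())); // 42
    }
}
```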
Thinking parallel
Asynchronous network I/O and false sharing
demo

Dining philosophers
Avoiding Shared State with Asynchronous Agents
Asynchronous Agent: a coarse-grained application component designed for larger computing tasks
Sources and Targets: participants in message passing which, when connected, propagate messages from source to target
Co-operative Send and Receive: utility functions in the agents library which facilitate message passing and leverage the co-operative Concurrency Runtime
Asynchronous Agents Library: Data Flow & Message Passing
Core message blocks: unbounded_buffer<T>, overwrite_buffer<T>, single_assignment<T>
send & receive: co-operatively send and receive messages
transform & call: execute a function asynchronously when work is received
choice & join: wait efficiently on a set of message blocks
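The agents library above is native C++ and has no direct counterpart in this deck's managed code. As a rough managed analogue (an assumption, not part of the talk), BlockingCollection<T> gives the same send/receive producer-consumer shape as unbounded_buffer<T>: state flows through messages instead of being shared under a lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class AgentSketch
{
    public static int SumViaMessages(int[] items)
    {
        // Plays the role of unbounded_buffer<int>: producer sends, consumer receives.
        using (var buffer = new BlockingCollection<int>())
        {
            var producer = Task.Factory.StartNew(() =>
            {
                foreach (int x in items) buffer.Add(x); // send
                buffer.CompleteAdding();                // no more messages
            });

            int sum = 0;
            foreach (int x in buffer.GetConsumingEnumerable()) // receive
                sum += x; // only the consumer touches sum: no lock needed

            producer.Wait();
            return sum;
        }
    }

    static void Main()
    {
        Console.WriteLine(SumViaMessages(new[] { 1, 2, 3, 4 })); // 10
    }
}
```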
Using agents to avoid locks and scale
demo
Question & Answer
Reminder: Drinks in Halls 3 and 4 tonight from 18:15 to 19:30!
Resources
www.microsoft.com/teched: Sessions On-Demand & Community
http://microsoft.com/technet: Resources for IT Professionals
http://microsoft.com/msdn: Resources for Developers
www.microsoft.com/learning: Microsoft Certification & Training Resources

Note to speakers: TechEd 2009 is not producing a DVD. Please announce that attendees can access session recordings at TechEd Online.
Related Content
DEV307 – Parallel Computing for Managed Developers: Tuesday, 09:00-10:15, New York 3 - Hall 7-1a
DEV401 – Building High-Performance Parallel Software: Thursday, 15:15-16:30, Berlin 1 - Hall 7-3a
DEV307 (Repeat) – Parallel Computing for Managed Developers: Friday, 09:00-10:15, Europa 1 - Hall 7-3b
Ask-the-Experts Lounge: on hand every day for live demos and stimulating conversation
Complete an evaluation on CommNet and enter to win an Xbox 360 Elite!
Please join us for the Community Drinks this evening in Halls 3 & 4 from 18:15 – 19:30
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.