Technical Computing from Domain Analysis to Performance Profiling

Uploaded by jane-oiler on 2018-09-22

Presentation Transcript

Slide1

Technical Computing from Domain Analysis to Performance Profiling

Phil Pennington

Sr. Developer Evangelist

Microsoft Corporation

SESSION CODE: WSV325

Slide2

AGENDA

Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);

Technical Computing @ Microsoft

Parallel Tools in Visual Studio

Thinking Parallel

Using TPL and C#

Slide3

An Example

Monte Carlo Approximation of Pi

S = Area of Square

S = (2*r) * (2*r) = 4*r*r

C = Area of Circle

C = Pi*r*r

Pi = 4 * (Area of Circle / Area of Square)

DEMO

Slide4

An Example

Monte Carlo Approximation of Pi

S = 4*r*r

C = Pi*r*r

Pi = 4 * (C/S)

For each Point (P), d(P) = SQRT((x * x) + (y * y))

if (d < r) then (x,y) in C

DEMO

Slide5
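
As a minimal sketch of the Monte Carlo approximation described above, the following C# program samples random points in the square, applies the d < r test, and returns 4 * (hits / samples). The class name `MonteCarloPi`, the method `Estimate`, the fixed radius of 1, and the seed parameter are illustrative assumptions and are not taken from the session demo.

```csharp
using System;

public static class MonteCarloPi
{
    // Estimates Pi by sampling points uniformly in the square [-r, r] x [-r, r]
    // and counting how many fall inside the inscribed circle of radius r.
    public static double Estimate(int samples, int seed)
    {
        const double r = 1.0;
        var random = new Random(seed);
        int inCircle = 0;

        for (int i = 0; i < samples; i++)
        {
            // Random point P = (x, y) inside the square of side 2*r.
            double x = (random.NextDouble() * 2.0 - 1.0) * r;
            double y = (random.NextDouble() * 2.0 - 1.0) * r;

            // d(P) = SQRT((x * x) + (y * y)); P is in C when d < r.
            double d = Math.Sqrt((x * x) + (y * y));
            if (d < r)
            {
                inCircle++;
            }
        }

        // Pi = 4 * (C / S), with the hit ratio standing in for the area ratio.
        return 4.0 * inCircle / samples;
    }

    public static void Main()
    {
        Console.WriteLine(Estimate(1000000, 12345));
    }
}
```

The loop is sequential and every sample is independent of the others, which is what makes this example a natural candidate for parallel execution.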

Why Parallel?

Slide6

Windows and Logical Processors

Before Windows 7 and Windows Server 2008 R2, the maximum number of Logical Processors (LPs) was dictated by the processor's integral word size

LP state (e.g. idle, affinity) is represented in a word-sized bitmask

32-bit Windows: 32 LPs

64-bit Windows: 64 LPs
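
To make the word-size limit concrete, here is a small C# sketch of an idle-processor bitmask held in one 32-bit word. It is not how Windows implements the mask internally; the names `IdleProcessorMask`, `MarkIdle`, `MarkBusy`, and `IsIdle` are hypothetical and only illustrate why one bit per LP caps the count at the word width.

```csharp
using System;

public static class IdleProcessorMask
{
    // One bit per logical processor, so a 32-bit word can track at most 32 LPs.
    public const int MaxProcessors = 32;

    public static uint MarkIdle(uint mask, int lp)
    {
        CheckRange(lp);
        return mask | (1u << lp);
    }

    public static uint MarkBusy(uint mask, int lp)
    {
        CheckRange(lp);
        return mask & ~(1u << lp);
    }

    public static bool IsIdle(uint mask, int lp)
    {
        CheckRange(lp);
        return (mask & (1u << lp)) != 0;
    }

    private static void CheckRange(int lp)
    {
        // LP number 32 and above has no bit left in the word.
        if (lp < 0 || lp >= MaxProcessors)
        {
            throw new ArgumentOutOfRangeException(nameof(lp));
        }
    }

    public static void Main()
    {
        uint mask = 0;
        mask = MarkIdle(mask, 0);
        mask = MarkIdle(mask, 16);
        mask = MarkIdle(mask, 31);
        mask = MarkBusy(mask, 16);
        Console.WriteLine(mask.ToString("X8")); // prints 80000001
    }
}
```

Swapping `uint` for `ulong` raises the ceiling to 64 LPs, which mirrors the 64-bit Windows limit listed above.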

[Figure: 32-bit Idle Processor Mask, bits 0 to 31, each bit marked Idle or Busy]

Slide7

Windows Organizes Many-Cores via GROUP

New with Windows 7 and Windows Server 2008 R2

[Figure: hierarchy of GROUP, NUMA NODE, Socket, Core, LP]

NUMA = Non-Uniform Memory Access

LP = Logical Processor

Slide8

Processor Groups

Example: 2 Groups, 4 Nodes, 8 Sockets, 32 Cores, 4 LPs/Core = 128 LPs
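
The arithmetic behind this example can be checked with a short C# sketch. It assumes the totals on the slide are spread evenly across the hierarchy (2 nodes per group, 2 sockets per node, 4 cores per socket); the class and method names are hypothetical.

```csharp
using System;

public static class ProcessorGroupExample
{
    // Multiplies out an evenly populated Group > Node > Socket > Core > LP hierarchy.
    public static int TotalLogicalProcessors(
        int groups, int nodesPerGroup, int socketsPerNode, int coresPerSocket, int lpsPerCore)
    {
        return groups * nodesPerGroup * socketsPerNode * coresPerSocket * lpsPerCore;
    }

    public static void Main()
    {
        // Totals from the slide: 2 Groups, 4 Nodes, 8 Sockets, 32 Cores, 4 LPs/Core.
        int groups = 2, nodes = 4, sockets = 8, cores = 32, lpsPerCore = 4;

        int total = TotalLogicalProcessors(
            groups, nodes / groups, sockets / nodes, cores / sockets, lpsPerCore);
        int perGroup = total / groups;

        Console.WriteLine("Total LPs: " + total);        // prints Total LPs: 128
        Console.WriteLine("LPs per group: " + perGroup); // prints LPs per group: 64
    }
}
```

Each group ends up with 64 LPs, which is exactly what a 64-bit mask can describe.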

[Figure: tree of 2 Groups, each with 2 NUMA Nodes, each with 2 Sockets, each with 4 Cores, each with 4 LPs]

Slide9

Load-Balancing Task Scheduler

[Figure: Static Scheduling on CPU0, CPU1, CPU2, CPU3]

Dynamic scheduling improves performance by distributing work efficiently at runtime.
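
The difference between the two approaches can be illustrated with a deterministic simulation rather than real threads. The sketch below is an assumption-laden model, not the TPL scheduler: `StaticMakespan` hands each CPU a fixed contiguous block of tasks up front, `DynamicMakespan` gives each task to whichever CPU becomes free first, and the workload in `Main` is invented for illustration.

```csharp
using System;
using System.Linq;

public static class SchedulingSketch
{
    // Static scheduling: split the task list into equal contiguous blocks up front,
    // one block per CPU, and return the finish time of the slowest CPU.
    public static int StaticMakespan(int[] durations, int cpus)
    {
        int blockSize = (durations.Length + cpus - 1) / cpus;
        var busy = new int[cpus];
        for (int i = 0; i < durations.Length; i++)
        {
            busy[i / blockSize] += durations[i];
        }
        return busy.Max();
    }

    // Dynamic scheduling: each task goes to whichever CPU becomes free first.
    public static int DynamicMakespan(int[] durations, int cpus)
    {
        var busy = new int[cpus];
        foreach (int d in durations)
        {
            int next = Array.IndexOf(busy, busy.Min());
            busy[next] += d;
        }
        return busy.Max();
    }

    public static void Main()
    {
        // Hypothetical workload: four long tasks followed by twelve short ones.
        int[] durations = { 8, 8, 8, 8, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
        Console.WriteLine("static:  " + StaticMakespan(durations, 4)); // prints static:  32
        Console.WriteLine("dynamic: " + DynamicMakespan(durations, 4)); // prints dynamic: 11
    }
}
```

With this uneven workload the static split leaves three CPUs idle while one processes all the long tasks, whereas the dynamic assignment finishes in the ideal 44 / 4 = 11 time units.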

[Figure: Dynamic Scheduling on CPU0, CPU1, CPU2, CPU3]

Slide10

User Mode Scheduling

Architectural Perspective

[Figure: UMS architecture with the labels Application, Kernel, Your Scheduler Logic, Scheduler Threads S1 and S2 on CPU 1 and CPU 2, worker threads W1 to W4, Blocked Worker Threads, UMS Scheduler's Ready List, UMS Completion List; transition reasons: Created, Blocked, Yield, Wait]

Slide11

The Platform

Topology

DEMO

Slide12

AGENDA

Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);

Technical Computing @ Microsoft

Parallel Tools in Visual Studio

Thinking Parallel

Using TPL and C#

Slide13

Tasks in .NET and C++

.NET 4.0

Parallel.For(x, y, λ)

Parallel.ForEach(IEnum, λ)

Parallel.Invoke(λ, λ)

Task

Task.Factory.StartNew(λ)

ThreadPool-based

Visual C++ 10

parallel_for(x, y, step, λ);

parallel_for_each(it, λ)

parallel_invoke(λ, λ)

task_group / task_handle

task_group::run(λ)

Native concurrency runtime

(and many overloads for the above)

Slide14

Visual Studio 2010, .NET Developer

Tools, Programming Models, Runtimes

[Figure: layered architecture diagram with the labels Tools (Parallel Debugger, Parallel Profiler), Programming Models – Structured Parallelism, .NET Parallel Extensions, Parallel LINQ (PLINQ), Task Parallel Library, Data Structures, .NET Runtime, Task Scheduler, Resource Manager, Thread Pools; key: Managed Library, Tools]

Slide15

Visual Studio 2010, C++ Developer

Tools, Programming Models, Runtimes

[Figure: layered architecture diagram with the labels Tools (Parallel Debugger, Parallel Profiler), Programming Models – Structured Parallelism, Parallel Pattern Library, Agents Library, Data Structures, C++ Concurrency Runtime, Task Scheduler, Resource Manager, Operating System, Threads, Win7/R2: UMS Threads; key: Native Library, Tools]

Slide16

Capabilities Comparison (1)

VS2010 PCP Technologies

Capability                         .NET4 TPL   C++ ConcRT   OpenMP   PLINQ   MS MPI   Threads/Thread-Pools
Task Parallelism                   Y           Y            N+       Y-      N        N
Data Parallelism                   Y           Y            N+       Y       Y        N
Parallel Patterns                  Y           Y            N        Y       N+       N
Fine-grained Parallelism (loops)   Y           Y            Y        Y-      N        N
Work-Item Partitioning             Y           Y            Y-       Y       N+       N
Dynamic Scheduling                 Y           Y            N        Y       N+       N

Slide17

Capabilities Comparison (2)

VS2010 PCP Technologies

Capability                         .NET4 TPL   C++ ConcRT   OpenMP   PLINQ   MS MPI   Threads/Thread-Pools
Affinity                           N           N            Y        N       Y-       Y
Concurrent Data Structures         Y           Y            N+       Y       N        N
Scalable Memory Allocator          Y           Y            N        Y       N        N
Optimized I/O Capability           N           N            N        N       Y        Y
User-Mode Sync Primitives          Y           Y            Y        Y       Y        N
Automatically Collates Results     N           N            N        Y       N        N

Slide18

The Tools

Libraries

Languages

Debuggers

Profilers

DEMO

Slide19

AGENDA

Technical_Computing = Parallel (Platform + Tools + Solvers + Analysis);

Technical Computing @ Microsoft

Parallel Tools in Visual Studio

Thinking Parallel

Using TPL and C#

Slide20

Thinking Parallel - “Control” vs. “Data” Parallelism

Control Parallelism

Parallel.For(0, size, (i) =>
{
    Console.WriteLine(i);
});
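
Control parallelism can also mean running different operations side by side rather than the same loop body over a range. The following sketch shows that form with `Parallel.Invoke` and `Task.Factory.StartNew`, both part of .NET 4; the helper methods `SumOfSquares`, `CountEven`, and `RunBoth` are hypothetical stand-ins for real work.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ControlParallelismSketch
{
    public static long SumOfSquares(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += (long)i * i;
        return sum;
    }

    public static int CountEven(int n)
    {
        return Enumerable.Range(1, n).Count(i => i % 2 == 0);
    }

    // Runs two unrelated operations concurrently and returns both results.
    public static Tuple<long, int> RunBoth(int n)
    {
        long squares = 0;
        int evens = 0;

        // Parallel.Invoke returns only after both delegates have completed.
        // Each delegate writes its own variable, so no locking is needed.
        Parallel.Invoke(
            () => squares = SumOfSquares(n),
            () => evens = CountEven(n));

        return Tuple.Create(squares, evens);
    }

    public static void Main()
    {
        Tuple<long, int> both = RunBoth(10);
        Console.WriteLine(both.Item1 + " " + both.Item2); // prints 385 5

        // The same idea with an explicit task object; Result blocks until it finishes.
        Task<long> task = Task.Factory.StartNew(() => SumOfSquares(10));
        Console.WriteLine(task.Result); // prints 385
    }
}
```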

Data Parallelism

IEnumerable<int> numbers = Enumerable.Range(2, 100-3);

// Trial divisors run from 2 to floor(sqrt(n)); an empty divisor range keeps 2 and 3 as primes.
var parallelQuery =
    from n in numbers.AsParallel()
    where Enumerable.Range(2, (int)Math.Sqrt(n) - 1).All(i => n % i > 0)
    select n;

int[] primes = parallelQuery.ToArray();

Slide21

Thinking Parallel – How to Schedule my Tasks?

Slide22

Thinking Parallel – How to Partition my Data?

Several partitioning schemes built-in

Chunk: works with any IEnumerable<T>. Single enumerator shared; chunks handed out on-demand.

Range: works only with IList<T>. Input divided into contiguous regions, one per partition.

Stripe: works only with IList<T>. Elements handed out round-robin to each partition.

Hash: works with any IEnumerable<T>. Elements assigned to partition based on hash code.
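
To show how the deterministic schemes differ, the sketch below assigns the same eight elements to two partitions by range, stripe, and hash. It is a hand-written illustration of the assignment rules, not the PLINQ implementation, and the names `ByRange`, `ByStripe`, and `ByHash` are hypothetical. Chunk partitioning is left out because its result depends on which worker asks for the next chunk at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitioningSketch
{
    // Range: contiguous regions, one per partition (needs indexed access).
    public static List<List<T>> ByRange<T>(IList<T> input, int partitions)
    {
        var result = NewPartitions<T>(partitions);
        int size = (input.Count + partitions - 1) / partitions;
        for (int i = 0; i < input.Count; i++) result[i / size].Add(input[i]);
        return result;
    }

    // Stripe: elements handed out round-robin (needs indexed access).
    public static List<List<T>> ByStripe<T>(IList<T> input, int partitions)
    {
        var result = NewPartitions<T>(partitions);
        for (int i = 0; i < input.Count; i++) result[i % partitions].Add(input[i]);
        return result;
    }

    // Hash: partition chosen from the element's hash code (any sequence will do).
    public static List<List<T>> ByHash<T>(IEnumerable<T> input, int partitions)
    {
        var result = NewPartitions<T>(partitions);
        foreach (T item in input)
        {
            // Mask off the sign bit so the bucket index is never negative.
            int bucket = (item.GetHashCode() & int.MaxValue) % partitions;
            result[bucket].Add(item);
        }
        return result;
    }

    private static List<List<T>> NewPartitions<T>(int partitions)
    {
        return Enumerable.Range(0, partitions).Select(_ => new List<T>()).ToList();
    }

    private static void Print(string name, List<List<string>> parts)
    {
        Console.WriteLine(name + ": " + string.Join(" | ", parts.Select(p => string.Join(",", p))));
    }

    public static void Main()
    {
        string[] data = { "a", "b", "c", "d", "e", "f", "g", "h" };
        Print("Range ", ByRange(data, 2));  // prints Range : a,b,c,d | e,f,g,h
        Print("Stripe", ByStripe(data, 2)); // prints Stripe: a,c,e,g | b,d,f,h
        Print("Hash  ", ByHash(data, 2));   // grouping depends on string hash codes
    }
}
```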

Custom partitioning available through Partitioner<T>

Partitioner.Create available for tighter control over built-in partitioning schemes

Slide23

Thinking Parallel – How to Collate my Results?

Slide24

Using TPL and C#

Partition

Execute (i.e. Schedule)

Collate
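
The three steps can be sketched in one TPL call, here applied to a Monte Carlo estimate of Pi. This is an illustration under stated assumptions rather than the session's demo code: the class name `ParallelMonteCarloPi`, the unit radius, and the per-range seeding scheme are invented for the example, while `Partitioner.Create`, the `Parallel.ForEach` overload with thread-local state, and `Interlocked.Add` are standard .NET 4 APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelMonteCarloPi
{
    public static double Estimate(int samples, int seed)
    {
        long inCircle = 0;

        // Partition: split [0, samples) into contiguous index ranges.
        var ranges = Partitioner.Create(0, samples);

        // Execute: TPL schedules the ranges across worker threads.
        Parallel.ForEach(
            ranges,
            () => 0L, // thread-local running count
            (range, loopState, localCount) =>
            {
                // Random is not thread-safe, so each range gets its own instance.
                // Seeding from the range start is an assumption made for this sketch.
                var random = new Random(seed + range.Item1);
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    double x = random.NextDouble() * 2.0 - 1.0;
                    double y = random.NextDouble() * 2.0 - 1.0;
                    if ((x * x) + (y * y) < 1.0) localCount++;
                }
                return localCount;
            },
            // Collate: merge each thread's count into the shared total.
            localCount => Interlocked.Add(ref inCircle, localCount));

        return 4.0 * inCircle / samples;
    }

    public static void Main()
    {
        Console.WriteLine(Estimate(4000000, 12345));
    }
}
```

Keeping a thread-local count and merging once per thread avoids contending on the shared total for every sample. The range sizes chosen by `Partitioner.Create` depend on the machine, so the exact estimate can differ between machines even with the same seed.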

DEMOs

Slide25

Track Resources

Managed APIs/runtimes (.NET 4)

Tasks, loops, collections, and PLINQ

http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx

Native APIs/runtimes (Visual C++ 10)

Tasks, loops, collections, and Agents

http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx

Tools (in the VS2010 IDE)

Debugger and profiler

http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx

General VS2010 Parallel Computing Developer Center

http://msdn.microsoft.com/en-us/concurrency/default.aspx

Slide26

Related Content


DEV314, ManyCore and .NET4 with VS2010, Mon, 14:45, Rm 288

ARC205, Patterns of Parallel Programming, Tues, 17:00, Rm 276

ARC02-INT, Patterns for Parallel Programming, Wed, 08:00, Rm 346

DEV408, TPL: Design Principles and Best Practices, Wed, 11:45, Rm 283

DEV317, Profiling and Debugging Parallel Code with VS2010, Thurs, 08:00, Rm 293

DEV307, F# in VS2010, Thurs, 09:45, Rm 276

WSV325, TC from Domain Analysis to Performance Profiling, Thurs, 17:00, Rm 388

Slide27

Resources

Sessions On-Demand & Community: www.microsoft.com/teched

Microsoft Certification & Training Resources: www.microsoft.com/learning

Resources for IT Professionals: http://microsoft.com/technet

Resources for Developers: http://microsoft.com/msdn

Slide28

Complete an evaluation on CommNet and enter to win!

Slide29

Sign up for Tech·Ed 2011 and save $500 starting June 8 – June 31st

http://northamerica.msteched.com/registration

You can also register at the North America 2011 kiosk located at registration

Join us in Atlanta next year

Slide30

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.