CUDA Workshop, Week 4: NVVP, Existing Libraries, Q/A


Uploaded On 2020-11-06







Slide1

CUDA Workshop, Week 4

NVVP, Existing Libraries, Q/A

Slide2

Agenda

Textbook / resources

Eclipse Nsight, NVIDIA Visual Profiler

Available libraries

Questions

Certificate dispersal

(Optional) Multiple GPUs: Where’s Pixel-Waldo?

Slide3

Textbook / Resources

Textbook

Programming Massively Parallel Processors: A Hands-on Approach

David Kirk, Wen-mei Hwu

Slide4

Textbook / Resources

NVIDIA Developer Zone

Early access to driver and software updates

Heavily curated help forum

Requires registration and approval (nearly automated)

developer.nvidia.com

Slide5

Textbook / Resources

US!

We’re pretty passionate about this GPU computing stuff.

Collaboration is cool

If you think you’ve got a problem that can benefit from GPU computation we may have some ideas.

Slide6

Eclipse Nsight, NVVP

IDE built on the Eclipse platform

CUDA-aware syntax highlighting / suggestions / recognition

Integrated with NVVP

Slide7

Eclipse Nsight, NVVP

Deep profiling of every aspect of GPU execution (memory bandwidth, branch divergence, bank conflicts, compute/transfer overlap, and more!)

Provides suggestions for optimization

Graphical view of GPU performance

Slide8

Eclipse Nsight, NVVP

Nsight and NVVP are available on our cuda# machines

ssh -X <user>@<cuda machine> (-X enables X11 forwarding, so the Nsight and NVVP windows display on your local machine)

Nsight demo on Week 3 code

Slide9

Available Libraries

Why re-invent the wheel?

There are many GPU enabled tools built on CUDA that are already available

These tools have been extensively tested for efficiency and in most cases will outperform custom solutions

Some require CUDA-like code structure

Slide10

Available Libraries

Linear Algebra, cuBLAS

CUDA-enabled Basic Linear Algebra Subroutines

GPU-accelerated version of the complete standard BLAS library

Provided with the CUDA toolkit; code examples are also provided.

Callable from C and Fortran
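A minimal sketch of what a cuBLAS call looks like from CUDA C++: the helper below computes the Level-1 BLAS operation y = alpha * x + y with `cublasSaxpy`, using the handle-based `cublas_v2.h` API. The function name `saxpyCublas` is hypothetical, and the build line in the comment is an assumption about your toolkit setup.

```cuda
// Hypothetical helper: y = alpha * x + y via cuBLAS.
// Build (assumption): nvcc saxpy.cu -lcublas
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Returns true on success; y is overwritten with alpha * x + y.
bool saxpyCublas(float alpha, const std::vector<float>& x, std::vector<float>& y)
{
    const int n = static_cast<int>(x.size());
    if (n == 0 || y.size() != x.size()) return false;

    float* dX = nullptr;
    float* dY = nullptr;
    if (cudaMalloc(&dX, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dY, n * sizeof(float)) != cudaSuccess) { cudaFree(dX); return false; }

    cublasHandle_t handle;
    bool ok = cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;
    if (ok) {
        ok = // Copy both host vectors to the device (stride 1 on both sides).
             cublasSetVector(n, sizeof(float), x.data(), 1, dX, 1) == CUBLAS_STATUS_SUCCESS &&
             cublasSetVector(n, sizeof(float), y.data(), 1, dY, 1) == CUBLAS_STATUS_SUCCESS &&
             // dY = alpha * dX + dY; alpha is read from host memory (default pointer mode).
             cublasSaxpy(handle, n, &alpha, dX, 1, dY, 1) == CUBLAS_STATUS_SUCCESS &&
             // Blocking copy back to the host; it waits for the SAXPY to finish.
             cublasGetVector(n, sizeof(float), dY, 1, y.data(), 1) == CUBLAS_STATUS_SUCCESS;
        cublasDestroy(handle);
    }
    cudaFree(dX);
    cudaFree(dY);
    return ok;
}
```

Called with x = {1, 2, 3}, y = {10, 20, 30} and alpha = 2, y should come back as {12, 24, 36}. cuBLAS, like Fortran BLAS, assumes column-major storage for its matrix routines; that does not matter for a vector operation like SAXPY, but it does once you move on to calls such as `cublasSgemm`.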

Slide11

Available Libraries

Linear Algebra, cuBLAS

Slide12

Available Libraries

Linear Algebra, cuBLAS

Slide13

Available Libraries

Linear Algebra, CULA, MAGMA

CULA and MAGMA extend BLAS

CULA (Paid)

CULA-dense: LAPACK and BLAS implementations, solvers, decompositions, basic matrix operations

CULA-sparse: sparse matrix specialized routines, specialized storage structures, iterative methods

MAGMA (Free, BSD license; Fortran bindings)

LAPACK and BLAS implementations, developed by the same development team as LAPACK.

Slide14

Available Libraries

Linear Algebra, CULA, MAGMA

Slide15

Available Libraries

Linear Algebra, CULA, MAGMA

Slide16

Available Libraries

IMSL Fortran/C Numerical Library

Large collection of mathematical and statistical GPU-accelerated functions

Free evaluation, paid extension

http://www.roguewave.com/products/imsl-numerical-libraries/fortran-library.aspx

Slide17

Available Libraries

Image/Signal Processing: NVIDIA Performance Primitives

1900 image processing and 600 signal processing algorithms

Free and provided with the CUDA toolkit, code examples included.

Can be used in tandem with graphics APIs such as OpenGL and DirectX.

Slide18

Available Libraries

Image/Signal Processing: NVIDIA Performance Primitives

Slide19

Available Libraries

CUDA without the CUDA: Thrust Library

Thrust is a high level interface to GPU computing.

Offers an STL-like C++ template interface to sort, scan, reduce, etc.

A production-tested version is provided with the CUDA toolkit.
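To illustrate the "CUDA without the CUDA" idea, here is a small sketch that sorts, prefix-sums (scans) and reduces a vector on the GPU using only Thrust algorithms; no kernel is written by hand. The struct and function names are hypothetical; `thrust::device_vector`, `thrust::sort`, `thrust::inclusive_scan`, `thrust::reduce` and `thrust::copy` are standard Thrust components shipped with the CUDA toolkit.

```cuda
// Hypothetical demo of Thrust's sort / scan / reduce. Build (assumption): nvcc thrust_demo.cu
#include <vector>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/scan.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>

struct ThrustDemoResult {
    std::vector<int> sorted;  // input sorted ascending
    std::vector<int> prefix;  // inclusive prefix sums of the sorted data
    int total;                // sum of all elements
};

ThrustDemoResult sortScanReduce(const std::vector<int>& input)
{
    // Constructing a device_vector from host iterators copies the data to the GPU.
    thrust::device_vector<int> d(input.begin(), input.end());

    thrust::sort(d.begin(), d.end());  // parallel sort on the device

    thrust::device_vector<int> scanned(d.size());
    thrust::inclusive_scan(d.begin(), d.end(), scanned.begin());  // running sums

    int total = thrust::reduce(d.begin(), d.end(), 0);  // sum, returned to the host

    ThrustDemoResult r;
    r.sorted.resize(d.size());
    r.prefix.resize(scanned.size());
    thrust::copy(d.begin(), d.end(), r.sorted.begin());  // device -> host
    thrust::copy(scanned.begin(), scanned.end(), r.prefix.begin());
    r.total = total;
    return r;
}
```

For the input {5, 1, 4, 2, 3} the result should be sorted = {1, 2, 3, 4, 5}, prefix = {1, 3, 6, 10, 15} and total = 15. Each `device_vector` owns its device allocation, which is why no `cudaMalloc`/`cudaFree` appears.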

Slide20

Available Libraries

CUDA without the CUDA: Thrust Library

Slide21

Available Libraries

CUDA without the CUDA: Thrust Library

Slide22

Available Libraries

CUDA without the CUDA: Thrust Library

Slide23

Available Libraries

Python and CUDA

PyCUDA

Python interface to CUDA functions.

Simply a collection of wrappers, but effective.

NumbaPro (Paid)

Announced this year at GTC 2013: a native CUDA Python compiler

Python = 4th major CUDA language

Slide24

Available Libraries

R and CUDA

R+GPU

Package with accelerated alternatives for common R statistical functions

Rpud/rpudplus

Package with accelerated alternatives for common R statistical functions

Rcuda

… Package with accelerated alternatives for common R statistical functions

Slide25

Available Libraries

R and CUDA

Slide26

Questions?

Slide27

Certificate Dispersal

Slide28

Multiple GPUs

Where’s Pixel-Waldo?

Motivation: Given two images which contain a unique suspect and a number of distinct bystanders, identify the suspect by pairwise comparison.

Slide29

Multiple GPUs

This is hard

We’ll simplify the problem by reducing the targets to pixel triples.

Slide30

Multiple GPUs

0: Upload an image, and a list in which to store targets, to each GPU.

(Diagram: f.bmp and s.bmp, each paired with an empty target list 0 | 0 | 0 | …, uploaded to GPU0 and GPU1)
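Step 0 could be sketched as follows. This is not the workshop's source code; the `GpuImage` struct, the `uploadToDevice` function and the assumption that an image is a flat byte buffer are all hypothetical. It shows the two ideas the step relies on: `cudaSetDevice` to choose which GPU the following calls target, and a per-GPU stream so both uploads can be queued without waiting on each other.

```cuda
// Hypothetical sketch of step 0: one image plus an empty target list per GPU.
#include <cstddef>
#include <cuda_runtime.h>

struct GpuImage {
    int device = -1;
    cudaStream_t stream = nullptr;
    unsigned char* dImage = nullptr;  // image bytes on this GPU
    int* dTargets = nullptr;          // target positions, zero-filled ("0 | 0 | 0 | ...")
    int* dCount = nullptr;            // number of targets found so far
};

// Returns false on the first CUDA error (cleanup of partial allocations omitted for brevity).
bool uploadToDevice(int device, const unsigned char* hostImage, size_t bytes,
                    size_t maxTargets, GpuImage& out)
{
    out.device = device;
    // Later runtime calls on this host thread target `device` until the next cudaSetDevice.
    if (cudaSetDevice(device) != cudaSuccess) return false;
    if (cudaStreamCreate(&out.stream) != cudaSuccess) return false;
    if (cudaMalloc(&out.dImage, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&out.dTargets, maxTargets * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&out.dCount, sizeof(int)) != cudaSuccess) return false;

    // Queue the work on this GPU's stream. With pinned host memory (cudaMallocHost) the
    // copy is fully asynchronous, so the host can move on and start the other GPU's upload.
    if (cudaMemsetAsync(out.dTargets, 0, maxTargets * sizeof(int), out.stream) != cudaSuccess)
        return false;
    if (cudaMemsetAsync(out.dCount, 0, sizeof(int), out.stream) != cudaSuccess) return false;
    return cudaMemcpyAsync(out.dImage, hostImage, bytes, cudaMemcpyHostToDevice,
                           out.stream) == cudaSuccess;
}
```

One would call it once per GPU, for example with f.bmp's pixels on device 0 and s.bmp's on device 1 (an assumed assignment), then call `cudaStreamSynchronize` on each stream before relying on the uploaded data.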

Slide31

Multiple GPUs

1: Find all positions of potential targets (triples) within each image, using both GPUs independently.

(Diagram: GPU0 and GPU1 with f.bmp and s.bmp; the filled target lists read 11 | 143 | 243 | … and 3 | 1632 | 54321 | …)
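The slides do not define a pixel triple precisely, so the sketch below assumes a hypothetical definition: three consecutive bytes of the flattened image that share the same non-zero value. Each thread tests one position and appends matches to the target list with `atomicAdd`, a common way to build a variable-length output list on the GPU. `findTriples` and `findTriplesOnDevice` are illustrative names, not the workshop's code.

```cuda
// Hypothetical sketch of step 1 under an assumed definition of a "triple".
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// One thread per position; matching positions are appended to `targets`.
__global__ void findTriples(const unsigned char* img, int n,
                            int* targets, int* count, int maxTargets)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + 2 >= n) return;  // need img[i], img[i + 1] and img[i + 2]
    unsigned char v = img[i];
    if (v != 0 && img[i + 1] == v && img[i + 2] == v) {
        int slot = atomicAdd(count, 1);  // reserve a unique slot in the list
        if (slot < maxTargets) targets[slot] = i;
    }
}

// Runs the search on one GPU and returns the positions in ascending order.
std::vector<int> findTriplesOnDevice(int device, const std::vector<unsigned char>& img,
                                     int maxTargets)
{
    std::vector<int> result;
    const int n = static_cast<int>(img.size());
    if (n < 3 || maxTargets <= 0 || cudaSetDevice(device) != cudaSuccess) return result;

    unsigned char* dImg = nullptr;
    int* dTargets = nullptr;
    int* dCount = nullptr;
    if (cudaMalloc(&dImg, n) == cudaSuccess &&
        cudaMalloc(&dTargets, maxTargets * sizeof(int)) == cudaSuccess &&
        cudaMalloc(&dCount, sizeof(int)) == cudaSuccess &&
        cudaMemcpy(dImg, img.data(), n, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemset(dCount, 0, sizeof(int)) == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        findTriples<<<blocks, threads>>>(dImg, n, dTargets, dCount, maxTargets);

        int count = 0;
        // This blocking copy also waits for the kernel to finish.
        if (cudaMemcpy(&count, dCount, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess) {
            result.resize(std::min(count, maxTargets));
            if (!result.empty())
                cudaMemcpy(result.data(), dTargets, result.size() * sizeof(int),
                           cudaMemcpyDeviceToHost);
            // atomicAdd hands out slots in no particular order, so sort for stable output.
            std::sort(result.begin(), result.end());
        }
    }
    cudaFree(dImg);  // cudaFree(nullptr) is a no-op
    cudaFree(dTargets);
    cudaFree(dCount);
    return result;
}
```

For the bytes {0, 7, 7, 7, 0, 5, 5, 5, 5} the returned positions should be {1, 5, 6}; a run of four equal bytes yields two overlapping triples. Calling the function once with device 0 and once with device 1 gives each image its own independent search, as on the slide.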

Slide32

Multiple GPUs

2: Allow GPU0 to access GPU1’s memory, then use both images and both target lists to compare potential suspects.

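(Diagram: GPU0 and GPU1 connected by the PCI bus; f.bmp and s.bmp with their target lists 11 | 143 | 243 | … and 3 | 1632 | 54321 | …, plus a two-entry result list initialized to 0 | 0)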
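Step 2 might look like the sketch below: `cudaDeviceCanAccessPeer` checks whether one GPU can reach another's memory, `cudaDeviceEnablePeerAccess` turns that on for the current device, and a kernel launched on GPU0 then dereferences GPU1 pointers directly. The helper names and the matching rule (two triples match when their pixel values are equal) are hypothetical; the rule leans on the slide's premise that the bystanders are distinct, so only one pair should match.

```cuda
// Hypothetical sketch of step 2: peer-to-peer access plus a cross-GPU comparison kernel.
#include <cuda_runtime.h>

// Lets kernels running on `device` dereference memory allocated on `peer`.
bool enablePeerAccess(int device, int peer)
{
    if (device == peer) return true;  // same GPU: nothing to enable
    int canAccess = 0;
    if (cudaDeviceCanAccessPeer(&canAccess, device, peer) != cudaSuccess || !canAccess)
        return false;  // this GPU pair cannot use P2P
    if (cudaSetDevice(device) != cudaSuccess) return false;  // switch to `device`
    cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);   // flags must be 0
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
        cudaGetLastError();  // clear the recorded error; access is already on
        return true;
    }
    return err == cudaSuccess;
}

// Assumed rule: triples match when their pixel values are equal.
__global__ void matchTriples(const unsigned char* imgA, const int* targetsA, int countA,
                             const unsigned char* imgB, const int* targetsB, int countB,
                             int* match)
{
    int a = blockIdx.x * blockDim.x + threadIdx.x;
    if (a >= countA) return;
    unsigned char v = imgA[targetsA[a]];
    for (int b = 0; b < countB; ++b) {
        // If imgB/targetsB live on the peer GPU, these loads travel over the PCI bus.
        if (imgB[targetsB[b]] == v) {
            match[0] = targetsA[a];
            match[1] = targetsB[b];
        }
    }
}

// Runs the comparison on `device`; the B buffers may belong to another GPU.
// Returns a pointer (on `device`) to {posA, posB}, both -1 if nothing matched,
// or nullptr on error.
int* compareOnDevice(int device,
                     const unsigned char* imgA, const int* targetsA, int countA,
                     const unsigned char* imgB, const int* targetsB, int countB)
{
    if (cudaSetDevice(device) != cudaSuccess) return nullptr;
    int* dMatch = nullptr;
    if (cudaMalloc(&dMatch, 2 * sizeof(int)) != cudaSuccess) return nullptr;
    // Every byte 0xFF makes both ints -1.
    if (cudaMemset(dMatch, 0xFF, 2 * sizeof(int)) != cudaSuccess) { cudaFree(dMatch); return nullptr; }
    const int threads = 128;
    const int blocks = (countA + threads - 1) / threads;
    if (blocks > 0)
        matchTriples<<<blocks, threads>>>(imgA, targetsA, countA, imgB, targetsB, countB, dMatch);
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(dMatch);
        return nullptr;
    }
    return dMatch;
}
```

With two GPUs, the call sequence would be `enablePeerAccess(0, 1)` followed by `compareOnDevice(0, ...)` with f.bmp's buffers from GPU0 and s.bmp's from GPU1 (an assumed assignment). On a single-GPU machine both sets of buffers can simply live on device 0, which is also a convenient way to try the sketch.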

Slide33

Multiple GPUs

3: Print the positions of the single matching suspect.

(Diagram: the result 132 | 629 travels from GPU0 across the PCI bus to the CPU; f.bmp and its target list 11 | 143 | 243 | … are also shown)
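The final step is an ordinary device-to-host copy of the two result positions followed by a `printf`. In this sketch the function name `reportSuspect` is hypothetical, and the result buffer is assumed to hold {position in f.bmp, position in s.bmp} on the current GPU, with -1 meaning no match.

```cuda
// Hypothetical sketch of step 3: bring the matched pair back over the PCI bus and print it.
#include <cstdio>
#include <cuda_runtime.h>

// dMatch points to two ints {position in f.bmp, position in s.bmp} on the current GPU.
bool reportSuspect(const int* dMatch, int& posF, int& posS)
{
    int host[2] = {-1, -1};
    // Blocking device-to-host copy; it also waits for earlier work on the default stream.
    if (cudaMemcpy(host, dMatch, sizeof(host), cudaMemcpyDeviceToHost) != cudaSuccess)
        return false;
    posF = host[0];
    posS = host[1];
    if (posF < 0 || posS < 0) {
        std::printf("no matching suspect found\n");
        return false;
    }
    std::printf("suspect: f.bmp position %d, s.bmp position %d\n", posF, posS);
    return true;
}
```

If the buffer holds {132, 629}, as in the slide's diagram, the function prints both positions and returns true.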

Slide34

Multiple GPUs

Walk through the source code.

Things to note:

The code is unoptimized and known to be inefficient, but it covers the concepts of asynchronous streams, GPU context switching, unified virtual addressing, and peer-to-peer access.

The source code requires the TCLAP (Templatized C++ Command Line Parser) library to compile.

The source code will be made available in a GitHub repository after the workshop.
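Of the concepts listed for the walkthrough, unified virtual addressing (UVA) is what makes the peer-to-peer step convenient: host memory and every GPU share one virtual address space, so the runtime can tell which device owns a pointer and can infer the direction of a copy. The two helpers below (hypothetical names) sketch that, assuming a 64-bit build on UVA-capable GPUs.

```cuda
// Hypothetical helpers illustrating unified virtual addressing (UVA).
#include <cstddef>
#include <cuda_runtime.h>

// Returns the GPU that owns a device allocation, or -1 if the query fails.
int owningDevice(const void* dPtr)
{
    cudaPointerAttributes attr;
    if (cudaPointerGetAttributes(&attr, dPtr) != cudaSuccess) return -1;
    return attr.device;
}

// With UVA the runtime infers the copy direction (host, device or peer) from the addresses.
bool copyAnywhere(void* dst, const void* src, size_t bytes)
{
    return cudaMemcpy(dst, src, bytes, cudaMemcpyDefault) == cudaSuccess;
}
```

Because the direction is inferred, the same `copyAnywhere` call could move data host to GPU, GPU to host, or GPU0 to GPU1 without naming a `cudaMemcpyKind` explicitly.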