Slide1
CUDA Workshop, Week 4
NVVP, Existing Libraries, Q/A
Slide2Agenda
Text book / resources
Eclipse Nsight, NVIDIA Visual Profiler
Available libraries
Questions
Certificate dispersal
(Optional) Multiple GPUs: Where’s Pixel-Waldo?
Slide3Text Book / Resources
Text book
Programming Massively Parallel Processors: A Hands-on Approach
David Kirk, Wen-mei Hwu
Slide4Text Book / Resources
NVIDIA Developer Zone
Early access to updated drivers / updates
Heavily curated help forum
Requires registration and approval (nearly automated)
developer.nvidia.com
Slide5Text Book / Resources
US!
We’re pretty passionate about this GPU computing stuff.
Collaboration is cool
If you think you’ve got a problem that can benefit from GPU computation we may have some ideas.
Slide6Eclipse Nsight, NVVP
IDE with an Eclipse foundation
CUDA aware syntax highlighting / suggestions / recognition
Hooked into NVVP
Slide7Eclipse Nsight, NVVP
Deep profiling of every aspect of GPU execution (memory bandwidth, branch divergence, bank conflicts, compute/transfer overlap, and more!)
Provides suggestions for optimization
Graphical view of GPU performance
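As an illustration of what the profiler's branch divergence analysis looks at, here is a minimal sketch of two kernels that do the same kind of work with different branching patterns. The kernel names and the helper `runScale` are hypothetical and not part of the workshop code. In the first kernel, neighbouring threads of a warp take different branches; in the second, the branch condition is uniform across each warp. For a branch this small the compiler may use predication instead of a real branch, so the difference reported by the profiler may be modest.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Divergent: even and odd threads of the same warp take different branches,
// so the warp may have to execute both paths one after the other.
__global__ void scaleDivergent(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0) x[i] *= 2.0f;
    else                      x[i] *= 3.0f;
}

// Warp-uniform: all 32 threads of a warp evaluate the condition the same way,
// so each warp follows a single path.
__global__ void scaleUniform(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0) x[i] *= 2.0f;
    else                                   x[i] *= 3.0f;
}

// Hypothetical host helper: fills an array with 1.0f, runs one of the two
// kernels, and returns the result. Error checks are omitted for brevity.
std::vector<float> runScale(bool divergent, int n) {
    std::vector<float> host(n, 1.0f);
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    if (divergent) scaleDivergent<<<blocks, threads>>>(d, n);
    else           scaleUniform<<<blocks, threads>>>(d, n);

    cudaMemcpy(host.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return host;
}
```

Calling `runScale` once with each setting from a small `main` and profiling the resulting executable gives two kernels to compare side by side in the profiler timeline.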
Slide8Eclipse Nsight, NVVP
Nsight and NVVP are available on our cuda# machines
ssh -X <user>@<cuda machine>
Nsight demo on Week 3 code
Slide9Available Libraries
Why re-invent the wheel?
There are many GPU enabled tools built on CUDA that are already available
These tools have been extensively tested for efficiency and in most cases will outperform custom solutions
Some require CUDA-like code structure
Slide10Available Libraries
Linear Algebra, cuBLAS
CUDA enabled basic linear algebra subroutines
GPU-accelerated version of the complete standard BLAS library
Provided with the CUDA toolkit. Code examples are also provided
Callable from C and Fortran
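A minimal sketch of calling cuBLAS from C++, assuming the `cublas_v2.h` API that ships with the CUDA toolkit. It computes the BLAS level-1 operation y = alpha * x + y (SAXPY). The wrapper name `saxpyOnGpu` is hypothetical and chosen only for this illustration.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical wrapper: computes y = alpha * x + y on the GPU with cuBLAS.
// Returns false if any CUDA or cuBLAS call fails.
bool saxpyOnGpu(float alpha, const std::vector<float> &x, std::vector<float> &y) {
    if (x.size() != y.size() || x.empty()) return false;
    const int n = static_cast<int>(x.size());

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;

    float *dX = nullptr, *dY = nullptr;
    bool ok = cudaMalloc(&dX, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dY, n * sizeof(float)) == cudaSuccess;

    // Copy both vectors to the device (stride 1 on host and device).
    ok = ok && cublasSetVector(n, sizeof(float), x.data(), 1, dX, 1) == CUBLAS_STATUS_SUCCESS;
    ok = ok && cublasSetVector(n, sizeof(float), y.data(), 1, dY, 1) == CUBLAS_STATUS_SUCCESS;

    // The BLAS call itself: y <- alpha * x + y.
    ok = ok && cublasSaxpy(handle, n, &alpha, dX, 1, dY, 1) == CUBLAS_STATUS_SUCCESS;

    // Copy the result back to the host.
    ok = ok && cublasGetVector(n, sizeof(float), dY, 1, y.data(), 1) == CUBLAS_STATUS_SUCCESS;

    cudaFree(dX);
    cudaFree(dY);
    cublasDestroy(handle);
    return ok;
}
```

A build line along the lines of `nvcc saxpy.cu -lcublas` should work, where `saxpy.cu` is a placeholder file name. Note that no kernel is written by hand here: the library provides it.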
Slide11Available Libraries
Linear Algebra, cuBLAS
Slide12Available Libraries
Linear Algebra, cuBLAS
Slide13Available Libraries
Linear Algebra, CULA, MAGMA
CULA and MAGMA extend BLAS
CULA (Paid)
CULA-dense: LAPACK and BLAS implementations, solvers, decompositions, basic matrix operations
CULA-sparse: sparse matrix specialized routines, specialized storage structures, iterative methods
MAGMA (Free, BSD) (Fortran Bindings)
LAPACK and BLAS implementations, developed by the same dev. team as LAPACK.
Slide14Available Libraries
Linear Algebra, CULA, MAGMA
Slide15Available Libraries
Linear Algebra, CULA, MAGMA
Slide16Available Libraries
IMSL Fortran/C Numerical Library
Large collection of mathematical and statistical GPU-accelerated functions
Free evaluation, paid extension
http://www.roguewave.com/products/imsl-numerical-libraries/fortran-library.aspx
Slide17Available Libraries
Image/Signal Processing: NVIDIA Performance Primitives
1900 image processing and 600 signal processing algorithms
Free and provided with the CUDA toolkit, code examples included.
Can be used in tandem with visualization libraries like OpenGL, DirectX.
Slide18Available Libraries
Image/Signal Processing: NVIDIA Performance Primitives
Slide19Available Libraries
CUDA without the CUDA: Thrust Library
Thrust is a high level interface to GPU computing.
Offers template-interface access to sort, scan, reduce, etc.
A production tested version is provided with the CUDA toolkit.
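A minimal sketch of the template interface mentioned above, using the Thrust headers that ship with the CUDA toolkit. The function name `sortAndSum` is hypothetical. The data is copied to the GPU, sorted and reduced there, and copied back, without writing a kernel.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper: sorts `values` in place on the GPU and returns their sum.
int sortAndSum(std::vector<int> &values) {
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<int> d(values.begin(), values.end());

    thrust::sort(d.begin(), d.end());                    // parallel sort on the GPU
    int total = thrust::reduce(d.begin(), d.end(), 0);   // parallel sum on the GPU

    // Copy the sorted data back into the caller's vector (device -> host).
    thrust::copy(d.begin(), d.end(), values.begin());
    return total;
}
```

The calls mirror the C++ standard library algorithms, which is much of the appeal: the code reads like ordinary STL code while the work runs on the device.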
Slide20Available Libraries
CUDA without the CUDA: Thrust Library
Slide21Available Libraries
CUDA without the CUDA: Thrust Library
Slide22Available Libraries
CUDA without the CUDA: Thrust Library
Slide23Available Libraries
Python and CUDA
PyCUDA
Python interface to CUDA functions.
Simply a collection of wrappers, but effective.
NumbaPro (Paid)
Announced this year at GTC 2013, native CUDA Python compiler
Python = 4th major CUDA language
Slide24Available Libraries
R and CUDA
R+GPU
Package with accelerated alternatives for common R statistical functions
Rpud / rpudplus
Package with accelerated alternatives for common R statistical functions
Rcuda
… Package with accelerated alternatives for common R statistical functions
Slide25Available Libraries
R and CUDA
Slide26Questions?
Slide27Certificate Dispersal
Slide28Multiple GPUs
Where’s Pixel-Waldo?
Motivation: Given two images which contain a unique suspect and a number of distinct bystanders, identify the suspect by pairwise comparison.
Slide29Multiple GPUs
This is hard
We’ll simplify the problem by reducing the targets to pixel triples.
Slide30Multiple GPUs
0: Upload an image and a list to store targets to each GPU.
GPU0: f.bmp, target list 0 | 0 | 0 | …
GPU1: s.bmp, target list 0 | 0 | 0 | …
Slide31Multiple GPUs
1: Find all positions of potential targets (triples) within each image using both GPUs independently.
GPU0: f.bmp, target list 11 | 143 | 243 | …
GPU1: s.bmp, target list 3 | 1632 | 54321 | …
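Steps 0 and 1 can be sketched as follows. This is not the workshop source code. It assumes, purely for illustration, that a "triple" is three consecutive nonzero bytes in an otherwise zero image; the names `findTriples`, `GpuJob`, and `findOnEachGpu` are hypothetical. Each image gets its own device (or shares device 0 if only one GPU is present) and its own stream, so the upload and the search are queued on every GPU before the host waits for any of them.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Assumed target definition: bytes i, i+1, i+2 are all nonzero.
__global__ void findTriples(const unsigned char *img, int n,
                            int *positions, int *count, int capacity) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + 2 >= n) return;
    if (img[i] && img[i + 1] && img[i + 2]) {
        int slot = atomicAdd(count, 1);          // reserve a slot in the target list
        if (slot < capacity) positions[slot] = i;
    }
}

struct GpuJob {            // per-image state living on one device
    int device;
    cudaStream_t stream;
    unsigned char *dImg;
    int *dPositions;
    int *dCount;
};

// Returns one sorted target list per image. Assumes non-empty images;
// error checks are omitted for brevity.
std::vector<std::vector<int>> findOnEachGpu(
        const std::vector<std::vector<unsigned char>> &images, int capacity) {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) return {};

    std::vector<GpuJob> jobs(images.size());
    for (size_t j = 0; j < images.size(); ++j) {
        GpuJob &job = jobs[j];
        const int n = static_cast<int>(images[j].size());
        job.device = static_cast<int>(j) % deviceCount;
        cudaSetDevice(job.device);               // switch the active GPU
        cudaStreamCreate(&job.stream);
        cudaMalloc(&job.dImg, n);
        cudaMalloc(&job.dPositions, capacity * sizeof(int));
        cudaMalloc(&job.dCount, sizeof(int));
        // Step 0: upload the image and a zeroed target list.
        cudaMemcpyAsync(job.dImg, images[j].data(), n,
                        cudaMemcpyHostToDevice, job.stream);
        cudaMemsetAsync(job.dPositions, 0, capacity * sizeof(int), job.stream);
        cudaMemsetAsync(job.dCount, 0, sizeof(int), job.stream);
        // Step 1: search this image on this GPU.
        findTriples<<<(n + 255) / 256, 256, 0, job.stream>>>(
            job.dImg, n, job.dPositions, job.dCount, capacity);
    }

    std::vector<std::vector<int>> lists(images.size());
    for (size_t j = 0; j < jobs.size(); ++j) {
        GpuJob &job = jobs[j];
        cudaSetDevice(job.device);
        cudaStreamSynchronize(job.stream);       // wait for this GPU only
        int count = 0;
        cudaMemcpy(&count, job.dCount, sizeof(int), cudaMemcpyDeviceToHost);
        count = std::min(count, capacity);
        lists[j].resize(count);
        cudaMemcpy(lists[j].data(), job.dPositions, count * sizeof(int),
                   cudaMemcpyDeviceToHost);
        std::sort(lists[j].begin(), lists[j].end());  // atomics give no fixed order
        cudaFree(job.dImg);
        cudaFree(job.dPositions);
        cudaFree(job.dCount);
        cudaStreamDestroy(job.stream);
    }
    return lists;
}
```

For copies to overlap fully with work on the other GPU, the host buffers would normally be pinned (for example with `cudaMallocHost`); plain `std::vector` storage is used here only to keep the sketch short.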
Slide32Multiple GPUs
2: Allow GPU0 to access GPU1 memory, use both images and target lists to compare potential suspects.
GPU1: s.bmp, target list 3 | 1632 | 54321 | …
PCI bus
GPU0: f.bmp, target list 11 | 143 | 243 | …, result 0 | 0
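Step 2 can be sketched as follows. This is a hedged illustration, not the workshop code: it assumes a suspect matches when the three bytes at a target position in f equal the three bytes at a target position in s, and the names `matchTriples`, `upload`, and `findSuspect` are hypothetical. The runtime calls `cudaDeviceCanAccessPeer` and `cudaDeviceEnablePeerAccess` let a kernel running on GPU0 dereference pointers that live on GPU1. If peer access is unavailable, the sketch falls back to copying GPU1's data into GPU0 memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Runs on GPU0. `s` and `sPos` may point into GPU1 memory (peer access).
__global__ void matchTriples(const unsigned char *f, const int *fPos, int nf,
                             const unsigned char *s, const int *sPos, int ns,
                             int *match) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nf) return;
    const int a = fPos[i];
    for (int j = 0; j < ns; ++j) {
        const int b = sPos[j];
        if (f[a] == s[b] && f[a + 1] == s[b + 1] && f[a + 2] == s[b + 2]) {
            match[0] = a;   // position of the suspect in f
            match[1] = b;   // position of the suspect in s
        }
    }
}

// Copies a host vector to the given device (error checks omitted).
template <typename T>
T *upload(int device, const std::vector<T> &host) {
    T *d = nullptr;
    cudaSetDevice(device);
    cudaMalloc(&d, host.size() * sizeof(T));
    cudaMemcpy(d, host.data(), host.size() * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Returns {position in f, position in s}, or {-1, -1} if nothing matches.
std::vector<int> findSuspect(const std::vector<unsigned char> &f,
                             const std::vector<int> &fPos,
                             const std::vector<unsigned char> &s,
                             const std::vector<int> &sPos) {
    std::vector<int> match = {-1, -1};
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0 || fPos.empty() || sPos.empty()) return match;
    const int dev0 = 0, dev1 = (deviceCount > 1) ? 1 : 0;

    unsigned char *dF = upload(dev0, f);
    int *dFPos = upload(dev0, fPos);
    unsigned char *dS = upload(dev1, s);
    int *dSPos = upload(dev1, sPos);
    int *dMatch = upload(dev0, match);      // leaves dev0 as the active device

    unsigned char *sView = dS;              // what the GPU0 kernel will read
    int *sPosView = dSPos;
    bool staged = false;
    if (dev0 != dev1) {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, dev0, dev1);
        if (canAccess) {
            // Let kernels on dev0 read dev1 memory directly over the bus.
            if (cudaDeviceEnablePeerAccess(dev1, 0) == cudaErrorPeerAccessAlreadyEnabled)
                cudaGetLastError();         // clear the harmless error state
        } else {
            // Fallback: stage GPU1's data in GPU0 memory (unified addressing
            // lets cudaMemcpyDefault infer the copy direction).
            cudaMalloc(&sView, s.size());
            cudaMalloc(&sPosView, sPos.size() * sizeof(int));
            cudaMemcpy(sView, dS, s.size(), cudaMemcpyDefault);
            cudaMemcpy(sPosView, dSPos, sPos.size() * sizeof(int), cudaMemcpyDefault);
            staged = true;
        }
    }

    const int nf = static_cast<int>(fPos.size());
    const int ns = static_cast<int>(sPos.size());
    matchTriples<<<(nf + 127) / 128, 128>>>(dF, dFPos, nf, sView, sPosView, ns, dMatch);
    // Step 3 would print this result on the CPU.
    cudaMemcpy(match.data(), dMatch, 2 * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dF); cudaFree(dFPos); cudaFree(dMatch);
    if (staged) { cudaFree(sView); cudaFree(sPosView); }
    cudaSetDevice(dev1);
    cudaFree(dS); cudaFree(dSPos);
    cudaSetDevice(dev0);
    return match;
}
```

Direct peer reads cross the PCI bus on every access, so for larger target lists copying the data once may be faster than reading it in place; the sketch favours clarity over speed.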
Slide33Multiple GPUs
3: Print the positions of the single matching suspect.
GPU0: f.bmp, target list 11 | 143 | 243 | …
PCI bus
CPU: 132 | 629
Slide34Multiple GPUs
Walk through the source code.
Things to note:
This is un-optimized and known to be inefficient, but the concepts of asynchronous streams, GPU context switching, unified virtual addressing, and peer-to-peer access are covered.
Source code requires the tclap library to compile.
Source code will be made available in a GitHub repository after the workshop.