GPU/CUDA Instrumentation Notes

frogspyder
Uploaded On 2020-11-06

Presentation Transcript

Slide1

GPU/CUDA Instrumentation Notes

Current Goal(s):

Generate stacktraces of GPU executions and associate GPU call chains with CPU call graphs

Particular interest in how to determine call chains when inlined GPU functions are used

High-level issues (some solved, some unsolved):

Limited information on CUDA kernel execution/binary format

No PC information from the GPU to sample (none in a tool-readable format, at least)

No documentation on CUBIN structure within fat binaries (CPU binary with GPU code embedded)

Dyninst could not analyze CUDA portions of binaries

No Symtab support for GPU line information

Format oddities of individual CUBINs caused crashes with Dyninst (CUDA functions all start at address 0 and overlap)

Slide2

Solved issues/current work:

Nvidia now supplies PC information in a machine-readable format

John (M.C.) is now able to translate this PC information to a kernel function and line number (not straightforward).

John (M.C.) was able to sidestep some of the oddities of working with CUBINs that prevented their analysis.

CUBINs are offset to give them distinct start addresses, etc.

HPCStruct is able to associate PC information with line info for inlined functions (only at -O0)
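The offsetting and PC translation steps above can be illustrated with a minimal sketch. It is not the code used by John, Dyninst, or HPCStruct: the `GpuFunction` record, the `relocate` and `resolve` helpers, the alignment value, and the line tables are all hypothetical, and the real CUBIN and PC sample formats are not modeled. It only shows the idea of giving functions that all start at address 0 distinct base addresses and then mapping a relocated PC back to a function and source line.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class GpuFunction:
    """Hypothetical record for one GPU function extracted from a CUBIN."""
    name: str
    size: int  # code size in bytes, assumed to be positive
    # (offset within function, source line) pairs, sorted by offset
    line_table: list = field(default_factory=list)
    base: int = 0  # every function starts at 0 before relocation


def relocate(functions, alignment=0x100):
    """Give each function a distinct, non-overlapping start address."""
    next_base = 0
    for fn in functions:
        fn.base = next_base
        # Round the end of this function up to the next aligned boundary.
        next_base = (fn.base + fn.size + alignment - 1) // alignment * alignment
    return functions


def resolve(functions, pc):
    """Map a relocated PC to (function name, source line), or None."""
    bases = [fn.base for fn in functions]
    i = bisect_right(bases, pc) - 1
    if i < 0:
        return None
    fn = functions[i]
    offset = pc - fn.base
    if offset >= fn.size:
        return None  # PC falls in the padding between two functions
    offsets = [off for off, _ in fn.line_table]
    j = bisect_right(offsets, offset) - 1
    line = fn.line_table[j][1] if j >= 0 else None
    return fn.name, line


functions = relocate([
    GpuFunction("kernel_a", 0x120, [(0x0, 10), (0x40, 12)]),
    GpuFunction("kernel_b", 0x80, [(0x0, 30), (0x20, 31)]),
])
print(resolve(functions, 0x50))
print(resolve(functions, 0x210))
```

Once the functions no longer overlap, a sorted list of base addresses is enough for a binary search, which is why the relocation step comes first in this sketch.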

Remaining unsolved issues:

Dyninst Symtab still cannot decipher individual CUBINs

A path forward is seen in John's (M.C.) approach to deciphering line info.

Support for optimized CUBIN binaries (-O1 and above)

Cannot decipher fat binaries (CPU + GPU code in a single binary)

Format information from Nvidia would resolve this issue; however, the format can be reverse engineered if necessary.

How to associate GPU call chains with CPU call graphs

John has an idea for how to do this (hanging the GPU call chain off of the CPU call graph (CG) node that called the GPU). A lot of work remains to get this working.
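A rough sketch of what hanging a GPU call chain off a CPU call graph node could look like is given below. John's actual design is not described in these notes, so everything here is an assumption for illustration: the `CallNode` class, the `attach_gpu_chain` and `render` helpers, the function names, and the sample counts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """Hypothetical call graph node; device is "cpu" or "gpu"."""
    name: str
    device: str = "cpu"
    samples: int = 0
    children: dict = field(default_factory=dict)

    def child(self, name, device):
        """Return the child with this name and device, creating it if needed."""
        key = (device, name)
        if key not in self.children:
            self.children[key] = CallNode(name, device)
        return self.children[key]


def attach_gpu_chain(root, cpu_chain, gpu_chain, samples=1):
    """Walk the CPU chain from the root, then hang the GPU chain off its last node."""
    node = root
    for name in cpu_chain:
        node = node.child(name, "cpu")
    for name in gpu_chain:
        node = node.child(name, "gpu")
    node.samples += samples  # attribute the samples to the innermost GPU frame
    return node


def render(node, depth=0):
    """Return an indented text view of the combined CPU/GPU tree."""
    lines = [f"{'  ' * depth}[{node.device}] {node.name} ({node.samples})"]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


root = CallNode("main")
attach_gpu_chain(root, ["solve", "launch_kernel"], ["kernel_a", "device_helper"], samples=3)
attach_gpu_chain(root, ["solve", "launch_kernel"], ["kernel_a"], samples=2)
print("\n".join(render(root)))
```

Keying children by both device and name keeps a GPU function distinct from a CPU function of the same name under the same parent, which seemed the simplest way to keep the two sides separate in one tree.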