PA0 – Hello CUDA (CUDA: Compute Unified Device Architecture)
Presentation Transcript

Slide1

PA0 – Hello CUDA

Slide2

CUDA

Compute Unified Device Architecture

Defines much more than an API

A language

Hardware Specifications

Slide3

PA0

Let’s look into your first assignment and figure some things out.

Slide4

HELLOCUDA.CU

Slide5

HELLOCUDA.CU

Pointers to GPU land

dev_a, dev_b, dev_c

cudaMalloc

allocates space on the GPU

Events:

Allow you to monitor completion of kernel runs, time runs, and synchronize execution

In this case it is being used to time the execution
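A minimal sketch of the pattern this slide points at; the names dev_a, dev_b, dev_c, the element type int, and the size N are assumptions, not the actual hellocuda.cu source, and the fragment is meant to sit inside a host function such as main():

int *dev_a, *dev_b, *dev_c;                     // pointers to GPU land
cudaMalloc((void **)&dev_a, N * sizeof(int));   // cudaMalloc allocates space on the GPU
cudaMalloc((void **)&dev_b, N * sizeof(int));
cudaMalloc((void **)&dev_c, N * sizeof(int));

cudaEvent_t start, stop;                        // events: monitor, time, synchronize
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);                      // start of the timed region (stream 0)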

Slide6

HELLOCUDA.CU

Slide7

HELLOCUDA.CU

cudaEventSynchronize and cudaEventElapsedTime finish up the timing.

Use a timer like this for your homework

DO NOT TIME I/O

DO NOT TIME I/O

DO NOT TIME I/O
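A sketch of how the timing is typically closed out (same assumed start/stop events as above; printf assumes <cstdio> is included):

cudaEventRecord(stop, 0);                        // end of the timed region
cudaEventSynchronize(stop);                      // wait until the GPU reaches 'stop'
float elapsed_ms = 0.0f;
cudaEventElapsedTime(&elapsed_ms, start, stop);  // milliseconds between the two events
printf("kernel time: %.3f ms\n", elapsed_ms);    // time the kernel, never the I/O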

Slide8

HELLOCUDA.CU

cudaMemcpy

moves data onto and off of the GPU

cudaMemcpyHostToHost

cudaMemcpyHostToDevice

cudaMemcpyDeviceToHost

cudaMemcpyDeviceToDevice

cudaMemcpyDefault

(Requires unified virtual addressing; compute capability 2.0 or later)
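For example, in the usual hellocuda-style flow (the host arrays a, b, c and the other names are assumptions):

cudaMemcpy(dev_a, a, N * sizeof(int), cudaMemcpyHostToDevice);  // host -> device inputs
cudaMemcpy(dev_b, b, N * sizeof(int), cudaMemcpyHostToDevice);
// ... kernel launch goes here ...
cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);  // device -> host result
// cudaMemcpyDefault lets the runtime infer the direction from the pointer values
// (needs unified virtual addressing).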

Slide9

HELLOCUDA.CU

add<<<N, I>>>( … args … )

<<< blocks, threads in each block >>>

There are limits on how many of each can be made.
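A sketch of the launch, keeping the slide's N blocks of I threads each (the argument list is an assumption):

add<<<N, I>>>(dev_a, dev_b, dev_c);   // N blocks, I threads in each block

The limits depend on the device; current GPUs typically allow at most 1024 threads per block, and cudaGetDeviceProperties reports the exact maxThreadsPerBlock and maxGridSize for your card.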

Slide10

Thread Organization

Thread – a single execution of the kernel code

Warp – the SIMT group of 32 threads

Block – a group of threads that can share memory and synchronize

Grid – all of the blocks launched by one kernel call
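In code, this hierarchy shows up as the global index computation inside a kernel; a sketch, not taken from the assignment files:

__global__ void add(int *a, int *b, int *c) {
    // blockIdx picks the block within the grid, threadIdx the thread within the block;
    // the hardware groups consecutive threads of a block into warps of 32.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    c[i] = a[i] + b[i];   // assumes the launch creates exactly as many threads as elements
}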

Slide11

Thread Organization

Slide12

HELLOCUDA.CU

Slide13

HELLOCUDA.CU

Much like cudaMalloc mirrors malloc, cudaFree mirrors free.

Events should also be destroyed.
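A sketch of the corresponding cleanup, with the same assumed names as the earlier fragments:

cudaFree(dev_a);            // cudaFree mirrors free
cudaFree(dev_b);
cudaFree(dev_c);
cudaEventDestroy(start);    // events should also be destroyed
cudaEventDestroy(stop);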

Slide14

ADD.CU

Slide15

__GLOBAL__

CUDA signifier

__global__

Host callable, runs on the device

__device__

Device callable, runs on the device

__host__

Host callable, runs on the host
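A small illustration of the three qualifiers; the function names are hypothetical, not from add.cu:

__device__ float twice_on_device(float x) {      // device callable, runs on the device
    return 2.0f * x;
}

__global__ void scale(float *data) {             // host callable, runs on the device
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = twice_on_device(data[i]);
}

__host__ void scale_on_host(float *data, int n) {  // host callable, runs on the host
    for (int i = 0; i < n; ++i)
        data[i] = 2.0f * data[i];
}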