
Presentation Transcript

Slide 1

DCompute: Native & Convenient Heterogeneous Computing for D

Slide 2

Outline

Introduction
Compiler
Libraries
Using DCompute (present and future)
Future directions

Slide 3

State of Hardware

X86: all compilers
ARM: GDC, LDC
MIPS, PPC: LDC
DSPs, FPGAs, GPUs: ?

Slide 4

State of Hardware

DSPs: C, OpenCL
FPGAs: HDLs (Verilog, VHDL, DHDL?), OpenCL
GPUs: Shaders (GLSL, HLSL) for graphics; CUDA, OpenCL for compute

Slide 5

CUDA

NVidia only
CUDA C++/Fortran => PTX => SASS
Well integrated
Reasonably nice to use (from C/C++)

Slide 6

OpenCL

Many targets - GPUs, DSPs, FPGAs
SPIR-V
OpenCL C/C++ (kernel languages) are OK
API is horrible

Slide 7

So…

CUDA has vendor lock-in
OpenCL isn’t very nice
LLVM targets SPIR-V and PTX
We have a D compiler that targets LLVM (LDC)
But...

Slide 8

Enter DCompute

Compiler: enables writing kernels in D
Library: automates using kernels in D
Kernels: prewritten kernels for heterogeneous acceleration with less effort

Slide 9

Compiler

Core functionality done
Work in progress:
  Images / Pipes
  Globals
  Better errors / error reporting
  SPIR-V optimisations (not very critical)

Targeting CUDA & OpenCL

Tacking on Bits of metadata to the modules

Calling convention

Address spacing pointers

Images & other special types (WiP)

Indexing (

get_local_id

/

threadIdx

)

Other “

Instinsics

”Slide11

Slide 11

Compilation Process Models

OpenCL-like separate compilation: no compile-time info
SYCL-like hybrid source code: the kernel is one long, highly nested lambda that hides a lot of runtime magic
CUDA-like hybrid source: runtime magic, semantic validation harder
Want something that fits well with modules & retains compile-time information

Slide 12

Compilation Process

ldc2 -mdcompute-targets=ocl-220,cuda-620 files.d

module normal;
@compute(CompileFor.hostAndDevice) module shared_code;
@compute(CompileFor.deviceOnly) module my_kernels;

@kernel void foo(GlobalPointer!float f) {
}

if (__dcompute_reflect(target, version)) {
}

GlobalPointer!T -> { T addrspace(n)* }   (ABI + metadata)
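Pulled together, a minimal device-only module might look like the sketch below. The import paths (ldc.dcompute, dcompute.std.index) and the GlobalIndex name follow the DCompute library as I understand it, but they are assumptions here rather than something shown on the slide.

@compute(CompileFor.deviceOnly) module my_kernels;

import ldc.dcompute;        // @kernel, CompileFor, GlobalPointer (assumed location)
import dcompute.std.index;  // GlobalIndex (assumed location)

// One kernel source, compiled by the single ldc2 invocation above.
@kernel void saxpy(GlobalPointer!float res,
                   float alpha,
                   GlobalPointer!float x,
                   GlobalPointer!float y,
                   size_t n)
{
    auto i = GlobalIndex.x;
    if (i >= n) return;
    res[i] = alpha * x[i] + y[i];
}

Built with the -mdcompute-targets line above, the one source file should yield an OpenCL (SPIR-V) and a CUDA (PTX) kernel image alongside the normal host code.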

Slide 13

Codegen conditional compilation

// Inside LDC's if-statement codegen: when the condition is a call to
// __dcompute_reflect, fold it at compile time and emit only the branch
// that matches the current target, skipping normal if codegen.
if (stmt->condition->op == TOKcall) {
    auto ce = (CallExp *)stmt->condition;
    if (ce->f && ce->f->ident &&
        !strcmp(ce->f->ident->toChars(), "__dcompute_reflect")) {
        if (match(ce->arguments)) {
            stmt->ifbody->accept(this);
        } else if (stmt->elsebody) {
            stmt->elsebody->accept(this);
        }
        return;
    }
}
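Seen from the D source, the statement this codegen special-cases looks roughly like the sketch below. The ReflectTarget enum and its member names are an assumption; the slide only shows the __dcompute_reflect call itself.

@kernel void tuned(GlobalPointer!float p)
{
    if (__dcompute_reflect(ReflectTarget.CUDA, 0))
    {
        // only this branch is emitted into the PTX module
    }
    else if (__dcompute_reflect(ReflectTarget.OpenCL, 0))
    {
        // only this branch is emitted into the SPIR-V module
    }
}

Because match() folds the condition during codegen, the non-matching branch never reaches the target backend, so target-specific code paths can live side by side in the same kernel.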

Slide 14

Benefits

Done in one compilation (host, CUDA & OpenCL)
No need to worry about templates
Get compile-time info on kernels

Slide 15

DCompute

Standard library for compute operations (for use with kernels)
Driver: abstraction over the OpenCL and CUDA runtimes
  Handles device and host interactions: launching kernels, managing memory
Standard collection of kernels

Slide 16

Standard Library

Indexation
Synchronisation primitives
Vectors (SIMD & geometric)
Math (the usual)
Images (1D, 2D, 3D + arrays, cubes)
Packing (colour operations)
Atomics
Work Group operations (reduce)
Backed by:
  CUDA: libdevice + LLVM PTX intrinsics
  OpenCL: intrinsic operations (Magic)

Slide 17

Slide 18

Driver (WiP)

Allocate & manage device memory
Data transfer
Kernels: load, launch
Device synchronisation
Events

Slide 19

Driver API Automation

For launching kernels we want something like:

@kernel void my_kernel(T)(GlobalPointer!T p, int args) { ... }

void main(string[] args)
{
    auto dev = getDefaultDevice(getConfig());
    auto q = dev.getDefaultQueue();

    float[] arr = someData();
    Buffer!float b = dev.makeBuffer(arr);

    Event v = q.enqueue!(my_kernel!float)(b.length)(b, 42);
    v.wait();

    b.read(arr).writeln;
}

Slide 20

struct Queue
{
    Call enqueue(alias kernel)(LaunchParams lp)
    {
        return Call!(typeof(kernel), kernel.mangleof)(lp, this);
    }
}

struct Call(F, string mangle)
{
    LaunchParams lp;
    Queue q;

    Event opCall(KernelArgsOf!F args)
    {
        // Get type correctness for free!
        // use Parameters!F to call clSetKernelArg /
        // clEnqueueNDRangeKernel or cuLaunchKernel
    }
}
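The "type correctness for free" comment is ordinary D introspection. A minimal, self-contained sketch of the idea follows; the kernel UDA struct, the GlobalPointer stub, KernelArgsOf and launch are all stand-ins for illustration, not the real DCompute declarations.

import std.traits : Parameters;

struct kernel {}                       // stand-in for the real @kernel attribute
struct GlobalPointer(T) { T* ptr; }    // stand-in for the real device pointer type

@kernel void my_kernel(GlobalPointer!float p, int n) {}

// The argument tuple of a kernel, derived straight from its signature.
alias KernelArgsOf(alias F) = Parameters!F;

void launch(alias k)(KernelArgsOf!k args)
{
    // A real driver would walk `args` here and feed each element to
    // clSetKernelArg or cuLaunchKernel.
}

void main()
{
    launch!my_kernel(GlobalPointer!float(null), 42); // type-checks against the kernel
    // launch!my_kernel("oops", 42);                 // rejected at compile time
}

The Call struct above applies the same idea, with kernel.mangleof presumably used to look up the compiled device symbol.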

Slide 21

Collection of Kernels

Showcase of how to do things (and how not to)
Functional examples covering common use cases:
  Convolutions (DFT/FFT)
  Reductions
  Filter, Sort
Thrust-like (except ranges)
Precompilation (for use with DMD/GDC)

Slide 22

Future work

Make SPIR-V use intrinsics (LLVM, WiP)
Compiler:
  Tests (WiP)
  Relax DCompute constraints
  Images & Pipes (integrate with Phobos #2845)
  Globals
Std lib:
  Add missing functions -> intrinsics
Driver:
  Finish integrating clWrap
  CUDA
  High-level API over OpenCL + CUDA
  Testing framework
Library integration, e.g. with mir (ndslice, GLAS, CV), scid
Kernels: standard algorithms, NEW algorithms

Slide 23

Conclusion

DCompute is a compiler extension of LDC to target OpenCL and CUDA (and the host) all at the same time! (Working but not feature complete.)
Runtime libraries are a work in progress. Building kernels works; demo to follow.
This is D, so we are able to make this nice to use (in spite of the horribleness of the underlying APIs) thanks to awesome templates and introspection!
World domination of HPC will be within grasp!

Slide 24

Acknowledgments

John Colvin
David Nadlinger
Kai Nacke
Kinke
Johan Engelen

Slide 25

Questions?

LLVM: https://github.com/thewilsonator/llvm/tree/compute
LDC: https://github.com/ldc-developers/ldc/tree/dcompute (master soon™)
DCompute: https://github.com/libmir/dcompute
