Collaborative Computing for Heterogeneous Integrated Systems




Slide1

Collaborative Computing for Heterogeneous Integrated Systems

Li-Wen Chang (a), Juan Gómez-Luna (b), Izzat El Hajj (a), Sitao Huang (a), Deming Chen (a), and Wen-Mei W. Hwu (a)
(a) University of Illinois at Urbana-Champaign
(b) Universidad de Córdoba

8th ACM/SPEC International Conference on Performance Engineering

Slide2

Collaborative Computing

- Performance and energy efficiency
- Traditionally, CPUs + Accelerator (GPU, FPGA…)
- Recent programming frameworks
  - CUDA 8.0, OpenCL 2.0
- New heterogeneous features
  - Shared virtual memory, coherence, and system-wide atomics
- Tighter integration allows fine-grain collaboration
- We envision integrated systems including CPUs, GPUs and/or FPGAs
- New high-level programming languages will synthesize kernels with different collaborative patterns

Juan Gómez Luna, 8th ACM/SPEC International Conference on Performance Engineering. L’Aquila, April 26, 2017
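As a rough illustration of why shared memory plus system-wide atomics enables fine-grain collaboration, the sketch below lets two workers claim chunks of a shared array through one fetch-and-add counter. It is only an analogy in plain Python: the threads stand in for devices, a lock emulates the atomic operation, and the names `SharedCounter` and `run_dynamic` are hypothetical, not part of CUDA, OpenCL, or the slides.

```python
import threading

class SharedCounter:
    """Emulates an atomic fetch-and-add on a counter visible to every worker."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_add(self, amount):
        with self._lock:
            old = self._value
            self._value += amount
            return old

def run_dynamic(data, chunk, workers=("cpu", "gpu")):
    """Each worker repeatedly claims the next unclaimed chunk of `data`."""
    counter = SharedCounter()
    out = [None] * len(data)                 # stands in for shared virtual memory
    claimed = {name: 0 for name in workers}  # chunks processed per worker

    def worker(name):
        while True:
            start = counter.fetch_add(chunk)  # claim the next chunk
            if start >= len(data):
                return                        # no work left
            for i in range(start, min(start + chunk, len(data))):
                out[i] = data[i] * data[i]    # placeholder computation
            claimed[name] += 1

    threads = [threading.Thread(target=worker, args=(n,)) for n in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, claimed
```

How many chunks each worker ends up claiming depends on scheduling, so only the combined result is deterministic.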

Slide3

Integrated Heterogeneous Systems

Our vision of an integrated heterogeneous system

[Figure: block diagram. Labeled components: CPU cores 0 to N-1, each with an L1 cache, and an L2 cache; a GPU with compute units (CU), each with an L1 cache, and an L2 cache; an FPGA with DMA and scratchpad; a crossbar; a coherent interconnect; LLC; DRAM controller; DRAM. Legend: coherent bus, non-coherent bus.]

Slide4

Collaborative Patterns

[Figure: Program Structure (data-parallel tasks, sequential sub-tasks, coarse-grained synchronization) and its Data Partitioning across Device 1 and Device 2.]
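A minimal sketch of data partitioning, assuming it means that each device runs every sub-task on its own share of the data-parallel tasks. The two sub-tasks, the `alpha` split fraction, and the function names are hypothetical placeholders; Python threads stand in for the two devices.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_task_a(x):
    return x + 1          # placeholder first sequential sub-task

def sub_task_b(x):
    return x * 10         # placeholder second sequential sub-task

def run_all_sub_tasks(block):
    """A device processes its block by running both sub-tasks on each item."""
    return [sub_task_b(sub_task_a(x)) for x in block]

def data_partition(data, alpha):
    """Give the first `alpha` fraction of the items to device 1, the rest to device 2."""
    split = int(len(data) * alpha)
    with ThreadPoolExecutor(max_workers=2) as pool:
        device1 = pool.submit(run_all_sub_tasks, data[:split])
        device2 = pool.submit(run_all_sub_tasks, data[split:])
        # Waiting for both halves plays the role of the coarse-grained synchronization.
        return device1.result() + device2.result()
```

Setting `alpha` to 0 or 1 degenerates to running everything on a single device, which makes the split fraction a natural tuning knob.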

Slide5

Collaborative Patterns

[Figure: Program Structure (data-parallel tasks, sequential sub-tasks, coarse-grained synchronization) and its Fine-grained Task Partitioning across Device 1 and Device 2.]
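A minimal sketch of fine-grained task partitioning, assuming it means that the sequential sub-tasks inside each task are split between the devices. Here device 1 runs the first sub-task of every item and hands the partial result to device 2 through a queue. The sub-tasks and names are hypothetical, and threads again stand in for devices.

```python
import queue
import threading

def sub_task_a(x):
    return x + 1          # placeholder sub-task assigned to device 1

def sub_task_b(x):
    return x * 10         # placeholder sub-task assigned to device 2

_DONE = object()          # sentinel that tells device 2 to stop

def fine_grained(data):
    handoff = queue.Queue()        # per-item hand-off between the devices
    out = [None] * len(data)

    def device1():
        for i, x in enumerate(data):
            handoff.put((i, sub_task_a(x)))
        handoff.put(_DONE)

    def device2():
        while True:
            item = handoff.get()
            if item is _DONE:
                break
            i, partial = item
            out[i] = sub_task_b(partial)

    threads = [threading.Thread(target=device1), threading.Thread(target=device2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The per-item hand-off is the kind of frequent communication that tighter integration is meant to make cheap.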

Slide6

Collaborative Patterns

[Figure: Program Structure (data-parallel tasks, sequential sub-tasks, coarse-grained synchronization) and its Coarse-grained Task Partitioning across Device 1 and Device 2.]
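A minimal sketch of coarse-grained task partitioning, assuming it means that whole tasks separated by coarse-grained synchronization go to different devices. Each single-worker pool stands in for one device, and the stage functions and the idea of a stream of frames are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_one(frame):
    return [x // 2 for x in frame]               # placeholder first coarse-grained task

def stage_two(frame):
    return [1 if x > 2 else 0 for x in frame]    # placeholder second coarse-grained task

def coarse_grained(frames):
    # One single-threaded pool per device: device 1 owns stage one, device 2 owns stage two.
    with ThreadPoolExecutor(max_workers=1) as device1, \
         ThreadPoolExecutor(max_workers=1) as device2:
        first = [device1.submit(stage_one, f) for f in frames]
        # Waiting on each stage-one result is the synchronization point between tasks.
        second = [device2.submit(stage_two, f.result()) for f in first]
        return [s.result() for s in second]
```

With several frames in flight, device 2 can work on one frame while device 1 is already processing the next.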

Slide7

Chai Benchmark Suite

These collaboration patterns can be found in the Chai benchmark suite:
- 8 data partitioning
- 3 fine-grained task partitioning
- 3 coarse-grained task partitioning

https://chai-benchmarks.github.io

J. Gómez-Luna, I. El Hajj, L.-W. Chang, V. Garcia-Flores, S. Garcia de Gonzalo, T. Jablin, A. J. Peña, W.-M. Hwu. Chai: Collaborative Heterogeneous Applications for Integrated-architectures. In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2017.

Slide8

Preliminary Evaluation

Canny Edge Detection (CED) and RANSAC (RSC)
- CEDD (data part.) and CEDT (coarse-grained part.)
- RSCD (data part.) and RSCT (fine-grained part.)

[Figure panels: CPU+GPU (AMD Kaveri); CPU+FPGA (Xeon+StratixV)]

Slide9

Programming Interface

- Conversion between collaboration strategies is a code transformation
- The best strategy across systems is a performance portability problem
- Limitation of current practice
  - OpenCL requires programmers to express the collaboration strategy explicitly
  - Lack of flexibility
- Conversion between strategies is challenging
  - To fine-grained task partitioning: partitioning (fission) point, kernel fission
  - From fine-grained task partitioning: sophisticated loop transform, kernel fusion
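To make kernel fission and fusion concrete, the sketch below writes the same computation twice: once as a single fused kernel, and once split at a fission point into two kernels that could be given to different devices. The computation and function names are hypothetical, and plain Python functions stand in for device kernels.

```python
def fused_kernel(data):
    """One kernel: both sub-tasks run back to back for every item."""
    out = []
    for x in data:
        partial = x + 1            # sub-task before the fission point
        out.append(partial * 10)   # sub-task after the fission point
    return out

def kernel_first(data):
    """Fission result 1: everything before the fission point."""
    return [x + 1 for x in data]

def kernel_second(partials):
    """Fission result 2: everything after the fission point."""
    return [p * 10 for p in partials]

def fissioned(data):
    # The intermediate list is the data that must now be communicated between the kernels.
    return kernel_second(kernel_first(data))
```

Going from `fused_kernel` to the two smaller kernels requires choosing where to cut; going back requires merging the loops again, which is harder when the two kernels do not iterate in the same way.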

Slide10

Programming Interface

- High-level programming languages
  - E.g. TANGRAM provides performance portability
- Collaboration strategies as optimizations in kernel synthesis
  - TANGRAM’s atomic codelets are fine-grain sub-tasks
    - No need to identify fission points
  - Coarse-grain tasks synthesized as kernels
  - Extension of TANGRAM’s map and partition for data partitioning
- For FPGAs, HLS will be introduced as a backend

Li-Wen Chang, Izzat El Hajj, Christopher Rodrigues, Juan Gómez-Luna, Wen-mei Hwu. Efficient Kernel Synthesis for Performance Portable Programming. In Proceedings of IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016.

Slide11

Collaborative Computing for Heterogeneous Integrated Systems

Thanks!

8th ACM/SPEC International Conference on Performance Engineering

https://chai-benchmarks.github.io