Integration for Heterogeneous SoC Modeling - PowerPoint Presentation

Uploaded On 2015-11-26


Presentation Transcript

Slide1

Integration for Heterogeneous SoC Modeling

Y. Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks

Harvard University

Slide2

More accelerators.

Out-of-Core Accelerators

[Die photo from Chipworks]

[Accelerators annotated by Sophia Shao @ Harvard]

Maltiel Consulting estimates

[Shao, et al., IEEE Micro]

Slide3

Accelerator-CPU Integration: Today’s Conventional SoCs

Easy to integrate lots of IP, simple accelerator design

Hard to program and share data

[Diagram labels: Core, L2 $, L3 $, DMA, On-Chip System Bus, Acc, Scratchpad]

Slide4

Accelerator Integration Trend

Users design application-specific hardware accelerators.

System vendors provide Host Service Layer with virtual memory and cache coherence support

Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)

IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)

[Diagram labels: Core, L2 $, L3 $, Acc Agent, Host Service Layer, Accelerator, Main CPU/SoC, FPGA or user-defined ASIC]

Slide5

Aladdin: A pre-RTL, Power-Performance Accelerator Simulator

Inputs: Unmodified C-Code; Accelerator Design Parameters (e.g., # FU, mem. BW)

Aladdin: Accelerator-Specific Datapath; Private L1/Scratchpad; Shared Memory/Interconnect Models

Outputs: Power/Area; Performance

“Accelerator Simulator”: Design Accelerator-Rich SoC Fabrics and Memory Systems

Slide6

Aladdin: A pre-RTL, Power-Performance Accelerator Simulator

[Diagram repeated from the previous slide]

Slide7

Aladdin: A pre-RTL, Power-Performance Accelerator Simulator

Inputs: Unmodified C-Code; Accelerator Design Parameters (e.g., # FU, mem. BW)

Aladdin: Accelerator-Specific Datapath; Private L1/Scratchpad; Shared Memory/Interconnect Models

Outputs: Power/Area; Performance

“Accelerator Simulator”: Design Accelerator-Rich SoC Fabrics and Memory Systems

Design Cost, Flexibility, Programmability

“Design Assistant”: Understand Algorithmic-HW Design Space before RTL

Slide8

Aladdin Overview

Inputs: C Code; Acc Design Parameters

Optimization Phase: Optimistic; Initial DDDG; Idealistic DDDG

Realization Phase: Program Constrained DDDG; Resource Constrained DDDG

Activity; Power/Area Models

Outputs: Power/Area; Performance

DDDG: Dynamic Data Dependence Graph

Slide9

Aladdin Take-Away

Compared to HLS and hand-written RTL for SHOC benchmarks and custom accelerator designs:

Cycle Counts: within 2%

Power: within 5%

Area: within 7%

Large design space exploration (DSE) in minutes instead of hours/days with unmodified C/C++ algorithm description

Limitations

Dynamic approach → Aladdin depends on realistic workload inputs

Algorithm dependent → Aladdin enables DSE/algorithm exploration

Slide10

Aladdin enables pre-RTL simulation of accelerators with the rest of the SoC.

[Diagram labels: Big Cores, Small Cores, GPU, Sea of Fine-Grained Accelerators, Shared Resources, Memory Interface]

[Simulators shown: gem5, GPGPU-Sim, Ruby/GARNET, DRAMSim2, …]

Slide11

gem5-Aladdin Integration

[Diagram labels: CPU, DMA Engine, Scratchpad, TLB, DRAM, LLC, Cache, Acc Datapath]

Slide12

gem5-Aladdin Integration

[Diagram labels: CPU, DMA Engine, DRAM, LLC, Cache, Acc Shared Cache; two accelerator blocks (…), each labeled Scratchpad, TLB, Cache, Acc Datapath]

Slide13

[Diagram labels: Acc, Cache, Memory; CPU, Cache, Memory]

Slide14

An increasing number of accelerators are integrated into both mobile SoCs and servers.

gem5-Aladdin integration enables rapid design space exploration of future accelerator-centric platforms.

Download Aladdin at http://vlsiarch.eecs.harvard.edu/aladdin
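
As a rough illustration of the flow named on the Aladdin Overview slide (dynamic trace, then DDDG, then a resource-constrained schedule) and of the design space exploration the slides refer to, the Python sketch below builds a dependence graph from a toy trace and sweeps the number of multipliers. It is not Aladdin's implementation or API. The `Op` record, the trace format, the latencies, and the non-pipelined functional-unit model are all assumptions made for illustration, and power/area modeling is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One dynamic operation in a (hypothetical) execution trace."""
    dst: str        # name of the value written
    srcs: tuple     # names of the values read
    kind: str       # functional-unit class, e.g. "mul" or "add"
    latency: int    # cycles the unit is occupied (assumed, non-pipelined)


def dot_product_trace(n):
    """Toy trace for `acc += a[i] * b[i]`, i = 0..n-1 (illustrative only)."""
    trace = []
    for i in range(n):
        trace.append(Op(f"m{i}", (f"a{i}", f"b{i}"), "mul", 3))
        trace.append(Op("acc", ("acc", f"m{i}"), "add", 1))
    return trace


def build_dddg(trace):
    """Return, for each op index, the set of op indices it truly depends on."""
    last_writer = {}
    preds = []
    for i, op in enumerate(trace):
        # Look up producers before recording this op's own write.
        preds.append({last_writer[s] for s in op.srcs if s in last_writer})
        last_writer[op.dst] = i
    return preds


def schedule(trace, preds, units):
    """Cycle-by-cycle list scheduling under a functional-unit budget.

    `units` maps a unit kind to how many units exist. Returns total cycles.
    """
    missing = {op.kind for op in trace} - {k for k, n in units.items() if n > 0}
    if missing:
        raise ValueError(f"no functional unit for: {sorted(missing)}")
    free_at = {kind: [0] * n for kind, n in units.items()}
    finish = {}                      # op index -> cycle its result is ready
    pending = list(range(len(trace)))
    cycle = 0
    while pending:
        waiting = []
        for i in pending:
            op = trace[i]
            ready = all(finish.get(p, cycle + 1) <= cycle for p in preds[i])
            slot = next(
                (u for u, t in enumerate(free_at[op.kind]) if t <= cycle), None
            )
            if ready and slot is not None:
                free_at[op.kind][slot] = cycle + op.latency
                finish[i] = cycle + op.latency
            else:
                waiting.append(i)
        pending = waiting
        cycle += 1
    return max(finish.values(), default=0)


def sweep_multipliers(n_elems, mul_counts):
    """Tiny design space sweep: cycles as a function of multiplier count."""
    trace = dot_product_trace(n_elems)
    preds = build_dddg(trace)
    return {m: schedule(trace, preds, {"mul": m, "add": 1}) for m in mul_counts}


if __name__ == "__main__":
    print(sweep_multipliers(4, [1, 2, 4]))  # {1: 13, 2: 8, 4: 7}
```

In this toy model, going from two to four multipliers saves only one cycle because the serial `acc` chain dominates. This is the kind of algorithm-dependent trade-off a pre-RTL sweep is meant to expose.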