Slide1
Rapid Exploration of Accelerator-Rich Architectures: Automation from Concept to Prototyping
David Brooks, Jason Cong, Zhenman Fang, Yakun Sophia Shao, and Sam Xi
Harvard University & UCLA
Slide2
Tutorial Outline
Time | Topic | Speaker
8:30 am – 9:00 am | Accelerator Research Infrastructure Overview | Sophia Shao
9:00 am – 9:30 am | Aladdin: Accelerator Pre-RTL Modeling | Sophia Shao
9:30 am – 10:00 am | Rapid Hardware Specialization with HLS: Glass Half Full | Prof. Zhiru Zhang
10:00 am – 10:30 am | PARADE: HLS-Based Accelerator-Rich Architecture Simulation | Zhenman Fang
10:30 am – 11:00 am | Break |
11:00 am – 11:30 am | gem5-Aladdin: Accelerator System Co-Design | Sam Xi
11:30 am – 12:00 pm | ARAPrototyper: FPGA Prototyping | Zhenman Fang
12:00 pm – 1:30 pm | Lunch |
1:30 pm – 2:00 pm | Virtual Machine Setup | Sophia Shao & Sam Xi
2:00 pm – 2:30 pm | Hands-on: Accelerator Design Space Exploration using Aladdin | Sophia Shao
2:30 pm – 3:00 pm | Hands-on: SoC Design Space Exploration using gem5-Aladdin | Sam Xi
Slide3
Moore’s Law
Slide4
CMOS Scaling is Slowing Down
http://www.anandtech.com/show/9447/intel-10nm-and-kaby-lake
Process nodes: 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 14 nm, 10 nm
Slide5
CMOS Technology Scaling
Technological Fallow Period
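As a rough illustration of the node sequence listed under "CMOS Scaling is Slowing Down", the sketch below computes the linear shrink between consecutive node labels. It assumes the labels can be treated as linear dimensions, which is a simplification since node names are partly marketing terms, and it assumes ideal area scaling with the square of the linear ratio. The labels carry no dates, so this says nothing about how the cadence of new nodes has changed.

```python
# Process node labels as listed on the "CMOS Scaling is Slowing Down" slide, in nm.
nodes_nm = [180, 130, 90, 65, 45, 32, 22, 14, 10]


def shrink_ratios(nodes):
    """Return the ratio new/old for each pair of consecutive node labels."""
    return [new / old for old, new in zip(nodes, nodes[1:])]


ratios = shrink_ratios(nodes_nm)
for (old, new), r in zip(zip(nodes_nm, nodes_nm[1:]), ratios):
    # Assumption: ideal area scaling is the square of the linear ratio.
    print(f"{old:>3} nm -> {new:>3} nm: linear x{r:.2f}, ideal area x{r * r:.2f}")
```

Under these assumptions each step is roughly a 0.7x linear shrink, which corresponds to about half the area per transistor.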
Slide6
Potential for Specialized Architectures [Zhang and Brodersen]
16 Encryption
17 Hearing Aid
18 FIR for disk read
19 MPEG Encoder
20 802.11 Baseband
Slide7
Cores, GPUs, and Accelerators: Apple A8 SoC
Out-of-Core Accelerators
Slide8
Cores, GPUs, and Accelerators: Apple A8 SoC
Out-of-Core Accelerators
Slide9
Cores, GPUs, and Accelerators: Apple A8 SoC
Out-of-Core Accelerators
Maltiel Consulting estimates
Our estimates
Slide10
Challenges in Accelerators
Flexibility: Fixed-function accelerators are designed only for their target applications.
Programmability: Today’s accelerators are explicitly managed by programmers.
Slide11
OMAP 4 SoC
Today’s SoC
Slide12
OMAP 4 SoC
Today’s SoC
Block diagram labels: ARM Cores, GPU, DSP (x2), System Bus, Secondary Bus (x2), Tertiary Bus, DMA (x2), SD, USB (x2), Audio, Video, Face, Imaging
Slide13
Challenges in Accelerators
Flexibility: Fixed-function accelerators are designed only for their target applications.
Programmability: Today’s accelerators are explicitly managed by programmers.
Design Cost: Accelerator (and RTL) implementation is inherently tedious and time-consuming.
Slide14
Today’s SoC
Block diagram labels: GPU/DSP, CPU (x2), Buses, Mem Interface, Acc (x9)
Slide15
Future Accelerator-Centric Architectures
Challenges: Flexibility, Design Cost, Programmability
How to decompose applications into accelerators?
How to rapidly design lots of accelerators?
How to design and manage the shared resources?
Block diagram labels: GPU/DSP, Big Cores, Shared Resources, Memory Interface, Sea of Fine-Grained Accelerators, Small Cores
Slide16
PARADE: Platform for Accelerator-Rich Architectural Design & Exploration [ICCAD 15]
Extended gem5 (McPAT) for x86 CPU, with OS
Auto-generated accelerators based on HLS (AutoPilot)
Added SPM, DMA, GAM & TLB model
Extended Ruby (CACTI) for coherent cache hierarchy
gem5 memory model [ISPASS 14]
Extended Garnet (DSENT) for NoC
Slide17
ARAPrototyper: Prototyping an ARA on FPGA
Using Xilinx Zynq SoC (FPGA fabrics + ARM)
Major components of an ARA
General processor cores
A sea of heterogeneous accelerators
Memory system + interconnects (NoC)
Slide18
Block diagram labels: GPU/DSP, Big Cores, Shared Resources, Memory Interface, Sea of Fine-Grained Accelerators, Small Cores
gem5-Aladdin: Accelerator-System Co-Design [MICRO’16]
Contributions
Accelerator Design w/ High-Level Synthesis [ISLPED’13_1]
Aladdin: Accelerator Pre-RTL, Power-Performance Simulator [ISCA’14, TopPicks’15]
MachSuite: Accelerator Benchmark Suite [IISWC’14]
WIICA: Accelerator Workload Characterization [ISPASS’13]
Slide19
Tutorial Outline
Time | Topic | Speaker
8:30 am – 9:00 am | Accelerator Research Infrastructure Overview | Sophia Shao
9:00 am – 9:30 am | Aladdin: Accelerator Pre-RTL Modeling | Sophia Shao
9:30 am – 10:00 am | Rapid Hardware Specialization with HLS: Glass Half Full | Prof. Zhiru Zhang
10:00 am – 10:30 am | PARADE: HLS-Based Accelerator-Rich Architecture Simulation | Zhenman Fang
10:30 am – 11:00 am | Break |
11:00 am – 11:30 am | gem5-Aladdin: Accelerator System Co-Design | Sam Xi
11:30 am – 12:00 pm | ARAPrototyper: FPGA Prototyping | Zhenman Fang
12:00 pm – 1:30 pm | Lunch |
1:30 pm – 2:00 pm | Virtual Machine Setup | Sophia Shao & Sam Xi
2:00 pm – 2:30 pm | Hands-on: Accelerator Design Space Exploration using Aladdin | Sophia Shao
2:30 pm – 3:00 pm | Hands-on: SoC Design Space Exploration using gem5-Aladdin | Sam Xi
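The two hands-on sessions concern design space exploration. As a minimal sketch of one common step in such exploration, the code below filters a set of candidate designs down to the Pareto frontier over execution time and power. Everything here is an assumption for illustration: the `DesignPoint` type, the configuration names, and the numbers are made up and are not outputs or interfaces of Aladdin or gem5-Aladdin.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str        # hypothetical label for one accelerator configuration
    cycles: int      # hypothetical execution time in cycles (lower is better)
    power_mw: float  # hypothetical average power in mW (lower is better)


def pareto_frontier(points):
    """Keep the points that no other point matches or beats on both metrics
    while strictly beating on at least one; return them sorted by cycles."""
    frontier = []
    for p in points:
        dominated = any(
            q.cycles <= p.cycles
            and q.power_mw <= p.power_mw
            and (q.cycles < p.cycles or q.power_mw < p.power_mw)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.cycles)


# Made-up illustrative values, not results from any simulator.
sweep = [
    DesignPoint("unroll1", 4000, 5.0),
    DesignPoint("unroll2", 2200, 8.0),
    DesignPoint("unroll4", 1300, 15.0),
    DesignPoint("unroll4_wide", 1300, 19.0),
    DesignPoint("unroll8", 1250, 40.0),
    DesignPoint("bad", 2500, 16.0),
]

frontier_names = [p.name for p in pareto_frontier(sweep)]
print(frontier_names)
```

The quadratic scan is kept for readability; for the small sweeps of a tutorial exercise it is fast enough, and a sort-based pass could replace it for larger design spaces.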