Tutorial Outline
Time                  Topic
9:00 am – 9:30 am     Introduction
9:30 am – 10:10 am    Standalone Accelerator Simulation: Aladdin
10:10 am – 10:30 am   Standalone Accelerator Generation: High-Level Synthesis
10:30 am – 11:00 am   HLS-Based Accelerator-Rich Architecture Simulation: PARADE
11:00 am – 11:30 am   Break
11:30 am – 12:00 pm   Pre-RTL SoC Simulation: gem5-Aladdin
12:00 pm – 12:30 pm   FPGA Prototyping: ARACompiler
12:30 pm – 2:00 pm    Lunch
2:00 pm – 3:00 pm     Panel on Accelerator Research
3:00 pm – 3:30 pm     Accelerator Benchmarks and Workload Characterization
3:30 pm – 4:00 pm     Break
4:00 pm – 5:00 pm     Hands-on Exercise
Integration for Heterogeneous SoC Modeling
Yakun Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks
Harvard University
Accelerator-CPU Integration: Today’s Conventional SoCs
Easy to integrate lots of IP; simple accelerator design
Hard to program and to share data
[Figure: conventional SoC with cores (each with an L2 cache), an L3 cache, a DMA engine, and accelerators Acc #1 … Acc #n, each with a scratchpad, on an on-chip system bus]
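To make "hard to program and share data" concrete, here is a minimal C sketch of offloading work to a scratchpad-based accelerator. Everything in it is hypothetical: `Accelerator`, `dma_to_scratchpad`, `dma_from_scratchpad`, `acc_scale`, and `offload_scale` are illustrative names, and `memcpy` stands in for the DMA engine, which on real hardware runs asynchronously behind a driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an accelerator with a private scratchpad.
 * memcpy stands in for the DMA engine; real DMA is asynchronous. */
#define SCRATCHPAD_FLOATS 256

typedef struct {
    float spad[SCRATCHPAD_FLOATS]; /* accelerator-local scratchpad */
} Accelerator;

/* Hypothetical DMA helper: copy n floats from main memory into the scratchpad. */
static void dma_to_scratchpad(Accelerator *acc, size_t offset,
                              const float *src, size_t n) {
    assert(offset + n <= SCRATCHPAD_FLOATS);
    memcpy(&acc->spad[offset], src, n * sizeof(float));
}

/* Hypothetical DMA helper: copy n floats from the scratchpad back to main memory. */
static void dma_from_scratchpad(const Accelerator *acc, size_t offset,
                                float *dst, size_t n) {
    assert(offset + n <= SCRATCHPAD_FLOATS);
    memcpy(dst, &acc->spad[offset], n * sizeof(float));
}

/* The datapath only touches scratchpad contents, never host pointers. */
static void acc_scale(Accelerator *acc, size_t n, float k) {
    for (size_t i = 0; i < n; i++) {
        acc->spad[i] *= k;
    }
}

/* Host-side driver: the programmer stages inputs and outputs explicitly.
 * Returns -1 when the data does not fit, in which case the caller must tile it. */
static int offload_scale(Accelerator *acc, float *data, size_t n, float k) {
    if (n > SCRATCHPAD_FLOATS) {
        return -1;
    }
    dma_to_scratchpad(acc, 0, data, n);
    acc_scale(acc, n, k);
    dma_from_scratchpad(acc, 0, data, n);
    return 0;
}
```

Every input and output has to be copied in and out by hand, and data larger than the scratchpad has to be split into tiles, which illustrates one source of the programming burden in this style of integration.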
Accelerator Integration Trend
Users design application-specific hardware accelerators.
System vendors provide a Host Service Layer with virtual memory and cache coherence support, for example:
Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)
IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)
[Figure: main CPU/SoC (cores with L2 caches, an L3 cache, and an accelerator agent) connected through the Host Service Layer to an accelerator implemented on an FPGA or a user-defined ASIC]
Example of the state of the art: IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)
Virtual addressing and data caching
Easier, more natural programming model
[Figure: IBM CAPI: a two-part solution]
Coherent Accelerator Processor Proxy (CAPP)
Snoops the PowerBus on behalf of the accelerator
Power Service Layer (PSL)
Performs address translation, including page-table-walker support
Provides cache and interface logic
[Figure: cores with L2 caches and an L3 cache on the on-chip coherent PowerBus, together with memory and the CAPP; over PCIe, the PSL (with a cache and a TLB) connects the accelerator]
But… accelerators are not one-size-fits-all
Problem: the PSL consumes ~20–30% of FPGA resources… for a single accelerator
Applications have drastically different requirements.
Memory design customization is often more important than datapath customization.
gem5-Aladdin Integration
[Figure: gem5-Aladdin integration: CPU, DMA engine, scratchpad, TLB, DRAM, last-level cache (LLC), caches, and the accelerator datapath]
Code example: Sift
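    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
        …
        imsmooth(I, temp, gss[0]);
        // Tell the simulator which host buffers back the accelerator's arrays.
        // Note: sizeof on a pointer such as F2D* gives the pointer size, not the image size.
        mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
        mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
        // Run imsmooth on the accelerator; the CPU blocks until it finishes.
        invokeAcceleratorAndBlock(imsmooth);
        …
    }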
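The mapping calls associate each array name used inside `imsmooth` with a host buffer and a size in bytes. The C sketch below uses hypothetical stand-ins (`ArrayMapping`, `map_array`, `find_mapping`), not the gem5-Aladdin runtime, to show that bookkeeping and why the byte count matters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side bookkeeping: record which buffer backs each
 * named accelerator array. Not the gem5-Aladdin implementation. */
#define MAX_MAPPINGS 8

typedef struct {
    const char *name; /* array name as used inside the accelerated function */
    void *addr;       /* host buffer backing that array */
    size_t bytes;     /* buffer size in bytes */
} ArrayMapping;

static ArrayMapping mappings[MAX_MAPPINGS];
static size_t num_mappings;

/* Records a mapping; returns -1 if the table is full. */
static int map_array(const char *name, void *addr, size_t bytes) {
    assert(name != NULL);
    if (num_mappings == MAX_MAPPINGS) {
        return -1;
    }
    mappings[num_mappings++] = (ArrayMapping){name, addr, bytes};
    return 0;
}

/* Looks up a mapping by array name; returns NULL if it was never mapped. */
static const ArrayMapping *find_mapping(const char *name) {
    for (size_t i = 0; i < num_mappings; i++) {
        if (strcmp(mappings[i].name, name) == 0) {
            return &mappings[i];
        }
    }
    return NULL;
}
```

Because `sizeof` applied to a pointer such as `F2D*` yields the size of the pointer itself, the byte count for an image buffer should be computed from its element count and element size, as in `map_array("array", p, 64 * sizeof *p)`.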
Code example: Sift
    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
        …
        // imsmooth(I, temp, gss[0]);   // software call disabled
        mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
        mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
        invokeAccelerator(imsmooth);    // start Aladdin simulation
        …
    }
Simulating an Accelerator with the Memory System Using Aladdin
[Figure: accelerator (Acc), cache, and memory]
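Simulating the accelerator together with a memory system means each load and store sees a cache hit or miss latency rather than a fixed cost. The sketch below is a minimal trace-driven, direct-mapped cache model in C. The line size, number of lines, and latencies are illustrative assumptions, not Aladdin or gem5 defaults, and replaying accesses one at a time ignores the memory-level parallelism a real accelerator exploits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative parameters only. */
#define LINE_BYTES 64u
#define NUM_LINES 8u
#define HIT_CYCLES 1u
#define MISS_CYCLES 100u /* hit time plus memory access, lumped together */

typedef struct {
    bool valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
    unsigned hits, misses;
} Cache;

/* Returns the latency of one access and updates the direct-mapped cache. */
static unsigned cache_access(Cache *c, uint64_t addr) {
    uint64_t line = addr / LINE_BYTES;
    size_t idx = (size_t)(line % NUM_LINES);
    uint64_t tag = line / NUM_LINES;
    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
        return HIT_CYCLES;
    }
    c->misses++;
    c->valid[idx] = true; /* fill the line, evicting any previous occupant */
    c->tag[idx] = tag;
    return MISS_CYCLES;
}

/* Replays an address trace serially and returns the total cycles. */
static uint64_t replay_trace(Cache *c, const uint64_t *trace, size_t n) {
    assert(trace != NULL || n == 0);
    uint64_t cycles = 0;
    for (size_t i = 0; i < n; i++) {
        cycles += cache_access(c, trace[i]);
    }
    return cycles;
}
```

Streaming sixteen consecutive 4-byte elements touches a single 64-byte line, so it costs one miss and fifteen hits (115 cycles here), while two addresses that map to the same line keep evicting each other and miss every time.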
[Figure: accelerator (Acc) with a cache and memory, alongside a CPU with a cache and memory]
Modeling Accelerators in an SoC-like Environment
[Figure: accelerator (Acc), cores, cache, and memory]
Accelerator Research Infrastructure
[Figure: axes labeled Standalone / System Integration and Modeling / RTL; tools shown: Aladdin, gem5-Aladdin, High-Level Synthesis, PARADE, FPGA Prototyping]