
Tutorial Outline - PowerPoint Presentation

Uploaded On 2015-11-26

Presentation Transcript

Slide1

Tutorial Outline

Time                   Topic
9:00 am – 9:30 am      Introduction
9:30 am – 10:10 am     Standalone Accelerator Simulation: Aladdin
10:10 am – 10:30 am    Standalone Accelerator Generation: High-Level Synthesis
10:30 am – 11:00 am    HLS-Based Accelerator-Rich Architecture Simulation: PARADE
11:00 am – 11:30 am    Break
11:30 am – 12:00 pm    Pre-RTL SoC Simulation: gem5-Aladdin
12:00 pm – 12:30 pm    FPGA Prototyping: ARACompiler
12:30 pm – 2:00 pm     Lunch
2:00 pm – 3:00 pm      Panel on Accelerator Research
3:00 pm – 3:30 pm      Accelerator Benchmarks and Workload Characterization
3:30 pm – 4:00 pm      Break
4:00 pm – 5:00 pm      Hands-on Exercise

Slide2

Integration for Heterogeneous SoC Modeling

Yakun Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks
Harvard University

Slide3

Accelerator-CPU Integration: Today’s Conventional SoCs

Easy to integrate lots of IP, simple accelerator design
Hard to program and share data

[Figure: block diagram with labels Core, L2 $, L3 $, DMA, On-Chip System Bus, Acc #1 … Acc #n, Scratchpad]

Slide4
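As an editorial illustration of why the conventional SoC on the "Today’s Conventional SoCs" slide is described as hard to program, the sketch below tiles a buffer through a private scratchpad by hand. It is a minimal software model, not code from the tutorial: `SPAD_WORDS`, `dma_copy`, `acc_scale`, and `scale_with_accelerator` are hypothetical names, and the DMA engine is stood in for by `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPAD_WORDS 256  /* hypothetical scratchpad capacity, in floats */

/* Hypothetical accelerator-private scratchpad; the CPU caches never see it. */
static float scratchpad[SPAD_WORDS];

/* Stand-in for a DMA transfer. A real engine would be programmed through
   device registers and would run asynchronously. */
static void dma_copy(float *dst, const float *src, size_t n) {
    memcpy(dst, src, n * sizeof(float));
}

/* Stand-in for the accelerator datapath: scales the scratchpad contents. */
static void acc_scale(size_t n, float k) {
    for (size_t i = 0; i < n; i++) scratchpad[i] *= k;
}

/* Process a buffer that may exceed the scratchpad by tiling it manually.
   Every tile needs an explicit copy in and an explicit copy out. */
void scale_with_accelerator(float *data, size_t n, float k) {
    assert(data != NULL);
    for (size_t off = 0; off < n; off += SPAD_WORDS) {
        size_t chunk = (n - off < SPAD_WORDS) ? n - off : SPAD_WORDS;
        dma_copy(scratchpad, data + off, chunk);  /* host memory -> scratchpad */
        acc_scale(chunk, k);                      /* compute on the private copy */
        dma_copy(data + off, scratchpad, chunk);  /* scratchpad -> host memory */
    }
}
```

Calling `scale_with_accelerator(buf, 600, 2.0f)` on a 600-element buffer runs three tiles. The explicit copies are the data-sharing burden that the slide attributes to this design style.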

Accelerator Integration Trend

Users design application-specific hardware accelerators.
System vendors provide Host Service Layer with virtual memory and cache coherence support
  Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)
  IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)

[Figure: block diagram with labels Core, L2 $, L3 $, Acc Agent, Host Service Layer, Accelerator, Main CPU/SoC, FPGA or user-defined ASIC]

Slide5

Example of state-of-the-art: IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)

Virtual Addressing & Data Caching
Easier, Natural Programming Model

IBM CAPI: Two part solution

Slide6

IBM CAPI: Two part solution

Coherent Accelerator Processor Proxy (CAPP)
  Snoops PowerBus on behalf of accelerator
Power Service Layer (PSL)
  Performs address translations, page table walker support
  Provides cache and interface logic

[Figure: block diagram with labels Core, L2 $, L3 $, On-Chip Coherent PowerBus, Memory, CAPP, PCIe, PSL, Cache, TLB, Accelerator]

Slide7
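The "IBM CAPI: Two part solution" slide says the PSL performs address translation with page table walker support. The sketch below is an editorial toy model of that idea: a small direct-mapped TLB that falls back to a page walk on a miss. It does not reflect the actual PSL design. The page size, table sizes, and every name here (`map_page`, `translate`, `get_tlb_misses`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12   /* assumption: 4 KiB pages */
#define TLB_ENTRIES 4    /* tiny direct-mapped TLB, for illustration only */
#define NUM_PAGES   16   /* size of the toy single-level page table */

typedef struct { int valid; uint64_t vpn; uint64_t pfn; } TlbEntry;

static TlbEntry tlb[TLB_ENTRIES];
static uint64_t page_table[NUM_PAGES];  /* virtual page -> physical frame */
static unsigned tlb_misses;

/* Install a mapping and drop any TLB entry that could now be stale. */
void map_page(uint64_t vpn, uint64_t pfn) {
    assert(vpn < NUM_PAGES);
    page_table[vpn] = pfn;
    tlb[vpn % TLB_ENTRIES].valid = 0;
}

/* Toy page walk: a single table lookup. */
static uint64_t page_walk(uint64_t vpn) {
    assert(vpn < NUM_PAGES);
    return page_table[vpn];
}

/* Translate a virtual address, refilling the TLB on a miss. */
uint64_t translate(uint64_t vaddr) {
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    uint64_t offset = vaddr & ((1ULL << PAGE_SHIFT) - 1);
    TlbEntry *e = &tlb[vpn % TLB_ENTRIES];
    if (!e->valid || e->vpn != vpn) {
        tlb_misses++;
        e->valid = 1;
        e->vpn = vpn;
        e->pfn = page_walk(vpn);
    }
    return (e->pfn << PAGE_SHIFT) | offset;
}

unsigned get_tlb_misses(void) { return tlb_misses; }
```

After `map_page(3, 42)`, two accesses to the same virtual page cost one page walk in this model; the second is served from the TLB.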

But… accelerators are not one size fits all

Problem: the PSL consumes ~20-30% of FPGA resources… for one accelerator
Applications have drastically different requirements.
Memory design customization is often more important than datapath customization

Slide8

gem5-Aladdin Integration

[Figure: block diagram with labels CPU, DMA Engine, Scratchpad, TLB, DRAM, LLC, Cache, Cache, Acc Datapath]

Slide9

Code example: Sift

    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
      …
      imsmooth(I, temp, gss[0]);
      mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
      mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
      invokeAcceleratorAndBlock(imsmooth);
      …
    }

Slide10

Code example: Sift

    void imsmooth(F2D* array, float sigma, F2D* product);

    void sift() {
      …
      // imsmooth(I, temp, gss[0]);
      mapArrayToAccelerator(imsmooth, "array", (void *)I, sizeof(I));
      mapArrayToAccelerator(imsmooth, "product", (void *)product, sizeof(product));
      invokeAccelerator(imsmooth);
      …
    }

Start Aladdin Simulation

Slide11
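The Sift code example registers named arrays for a kernel and then invokes it. The sketch below is an editorial, software-only model of that call pattern. It is not the gem5-Aladdin implementation, and the real signatures are not given in the slides. All names (`Kernel`, `toy_map_array`, `toy_lookup`, `toy_smooth`, `toy_invoke_and_block`) are hypothetical stand-ins, and the kernel body is a placeholder rather than the real `imsmooth`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MAPPINGS 8

typedef void (*Kernel)(void);  /* hypothetical handle identifying a kernel */

typedef struct {
    Kernel kernel;      /* which kernel the mapping belongs to */
    const char *name;   /* array name as the kernel refers to it */
    void *addr;         /* host address of the array */
    size_t size;        /* size in bytes */
} Mapping;

static Mapping mappings[MAX_MAPPINGS];
static size_t num_mappings;

/* Hypothetical stand-in for the slide's mapArrayToAccelerator call. */
void toy_map_array(Kernel k, const char *name, void *addr, size_t size) {
    assert(num_mappings < MAX_MAPPINGS);
    mappings[num_mappings++] = (Mapping){k, name, addr, size};
}

/* Find the array registered for a kernel under a given name. */
void *toy_lookup(Kernel k, const char *name, size_t *size_out) {
    for (size_t i = 0; i < num_mappings; i++) {
        if (mappings[i].kernel == k && strcmp(mappings[i].name, name) == 0) {
            if (size_out) *size_out = mappings[i].size;
            return mappings[i].addr;
        }
    }
    return NULL;
}

/* Placeholder kernel: writes half of each "array" element into "product". */
void toy_smooth(void) {
    size_t in_bytes = 0, out_bytes = 0;
    float *in = toy_lookup(toy_smooth, "array", &in_bytes);
    float *out = toy_lookup(toy_smooth, "product", &out_bytes);
    assert(in && out && in_bytes == out_bytes);
    for (size_t i = 0; i < in_bytes / sizeof(float); i++) out[i] = 0.5f * in[i];
}

/* Hypothetical stand-in for invokeAcceleratorAndBlock: runs synchronously. */
void toy_invoke_and_block(Kernel k) { k(); }
```

In this model the size argument is the byte length of the whole array (for example `sizeof(in)` on a `float in[4]`), which is what the kernel uses to derive its trip count. Passing `sizeof` of a pointer would give the pointer's size instead.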

Simulating Accelerator with Memory System using Aladdin

[Figure: block diagram with labels Acc, Cache, Memory]

Slide12

[Figure: block diagram with labels Acc, Cache, Memory, CPU, Cache, Memory]

Slide13

Modeling Accelerators in an SoC-like Environment

[Figure: block diagram with labels Acc, Core, Cache, Memory, Core]

Slide14

Modeling Accelerators in an SoC-like Environment

[Figure: block diagram with labels Acc, Core, Cache, Memory]

Slide15

Accelerator Research Infrastructure

[Figure: chart with axis labels Standalone, System Integration, Modeling, RTL and entries Aladdin, gem5-Aladdin, High-Level Synthesis, PARADE, FPGA Prototyping]

Slide16

Tutorial References

Y.S. Shao and D. Brooks, “ISA-Independent Workload Characterization and its Implications for Specialized Architectures,” ISPASS’13.
B. Reagen, Y.S. Shao, G.-Y. Wei, D. Brooks, “Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware,” ISLPED’13.
Y.S. Shao, B. Reagen, G.-Y. Wei, D. Brooks, “Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures,” ISCA’14.
B. Reagen, B. Adolf, Y.S. Shao, G.-Y. Wei, D. Brooks, “MachSuite: Benchmarks for Accelerator Design and Customized Architectures,” IISWC’14.