Enabling Programmer-Transparent Near-Data Processing in GPU Systems. Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler
1. Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems. Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler
2. Opportunity. Processing data directly in 3D-stacked memories is a promising direction. [Diagram labels: main GPU; SM (streaming multiprocessor); 3D-stacked memory (memory stack); logic layer; logic layer SM; crossbar switch; vault controllers (Vault Ctrl)]
3. The Problem. However, it requires significant programmer effort. [Diagram labels: main GPU; SM (streaming multiprocessor); 3D-stacked memory (memory stack); logic layer; logic layer SM; crossbar switch; vault controllers (Vault Ctrl)]
4. Key Challenge 1. [Diagram labels: main GPU; SM (streaming multiprocessor); 3D-stacked memory (memory stack); logic layer; logic layer SM; crossbar switch; vault controllers (Vault Ctrl)]
5. Key Challenge 1. Challenge 1: Which operations should be executed on the logic layer SMs? [Diagram labels: main GPU; SM (streaming multiprocessor); 3D-stacked memory (memory stack); logic layer; logic layer SM; crossbar switch; vault controllers (Vault Ctrl); question marks on the diagram]
6. Key Challenge 2. Challenge 2: How should data be mapped to different 3D memory stacks? [Diagram labels: main GPU; SM (streaming multiprocessor); 3D-stacked memory (memory stack); logic layer; logic layer SM; crossbar switch; vault controllers (Vault Ctrl)]
7. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload.
8. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile.
9. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics.
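The compiler/runtime split in Component 1 can be illustrated with a small sketch. It is not the mechanism presented in the talk: the slides do not say what the "memory profile" or the "runtime characteristics" contain. The sketch assumes a simple traffic-based cost model (memory words moved versus registers shipped to and from the memory stack) and two hypothetical runtime signals (`link_busy_fraction`, `stack_sm_busy_fraction`). Every name and constant below is invented for illustration.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumed size of one register or one memory word


@dataclass
class CodeBlock:
    """Hypothetical static profile of one candidate code portion."""
    name: str
    loads: int          # memory loads per thread
    stores: int         # memory stores per thread
    live_in_regs: int   # registers sent to the memory stack on offload
    live_out_regs: int  # registers returned to the main GPU afterwards


def estimated_traffic_saved(block: CodeBlock) -> int:
    """Bytes of off-chip memory traffic avoided, minus bytes spent moving registers."""
    memory_bytes = (block.loads + block.stores) * WORD_BYTES
    offload_bytes = (block.live_in_regs + block.live_out_regs) * WORD_BYTES
    return memory_bytes - offload_bytes


def compiler_candidates(blocks):
    """Compile-time step: keep only blocks whose estimated saving is positive."""
    return [b for b in blocks if estimated_traffic_saved(b) > 0]


def runtime_should_offload(block, link_busy_fraction, stack_sm_busy_fraction,
                           threshold=0.9):
    """Run-time step: skip the offload when the assumed signals show saturation."""
    if link_busy_fraction >= threshold or stack_sm_busy_fraction >= threshold:
        return False
    return estimated_traffic_saved(block) > 0


# Two made-up blocks: one streams memory, one mostly shuffles registers.
blocks = [
    CodeBlock("stream_add", loads=2, stores=1, live_in_regs=1, live_out_regs=0),
    CodeBlock("reg_heavy", loads=1, stores=0, live_in_regs=4, live_out_regs=4),
]
candidates = compiler_candidates(blocks)
```

The point of splitting the decision is that the static estimate can be computed once, while the final choice can still react to conditions that are only known while the kernel runs.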
10. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics. Component 2: A new, simple, programmer-transparent data mapping mechanism to maximize code/data co-location.
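To make "code/data co-location" in Component 2 concrete, the sketch below selects a memory stack from a few consecutive address bits and measures how often all accesses of one offloaded code instance land in the same stack. This is an assumption-laden illustration, not the mapping described in the talk: the slides do not specify how the mapping is chosen. The function names, the four-stack configuration, and the example addresses are all hypothetical.

```python
def stack_of(address: int, bit_pos: int, num_stacks: int = 4) -> int:
    """Select a stack from consecutive address bits starting at bit_pos.

    Assumes num_stacks is a power of two.
    """
    assert num_stacks > 0 and num_stacks & (num_stacks - 1) == 0
    return (address >> bit_pos) & (num_stacks - 1)


def colocation_fraction(accesses_per_instance, bit_pos, num_stacks=4):
    """Fraction of code instances whose accesses all map to a single stack."""
    if not accesses_per_instance:
        return 0.0
    colocated = 0
    for addresses in accesses_per_instance:
        stacks = {stack_of(a, bit_pos, num_stacks) for a in addresses}
        if len(stacks) == 1:
            colocated += 1
    return colocated / len(accesses_per_instance)


def best_bit_position(accesses_per_instance, candidate_bits, num_stacks=4):
    """Pick the candidate bit position with the highest co-location fraction."""
    return max(candidate_bits,
               key=lambda b: colocation_fraction(accesses_per_instance, b, num_stacks))


# Made-up access pattern: each instance touches a[i], b[i], and c[i], where the
# three arrays start 0x10000 bytes apart and share the same offset.
BASES = (0x00000, 0x10000, 0x20000)
accesses = [[base + i * 128 for base in BASES] for i in range(8)]
chosen = best_bit_position(accesses, candidate_bits=[7, 16])
```

In this made-up pattern, choosing the stack from bits that depend only on the shared offset keeps the three arrays' elements together, while bits that distinguish the array bases scatter them across stacks.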
11. Our Approach: TOM. Component 1: A new programmer-transparent mechanism to identify and decide what code portions to offload. The compiler identifies code portions to potentially offload based on memory profile. The runtime system decides whether or not to offload each code portion based on runtime characteristics. Component 2: A new, simple, programmer-transparent data mapping mechanism to maximize code/data co-location. Key Results: 30% average (76% max) performance improvement in GPU workloads.
12. Transparent Offloading and Mapping (TOM). Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, Stephen W. Keckler. Talk on Monday at 2:50 pm (Session 3B).