Slide 1
Characterizing and Evaluating a Key-value Store Application on Heterogeneous CPU-GPU Systems
Tayler H. Hetherington ɣ, Timothy G. Rogers ɣ, Lisa Hsu *, Mike O'Connor *, Tor M. Aamodt ɣ
ɣ University of British Columbia (UBC), * AMD
In Proc. 2012 ACM/IEEE Int'l Symp. on Performance Analysis of Systems and Software (ISPASS)
Image credit: Rich Miler, www.datacenterknowledge.com
Slide 2
Motivation
Server farms require a lot of power
Need for efficient, cost-effective solutions
  GPU/APUs
New types of workloads
  Non-HPC
  Server applications
    Memcached
Programmer's initial intuition into an application's behavior
Tayler Hetherington, Timothy Rogers, Lisa Hsu, Mike O'Connor, Tor M. Aamodt: Memcached Key-value Store on GPU/APU
Image credit: Bruno Giussani, ww.wired.com
Slide 3
Background: Memcached
*Slide from HPCA-18, 2012 Facebook Keynote, Sanjeev Kumar
Slide 4
Memcached - Compatible with GPU?
Irregular control flow
Irregular memory access patterns
Large memory requirements
Highly input data dependent
Slide 5
Porting Memcached
Simple key-value lookup
Figure: GET request flow at the server (Hash, Memory, Key Comparison, Return Hit/Miss; branch labels Hit and Miss, with Hash chaining)
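The GET flow in the figure above (hash the key, read the bucket from memory, compare keys, follow the chain on a miss) can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the `Item` and `HashTable` names, the bucket count, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """One stored key-value pair (hypothetical layout)."""
    key: bytes
    value: bytes
    next: Optional["Item"] = None  # next item in the same bucket's chain


class HashTable:
    """Chained hash table standing in for the key-value store."""

    def __init__(self, num_buckets: int = 1024) -> None:
        self.buckets: List[Optional[Item]] = [None] * num_buckets

    def _index(self, key: bytes) -> int:
        # "Hash" step: map the key to a bucket index.
        return hash(key) % len(self.buckets)

    def set(self, key: bytes, value: bytes) -> None:
        idx = self._index(key)
        item = self.buckets[idx]
        while item is not None:
            if item.key == key:      # key already present: overwrite
                item.value = value
                return
            item = item.next
        # Key not present: insert at the head of the chain.
        self.buckets[idx] = Item(key, value, self.buckets[idx])

    def get(self, key: bytes) -> Optional[bytes]:
        item = self.buckets[self._index(key)]   # "Memory" step: read bucket
        while item is not None:
            if item.key == key:                  # "Key Comparison": hit
                return item.value
            item = item.next                     # miss: "Hash chaining"
        return None                              # end of chain: overall miss
```

The length of the `while` loop in `get` depends on the key and on what else landed in the same bucket, which is why both the branch outcomes and the memory addresses touched are input data dependent.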
READ (GET) requests on GPU
WRITE (SET) requests on CPU
Slide 6
Porting Memcached - Batching
Figure: the GET request flow (Hash, Memory, Key Comparison, Return Hit/Miss, Hash chaining) repeated for multiple GET requests
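Batching can be sketched as collecting GET requests until a batch is full and then processing the whole batch in one launch, with one request per work item. The sketch below is an assumption-laden illustration in Python: `GetBatcher`, `lookup_kernel`, and the `batch_size` parameter are hypothetical names, a plain `dict` stands in for the store, and a list comprehension stands in for the GPU kernel.

```python
from typing import Dict, List, Optional


def lookup_kernel(store: Dict[bytes, bytes],
                  keys: List[bytes]) -> List[Optional[bytes]]:
    """Stand-in for a GPU kernel launch: index i plays the role of work item i."""
    return [store.get(key) for key in keys]


class GetBatcher:
    """Accumulates GET requests and dispatches them in fixed-size batches."""

    def __init__(self, store: Dict[bytes, bytes], batch_size: int) -> None:
        self.store = store
        self.batch_size = batch_size
        self.pending: List[bytes] = []

    def submit(self, key: bytes) -> Optional[List[Optional[bytes]]]:
        """Queue one request; return results only when a batch is dispatched."""
        self.pending.append(key)
        if len(self.pending) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> List[Optional[bytes]]:
        """Dispatch whatever is queued, even if the batch is not full."""
        keys, self.pending = self.pending, []
        return lookup_kernel(self.store, keys)
```

A larger `batch_size` exposes more parallelism per launch, but the first request in a batch waits until the batch fills, so throughput and latency pull in opposite directions.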
Slide 7
Porting Memcached
Main Goals
  Increase request throughput
  Keep request latency reasonable
Main Challenges
  Irregular memory access patterns
  Irregular control flow
  Data transfer overheads
Slide 8
Methodology
Hardware
  AMD Radeon HD 5870 (Discrete)
  AMD Llano A8-3850 (Fusion)
  AMD Zacate E-350 (Fusion)
Simulators
  GPGPU-Sim v3.x
  In-house GPU control flow simulator
Testing and Simulation
  Traces of Wikipedia accesses
Slide 9
Porting Memcached
Memory Access
One request per work item
Data accesses for GET requests are input data dependent
Data can be anywhere in memory
Poor performance on GPU?
Slide 10
Porting Memcached
Memory Divergence
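One way to picture memory divergence is to count how many distinct memory lines the work items of one wavefront touch in a single load. The sketch below is illustrative only: `memory_transactions` is a hypothetical helper, and the 64-byte line size is an assumption rather than a figure from the slides.

```python
from typing import Iterable


def memory_transactions(addresses: Iterable[int], line_bytes: int = 64) -> int:
    """Count the distinct memory lines touched by one wavefront's load.

    Each address belongs to one work item; addresses falling in the same
    line can be served together, so fewer distinct lines means less
    divergence.
    """
    return len({addr // line_bytes for addr in addresses})


WAVEFRONT = 64  # work items per wavefront (assumed for the example)

# Regular access: consecutive 4-byte words, as in a dense array scan.
coalesced = [4 * i for i in range(WAVEFRONT)]

# Key-value style access: each work item's item lives somewhere else.
scattered = [4096 * i for i in range(WAVEFRONT)]

coalesced_lines = memory_transactions(coalesced)   # 256 bytes span 4 lines
scattered_lines = memory_transactions(scattered)   # one line per work item
```

With one GET request per work item, the scattered case is the expected one, because the hash of each key decides where its data sits.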
Slide 11
Porting Memcached
Control Flow
Recall the control flow graph
Many branch outcomes are input data dependent
Work item ID
1 – 2 – 3 – 4 – 5
1 – 2 – 5
3 – 4
1 – 5
2
3 – 4
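If the work item groups listed above are read as the sets of work items active in successive basic blocks (an interpretation, since the figure itself did not survive), the cost of divergence can be estimated as SIMD efficiency: active lanes divided by issued lanes. The sketch below assumes one instruction per block unless counts are given; `simd_efficiency` and its parameters are hypothetical names.

```python
from typing import List, Optional, Sequence


def simd_efficiency(active_sets: Sequence[Sequence[int]],
                    width: int,
                    instr_counts: Optional[List[int]] = None) -> float:
    """Fraction of SIMD lanes doing useful work across a divergent region.

    active_sets[i]  : IDs of the work items active in basic block i
    width           : number of lanes issued per instruction
    instr_counts[i] : instructions in block i (assumed 1 each if omitted)
    """
    if instr_counts is None:
        instr_counts = [1] * len(active_sets)
    useful = sum(len(active) * n for active, n in zip(active_sets, instr_counts))
    issued = sum(width * n for n in instr_counts)
    return useful / issued


# Work item groups as listed on the slide, for a 5-wide example.
blocks = [[1, 2, 3, 4, 5], [1, 2, 5], [3, 4], [1, 5], [2], [3, 4]]
efficiency = simd_efficiency(blocks, width=5)
```

Under these assumptions 15 of the 30 issued lanes are active; real blocks have different instruction counts, which `instr_counts` is meant to capture.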
Slide 12
Porting Memcached
Control Flow
Figure values: 15%, 40%, 62%, 29%; Overall: 51%
Slide 13
Porting Memcached
Data Management
Dynamic memory manager
Transfer memory regions to device
Virtual addresses different on host and device
Slide 14
Porting Memcached
Data Transfer Reduction
Fusion Systems
  Physical shared memory region between host and device
  Zero-copy data
Discrete Systems
  Possible transfer reduction techniques
    Reduction in unnecessary transfers
    Acyclic data transfers (overlap communication with computation)
    Automatic data transfer frameworks
Slide 15
Porting Memcached
Slide 16
Results
Radeon HD 5870
A batch of ~8000 requests yields the highest ratio of throughput to latency
Slide 17
Summary
Programmer intuition doesn't always paint the whole picture
We exploited the available parallelism on GPUs by batching requests, showing a 7.5X performance increase on the Llano system
Data transfer overheads can have a large impact on overall performance
Thank you. Questions?
Image credit: Rich Miler, www.datacenterknowledge.com