Slide 1: Panthera: Holistic Memory Management for Big Data Processing over Hybrid Memories

Chenxi Wang, Huimin Cui, Ting Cao, John Zigman, Haris Volos, Onur Mutlu, Fang Lv, Xiaobing Feng, Guoqing Harry Xu
Slide 2: Big Data Workloads
- Big data workloads (e.g., Spark MLlib) are written in managed languages
- Current memory: DRAM
  - Accounts for 40% of total energy consumption
  - Capacity starts to hit its limit
Slide 3: Non-Volatile Memory (NVM)
- Byte-addressable memory material
- Pros:
  - Higher memory capacity and density
  - Lower price
  - Negligible background energy consumption
- Cons:
  - Increased read/write latency
  - Reduced bandwidth
Slide 4: Hybrid Memory: DRAM + Non-Volatile Memory (NVM)
- Current memory architecture: Cache → DRAM → SSD/HD
- Hybrid memory architecture: Cache → DRAM + Non-Volatile Memory → SSD/HD
- Divide data into hot, warm, and cold, and place it into the hybrid memory accordingly
Slide 5: Hybrid Memory Management for Big Data: Opportunities & Challenges
Slide 6: Current Solution for Hybrid Memory Management
- Divide the Java heap (young generation + old generation) into a DRAM area and an NVM area*
- Profile and migrate frequently accessed objects to DRAM*
- Problem: significant online profiling overhead!

[*] Write-rationing garbage collection for hybrid memories, Akram et al., PLDI'18
Slide 7: Data Characteristics in Big Data Systems
- Spark memory management (an application-level memory subsystem):
  - Execution memory: temporary RDDs
  - Storage memory: frequently used RDDs
  - Off-heap memory: fault-tolerance RDDs
- Spark distributed data collections (RDDs):
  - Different RDDs have different, clearly defined access patterns
  - Managed at a coarse granularity
Slide 8: Working with Big Data Characteristics
- Use the characteristics of RDDs to do coarse-grained data division
- Objects within one RDD share the same access pattern and lifetime => saves a lot of profiling overhead!
- ❌ Problem: the runtime only sees Java objects; it cannot see the RDD-level semantics (e.g., which objects belong to RDD #1 vs. RDD #2)
Slide 9: Design
Slide 10: Panthera: A Holistic Memory Management System for Big Data Systems
- Spans the whole stack: Spark applications → runtime (OpenJDK) → physical memory (DRAM + NVM)
- Data profiling: static inference plus coarse-grained dynamic analysis of RDDs (e.g., RDD #1, RDD #2)
- Maps Java heap space to physical NVM/DRAM
Slide11var links
=
ctx.textFile..persist()
for
(
i
<-
1 to
iters
){
.....
var contribs
=
links.join(..)..persist()
.....
}
Static
Inference
of
RDD
Memory
Tags
Infer
the
access
frequency
of
RDD
by
def-use
analysis
Mark Hot RDD
with DRAM tagMark Cold
RDD
with
NVM
tag
PageRank
Data
intensive
applications
Manage
data
in
coarse-grained
manner
i.e., RDDData
access pattern and lifetime are statically
observed
DRAMNVM
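To make the def-use idea concrete, here is a minimal standalone sketch (not Panthera's actual implementation; the loop weight and threshold are illustrative assumptions) that scores an RDD definition by how often it is used, counting uses inside loops once per expected iteration:

```java
// Hypothetical sketch of def-use-based hotness inference. The iteration
// weight and hot threshold are made-up values, not Panthera's real ones.
public class RddDefUseSketch {
    public enum Tag { DRAM, NVM }

    // Score a definition by its uses; a use inside a loop counts once per
    // expected iteration. A score at or above the threshold means "hot".
    public static Tag infer(int usesOutsideLoops, int usesInsideLoops,
                            int expectedIterations, int hotThreshold) {
        int score = usesOutsideLoops + usesInsideLoops * expectedIterations;
        return score >= hotThreshold ? Tag.DRAM : Tag.NVM;
    }

    public static void main(String[] args) {
        // "links" in PageRank is joined once per iteration -> hot -> DRAM
        System.out.println("links:    " + infer(1, 1, 10, 5));
        // "contribs" is created and consumed within one pass -> cold -> NVM
        System.out.println("contribs: " + infer(1, 0, 10, 5));
    }
}
```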
Slide 12: Pass DRAM/NVM Tags via GC
- It is not practical for a static analysis to find and mark all the objects within an RDD
- But the GC already traces all live objects from the roots
- So: utilize the GC to propagate the DRAM/NVM tags from RDD root objects to all reachable data objects
- => Zero online profiling overhead!
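The propagation step can be pictured as an ordinary GC graph trace that also stamps tags. This is a minimal sketch with a toy object model (`Obj` and `Tag` are illustrative stand-ins, not OpenJDK internals):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of tag propagation piggybacked on a GC-style trace.
public class TagTraceSketch {
    public enum Tag { NONE, DRAM, NVM }

    public static class Obj {
        public Tag tag = Tag.NONE;
        public final List<Obj> fields = new ArrayList<>();
    }

    // Breadth-first trace from a tagged RDD root object: every reachable,
    // not-yet-tagged object inherits the tag of the object that reached it.
    public static void propagate(Obj root) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> work = new ArrayDeque<>();
        seen.add(root);
        work.add(root);
        while (!work.isEmpty()) {
            Obj o = work.poll();
            for (Obj f : o.fields) {
                if (seen.add(f)) {
                    if (f.tag == Tag.NONE) f.tag = o.tag;
                    work.add(f);
                }
            }
        }
    }
}
```

Because the trace visits each live object exactly once anyway, stamping a tag adds essentially no extra work per object, which is what the slide's "zero online profiling overhead" claim rests on.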
Slide 13: Dynamic Profiling of RDD Method Invocations
- Low-overhead dynamic profiling mechanism: monitor the number of function invocations on RDD root objects
- Update the tags of the RDD root objects, and propagate them to the other objects, during major GC
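A minimal sketch of such a counter (the threshold and the String tags are illustrative assumptions; the real system keeps this state in the runtime):

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of low-overhead dynamic profiling: count method invocations on an
// RDD root object, then re-tag it at each major GC. Threshold is made up.
public class RddInvocationProfile {
    private final AtomicLong invocations = new AtomicLong();
    private String tag = "NVM";

    // Cheap hot-path hook, called on every RDD method invocation.
    public void onInvocation() { invocations.incrementAndGet(); }

    // Called once per major GC: decide the tag for the next window and
    // reset the counter. The GC then propagates the new tag to the
    // objects reachable from this root.
    public void onMajorGc(long hotThreshold) {
        tag = invocations.getAndSet(0) >= hotThreshold ? "DRAM" : "NVM";
    }

    public String tag() { return tag; }
}
```

Folding the tag update into major GC is what keeps this "low overhead": the expensive part (re-tagging reachable objects) happens only when a full trace is running anyway.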
Slide 14: Data Placement in Panthera
- Placement is based on the DRAM/NVM tags
- Java heap: the young generation is mapped to DRAM; the old generation spans both DRAM and NVM
- The GC moves objects into the region matching their tag
Slide 15: Runtime Optimizations
- Utilize application-level semantics to do runtime optimizations:
  - Eager promotion of RDD data objects
  - Big-array optimization: alignment padding
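The alignment-padding idea can be sketched as rounding a big array's start address up to a boundary (the 4 KiB boundary below is an assumption, not a value from the talk) so the whole array lands in one physically mapped region:

```java
// Sketch of alignment padding for big arrays. The 4 KiB boundary is an
// illustrative assumption; the point is that padding the allocation start
// lets one large array be mapped wholly to DRAM or wholly to NVM.
public class AlignPadSketch {
    public static final long BOUNDARY = 4096; // must be a power of two

    // Round addr up to the next BOUNDARY multiple (no-op if aligned).
    public static long alignUp(long addr) {
        return (addr + BOUNDARY - 1) & ~(BOUNDARY - 1);
    }

    // Padding bytes inserted before the array so its data starts aligned.
    public static long paddingFor(long addr) {
        return alignUp(addr) - addr;
    }
}
```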
Slide 16: Evaluation
Slide 17: Our Hybrid Memory Emulator Supports Big Data Apps
- Emulator design follows Quartz*: the host CPU's local DRAM serves as DRAM, while a remote CPU's DRAM, accessed over QPI (QuickPath Interconnect) and throttled via the thermal register, serves as emulated NVM
- Runs full big data applications

Emulated memory parameters:

                      DRAM    NVM
    Latency (ns)       120    300
    Bandwidth (GB/s)    30     10

* Quartz: A Lightweight Performance Emulator for Persistent Memory Software, Volos et al., Middleware'15
Slide 18: Experiment Setup
- Baseline: DRAM only
- Comparisons: Panthera vs. Unmanaged
- Unmanaged: young generation mapped on DRAM; old generation mapped on DRAM and NVM, interleaved at a specific ratio (e.g., 1/3 DRAM), which is at least as good as Write-Rationing GC*

[*] Write-rationing garbage collection for hybrid memories, Akram et al., PLDI'18
Slide 19: Overall Results – Performance Overhead
- 64 GB heap, DRAM/Memory = 1/3: Panthera has only 4% performance overhead
- Average (normalized to DRAM only): Unmanaged 1.21, Panthera 1.04
Slide 20: Overall Results – Energy Consumption
- 64 GB heap, DRAM/Memory = 1/3: Panthera saves 32% energy consumption
- Average (normalized): 0.73
Slide 21: GC Performance
- 64 GB heap, DRAM/Memory = 1/3: Panthera has only 1% GC performance overhead
- Unmanaged has 59% GC performance overhead
Slide 22: Mutator (Computation) Performance
- 64 GB heap, DRAM/Memory = 1/3: Panthera has 3% mutator performance overhead
- Unmanaged has 6% mutator performance overhead
Slide 23: Conclusions
- Panthera: a holistic memory management system for hybrid memory, designed for big data applications
- Uses application-level semantics for coarse-grained data placement
- Reduces energy consumption significantly at a small time cost
Slide 24: Thanks! Q & A