Slide1
An Analysis of Persistent Memory Use with WHISPER
Sanketh Nalli, Swapnil Haria, Michael M. Swift, Mark D. Hill, Haris Volos*, Kimberly Keeton*
University of Wisconsin-Madison & *Hewlett-Packard Labs
Slide2
Executive Summary
Facilitate better system support for Persistent Memory (PM)
Wisconsin-HP Labs Suite for Persistence, a benchmark suite for PM
  4% accesses to PM, 96% accesses to DRAM
  5-50 epochs/tx, contributed by memory allocation & logging
  75% of epochs are small, update just one PM cacheline
  Re-referencing PM cachelines: common in a thread, rare across threads
Hands Off Persistence System (HOPS) optimizes PM transactions
WHISPER: research.cs.wisc.edu/multifacet/whisper
Slide3
Outline
➜ WHISPER: Wisconsin-HP Labs Suite for Persistence
WHISPER Analysis
HOPS: Hands-Off Persistence System
Slide4
Persistent Memory is coming soon
PM = NVM attached to CPU on memory bus
Offers low latency reads and persistent writes
Allows user-level, byte-addressable loads and stores
[Figure: CPUs, cache hierarchy, Persistent Memory (NVM on memory bus), Volatile Memory]
Slide5
What guarantees after failure?
Durability = Data survives failure
Consistency = Data is usable
[Figure: Pointer and Data shown in the cache and in PM]
1. Data update followed by pointer update in cache
2. Pointer is evicted from cache to PM
3. Data lost on failure, dangling pointer persists
Slide6
Achieving consistency & durability
Ordering = Useful building block of consistency mechanisms
Epoch = Set of writes to PM guaranteed to be durable before ANY subsequent writes become durable
Ordering primitives: SFENCE on x86-64
1. Store data update in cache
2. Flush data update to PM
3. Store pointer update in cache
4. Flush pointer update to PM
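The four steps above can be sketched as a toy simulation. This is a minimal illustration, not code from WHISPER or any PM library: `SimulatedPM`, `write_back`, and `is_consistent` are hypothetical names, and `write_back` stands in for a cache-line flush that has already been ordered by a fence.

```python
class SimulatedPM:
    """Toy model: stores land in a volatile cache; only written-back lines reach PM."""

    def __init__(self):
        self.cache = {}  # volatile, lost on failure
        self.pm = {}     # persistent, survives failure

    def store(self, name, value):
        self.cache[name] = value

    def write_back(self, name):
        # Assumption: models either an explicit flush + fence or a cache eviction.
        self.pm[name] = self.cache[name]

    def crash(self):
        # Power failure: the cache is gone, return what PM holds.
        self.cache = {}
        return dict(self.pm)


def is_consistent(pm_image):
    # A persisted pointer must never refer to data that was not persisted.
    return "pointer" not in pm_image or "data" in pm_image


def unordered_crash():
    # No ordering: the cache happens to evict the pointer before the data.
    m = SimulatedPM()
    m.store("data", 42)
    m.store("pointer", "data")
    m.write_back("pointer")
    return m.crash()


def ordered_crash(steps_completed):
    # The four ordered steps; fail after the first `steps_completed` of them.
    m = SimulatedPM()
    steps = [
        lambda: m.store("data", 42),
        lambda: m.write_back("data"),
        lambda: m.store("pointer", "data"),
        lambda: m.write_back("pointer"),
    ]
    for step in steps[:steps_completed]:
        step()
    return m.crash()


unordered_ok = is_consistent(unordered_crash())
ordered_ok = all(is_consistent(ordered_crash(k)) for k in range(5))
```

In this model the unordered update can leave a dangling pointer in PM, while the ordered sequence stays consistent at every possible failure point.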
[Figure: Data is flushed from the cache to PM, then Pointer is flushed]
Slide7
Approaches to consistency & durability
Native: Application-specific optimizations (load/store)
Persistence library: Atomic allocations, transactions (NVML, Mnemosyne; TX, load/store)
PM-aware Filesystems: POSIX interface (ext4-DAX, PMFS; read/write, VFS)
[Figure: the Application reaches Persistent Memory (PM) through each of the three approaches]
Slide8
What’s the problem?
Lack of standard workloads slows research
Micro-benchmarks not very representative
Partial understanding of how applications use PM
Slide9
WHISPER benchmark suite
Benchmark    Type        Brief description (*Adapted to PM)
Echo*        KV store    Scalable, multi-version key-value store
N-store*     Database    Fast, in-memory relational DB
Redis        NVML        Remote Dictionary Service
C-tree       NVML        Microbenchmarks for simulations
Hashmap      NVML        Microbenchmarks for simulations
Vacation*    Mnemosyne   Online travel reservation system
Memcached*   Mnemosyne   In-memory key-value store
NFS          PMFS        Linux server/client for remote file access
Exim         PMFS        Mail server; stores mails in per-user file
MySQL        PMFS        Widely used RDBMS for OLTP
Slide10
Outline
✔ WHISPER: Wisconsin-HP Labs Suite for Persistence
➜ WHISPER Analysis
HOPS: Hands-Off Persistence System
Slide11
How many accesses to PM?
Suggestion: Do not impede volatile accesses
Slide12
How many epochs/transaction?
Durability after every epoch impedes execution
Expectation: 3 epochs/TX = log + data + commit
Reality: 5 to 50 epochs/TX
Suggestion: Enforce durability only at the end of a transaction
Slide13
What contributes to epochs?
Log entries
  Undo log: Alternating epochs of log and data
  Redo log: 1 log epoch + 1 data epoch
Persistent memory allocation
  1 to 5 epochs per allocation
Suggestion: Use redo logs and reduce epochs from memory allocator
Slide14
How large are epochs?
Determines amount of state buffered per epoch
Small epochs are abundant
  75% update single cacheline
Large epochs in PMFS
Suggestion: Consider optimizing for small epochs
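Epoch size, counted in 64B cachelines, can be computed from a store trace along the following lines. This is a sketch under stated assumptions: the trace format (`("store", address)` and `("fence",)` tuples), the function names, and the sample trace are all hypothetical and hand-made for illustration; they are not the WHISPER tracing format or its data.

```python
CACHELINE_BYTES = 64


def epoch_sizes(trace):
    """Return the number of distinct 64B cachelines written in each epoch.

    A ("fence",) event closes the current epoch. Empty epochs are skipped,
    and stores after the last fence are counted as a final epoch.
    """
    sizes = []
    lines = set()
    for event in trace:
        if event[0] == "store":
            lines.add(event[1] // CACHELINE_BYTES)
        elif event[0] == "fence":
            if lines:
                sizes.append(len(lines))
            lines = set()
    if lines:
        sizes.append(len(lines))
    return sizes


def single_line_fraction(sizes):
    # Fraction of epochs that update exactly one cacheline.
    return sum(1 for s in sizes if s == 1) / len(sizes)


# Hand-made trace: four epochs, one of which touches two cachelines.
trace = [
    ("store", 0), ("store", 8), ("fence",),
    ("store", 64), ("fence",),
    ("store", 128), ("store", 200), ("fence",),
    ("store", 4096), ("fence",),
]
sizes = epoch_sizes(trace)
fraction = single_line_fraction(sizes)
```

Two stores to the same cacheline within one epoch count once, which is why the first epoch in the sample trace has size 1.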
[Figure: epoch sizes; x-axis: # of 64B cachelines]
Slide15
What are epoch dependencies?
[Figure: Thread 1 runs epochs A, B, C, D; Thread 2 runs epochs 1, 2, 3]
Self-dependency: B → D
Cross-dependency: 2 → C
Why do they matter?
Dependency can stall execution
Measured dependencies in 50 microsecond window
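One way to count such dependencies in a trace is sketched below. The slides do not give the exact definition used in the analysis, so this assumes a dependency exists when a later epoch writes a cacheline that an earlier epoch wrote within the window. The `Epoch` record, `find_dependencies`, the timestamps, and the cacheline numbers are hypothetical, chosen only to mirror the B → D and 2 → C example.

```python
from dataclasses import dataclass

WINDOW_US = 50.0


@dataclass(frozen=True)
class Epoch:
    name: str
    thread: int
    end_time_us: float
    lines: frozenset  # cacheline numbers written by this epoch


def find_dependencies(epochs, window_us=WINDOW_US):
    """Return (self_deps, cross_deps) as lists of (earlier, later) epoch names."""
    self_deps, cross_deps = [], []
    ordered = sorted(epochs, key=lambda e: e.end_time_us)
    for i, later in enumerate(ordered):
        for earlier in ordered[:i]:
            if later.end_time_us - earlier.end_time_us > window_us:
                continue  # too far apart to count
            if earlier.lines & later.lines:
                pair = (earlier.name, later.name)
                if earlier.thread == later.thread:
                    self_deps.append(pair)
                else:
                    cross_deps.append(pair)
    return self_deps, cross_deps


# Thread 1 runs A, B, C, D; thread 2 runs 1, 2, 3 (made-up times and lines).
epochs = [
    Epoch("A", 1, 0.0, frozenset({10})),
    Epoch("1", 2, 2.0, frozenset({40})),
    Epoch("B", 1, 5.0, frozenset({20})),
    Epoch("2", 2, 8.0, frozenset({30})),
    Epoch("C", 1, 12.0, frozenset({30})),
    Epoch("3", 2, 15.0, frozenset({50})),
    Epoch("D", 1, 20.0, frozenset({20})),
]
self_deps, cross_deps = find_dependencies(epochs)
```

Shrinking `window_us` drops dependencies whose epochs are further apart, which is why the measurement window matters when reporting how common they are.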
Slide16
How common are dependencies?
Suggestion: Design multi-versioned buffers OR avoid updating same cacheline across epochs
Slide17
Outline
✔ WHISPER: Wisconsin-HP Labs Suite for Persistence
✔ WHISPER Analysis
➜ HOPS: Hands-Off Persistence System
Slide18
ACID Transactions
TX_START
Prepare Log Entry (1 to N)
Mutate Data Structure (1 to N)
Commit Transaction
TX_END
[Figure legend: Persistent Writes, SFENCE, OFENCE, DFENCE]
ACID Transactions in HOPS
Slide19
HOPS Base System
HOPS Persist buffers
[Figure: CPU (x2), Private L1 (x2), Persist Buffer Front End (x2), Shared LLC, Persist Buffer Back End, DRAM Controller (Volatile), PM Controller (Persistent); paths labeled Loads + Stores, Loads, Stores]
Slide20
WHISPER                           HOPS
4% accesses to PM, 96% to DRAM    Volatile memory hierarchy (almost) unchanged by PBs
5-50 epochs/transaction           Order epochs without flushing
Self-dependencies common          Allows multiple copies of same cacheline in PB via timestamps
Cross-dependencies rare           Correct, conservative method using coherence & timestamps
Slide21
HOPS Evaluation with WHISPER
[Figure: Baseline (CLWB + SFENCE) compared with HOPS (OFENCE and DFENCE); annotated 24%]
Slide22
Summary
Persistent Memory (PM) is coming soon
Progress is slowed by ad-hoc micro-benchmarks
We contributed WHISPER, open-source benchmark suite
HOPS design, based on WHISPER analysis
We hope for more similar analysis in the future!
research.cs.wisc.edu/multifacet/whisper/
Slide23
Extra
Slide24
Summary
WHISPER: Wisconsin-HP Labs Suite for Persistence
4% accesses to PM, 96% accesses to DRAM
5-50 epochs/TX, primarily small in size
Cross-dependencies rare, self-dependencies common
HOPS improves PM app performance by 24%
More results in ASPLOS’17 paper and code at: research.cs.wisc.edu/multifacet/whisper/
Slide25
A Simple Transaction using Epochs
TM_BEGIN();
pobj.data = 42;
pobj.init = True;
TM_END();

transaction_begin:
    log[pobj.init] ← True
    log[pobj.data] ← 42
    write_back(log)
    wait_for_write_back()
    pobj.init ← True
    pobj.data ← 42
    write_back(pobj)
    wait_for_write_back()
transaction_end;

Epoch 1: Log entries stored & persisted.
Epoch 2: Variables stored & persisted.
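The two-epoch structure of this transaction can be mimicked with a small simulation. This is a sketch, not the Mnemosyne or NVML implementation: `EpochCountingPM` and `run_transaction` are hypothetical names, the `log.` and `pobj.` key prefixes are an illustrative stand-in for the log and the object, and each `wait_for_write_back` call is assumed to close one epoch.

```python
class EpochCountingPM:
    """Toy PM: writes become durable only after write_back + wait_for_write_back."""

    def __init__(self):
        self.pending = {}    # written, not yet written back
        self.in_flight = {}  # write-back started, not yet durable
        self.durable = {}    # survives a failure
        self.epochs = 0

    def write(self, key, value):
        self.pending[key] = value

    def write_back(self, prefix):
        # Start writing back every pending key that begins with `prefix`.
        for key in [k for k in self.pending if k.startswith(prefix)]:
            self.in_flight[key] = self.pending.pop(key)

    def wait_for_write_back(self):
        # Epoch boundary: everything in flight is durable when this returns.
        self.durable.update(self.in_flight)
        self.in_flight = {}
        self.epochs += 1


def run_transaction(pm, updates):
    # Epoch 1: log entries stored and persisted.
    for field, value in updates.items():
        pm.write("log." + field, value)
    pm.write_back("log.")
    pm.wait_for_write_back()
    # Epoch 2: variables stored and persisted.
    for field, value in updates.items():
        pm.write("pobj." + field, value)
    pm.write_back("pobj.")
    pm.wait_for_write_back()


pm = EpochCountingPM()
run_transaction(pm, {"init": True, "data": 42})
epochs_used = pm.epochs
```

Because the log entries are durable before any variable is written back, a failure between the two epochs leaves enough information in the log to complete the update.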
Slide26
Runtimes cause write amplification
PMFS, Mnemosyne
Logs every PM write
PMFS, NVML
Clears log
Auxiliary structures
< 5% writes to PM
Non-temporal writes
Mnemosyne logs
PMFS user-data