CAFO: Cost Aware Flip Optimization for Asymmetric Memories
Rakan Maddah*, Seyed Mohammad Seyedzadeh and Rami Melhem
Computer Science Department, University of Pittsburgh
HPCA 2015
Introduction
DRAM and NAND Flash are facing physical limitations that put their scalability into question
- DRAM: decreasing cell reliability and increasing power consumption
- NAND Flash: endurance degradation and a growing number of transient and hard errors
Phase-Change Memory (PCM) and Spin-Transfer Torque Random Access Memory (STT-RAM) are promising alternatives
- Scalability, low access latency, and close-to-zero leakage power
- Initial assessments and evaluations are encouraging
Challenges
PCM and STT-RAM have a number of challenges that need to be dealt with before deployment in functional systems
- PCM suffers from limited endurance
- STT-RAM suffers from a high write bit error rate
Solution: bit flip minimization
- Service write requests while flipping as few bits as possible
- Preserves PCM’s endurance and improves STT-RAM’s write reliability
Previous Work
Differential Write: compares old data against new data and then only flips differing cells.
Flip-N-Write: encodes the write data into either its regular or its inverted form and then picks the encoding that yields fewer flips in comparison with the old data
Flip-Min: encodes the write data into a set of data vectors and then picks the vector that yields the fewest flips in comparison with the old data
Figure: old data 00111 and new data 01001 (5 bits). Differential write flips 3 cells (saves 2 bit flips); Flip-N-Write writes the inverted form 10110 (saves 3 bit flips); Flip-Min picks among its candidate vectors and writes 10111 (saves 4 bit flips)
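A minimal Python sketch of the flip counts in this example, assuming equal-length bit strings. The helper names are hypothetical, the flag bit that Flip-N-Write stores alongside the data is ignored, and Flip-Min is omitted because its vector construction is not described here.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    return sum(p != q for p, q in zip(x, y))


def invert(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)


def differential_write_flips(old: str, new: str) -> int:
    # Differential write only touches the cells whose value changes.
    return hamming(old, new)


def flip_n_write_flips(old: str, new: str) -> tuple:
    # Flip-N-Write: write either `new` or its inversion, whichever flips fewer
    # cells (the flag bit recording the choice is ignored in this sketch).
    best = min([new, invert(new)], key=lambda cand: hamming(old, cand))
    return hamming(old, best), best


old, new = "00111", "01001"
print(differential_write_flips(old, new))  # 3 -> saves 2 of 5 flips
print(flip_n_write_flips(old, new))        # (2, '10110') -> saves 3 of 5 flips
```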
Write Asymmetries
PCM: the RESET state is more detrimental to endurance than the SET state
STT-RAM: anti-parallel magnetization is more prone to write errors than parallel magnetization
Figure: PCM SET (“1”) and RESET (“0”) pulses (power vs. time); STT-RAM cell (free layer, oxide layer, reference layer) in parallel (“0”) and anti-parallel (“1”) magnetization
Contribution
Observation: existing schemes fail to exploit the write asymmetry
Figure: old data and two encodings of the new data, one saving 1 bit flip and the other saving 3 bit flips
Writing a “0” is 4 times more detrimental to endurance than writing a “1”
The number of bit flips is oblivious to the write asymmetry!
Contribution
Observation: existing schemes fail to exploit the write asymmetry
- Focusing solely on the number of bit flips is oblivious to the write asymmetry
Proposal: move from the concept of “bit flip reduction” to “cost reduction”
Cost Aware Flip Optimization (CAFO)
- Cost model: captures the write asymmetry and assigns a cost to a given write operation
- Coding engine: encodes the write data into a form that results in an overall cost reduction
Cost Model
Compare the write data to the currently stored data and associate a cost with each cell. The costs “a”, “b”, “c”, and “d” depend on the technology being modeled and the optimization objective (endurance, energy, error rate)
Figure: currently stored data, new data, and the resulting cost of writing each cell
Cost labels (stored bit, then new bit): a: 01, b: 10, c: 00, d: 11
With a write cost, we can define a gain between different encodings
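A minimal sketch of the cost model in Python. The labels follow the slide (first bit stored, second bit new); the PCM and STT-RAM cost values are the ones used later in this deck, and the function and constant names are hypothetical.

```python
# Cell label from (stored bit, new bit), as on the slide.
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}

# Cost values used later in the deck (hypothetical constant names).
PCM_COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}       # RESET (1 -> 0) costs more
STT_RAM_COSTS = {"a": 1, "b": 0, "c": 0, "d": 0}   # writing anti-parallel "1" costs


def write_cost(stored, new, costs):
    """Sum of per-cell costs for writing `new` over `stored`."""
    return sum(costs[LABEL[(s, n)]] for s, n in zip(stored, new))


stored, new = [0, 1, 1, 0], [1, 0, 1, 0]   # labels: a, b, d, c
print(write_cost(stored, new, PCM_COSTS))      # 3
print(write_cost(stored, new, STT_RAM_COSTS))  # 1
```

Swapping the cost table is all it takes to retarget the same model from PCM endurance to STT-RAM write errors.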
Gain Calculation
Costs: a = 1, b = 2, c = 0, d = 0 (a: 01, b: 10, c: 00, d: 11)
$C = 2a + 3b + 1c + 2d = 8$
$C_{encoded} = 1a + 2b + 2c + 3d = 5$
Gain: $G = C - C_{encoded} = 8 - 5 = 3$
Figure: currently stored data, new data, and encoded data, with the cost of writing each cell
A positive gain implies that it is less costly to write the data in encoded form
How do we encode the data?
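The arithmetic above can be reproduced with a short sketch. It assumes the encoded form is the bitwise inversion of the new data, which matches the cell counts on the slide (inversion turns a into c, b into d, and vice versa). The 8-cell bit patterns are made up so that they have the same label counts as the slide; they are not the slide's actual data.

```python
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}


def write_cost(stored, new):
    return sum(COSTS[LABEL[(s, n)]] for s, n in zip(stored, new))


def inversion_gain(stored, new):
    """G = C - C_encoded, with the encoded form taken to be the inverted data."""
    inverted = [1 - b for b in new]
    return write_cost(stored, new) - write_cost(stored, inverted)


# Hypothetical cells labeled a, a, b, b, b, c, d, d (i.e., 2a + 3b + 1c + 2d).
stored = [0, 0, 1, 1, 1, 0, 1, 1]
new = [1, 1, 0, 0, 0, 0, 1, 1]
print(write_cost(stored, new))                   # 8
print(write_cost(stored, [1 - b for b in new]))  # 5  (1a + 2b + 2c + 3d)
print(inversion_gain(stored, new))               # 3
```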
Encoding
Auxiliary bits serve as inversion flags
Coding steps:
1. Compute the gain of each row
2. Flip all rows with a positive gain
3. Compute the gain of each column
4. Flip all columns with a positive gain
5. Repeat the process until all rows and columns show a zero or negative gain
Alternating between row and column flips yields additional cost reduction
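The coding steps above can be sketched as a greedy loop over a 2D block, assuming one inversion flag per row and one per column, so that cell (i, j) is written as data ^ row_flag ^ col_flag. This is an illustrative sketch, not the authors' implementation: the names are hypothetical, and the cost of updating the flag bits is ignored. Each flip lowers the total cost, so the loop terminates.

```python
import random
from typing import Dict, List, Tuple

Matrix = List[List[int]]
# (stored bit, written bit) -> a (0->1), b (1->0), c (0->0), d (1->1)
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def cell_cost(stored: int, written: int, costs: Dict[str, float]) -> float:
    return costs[LABEL[(stored, written)]]


def total_cost(stored: Matrix, cells: Matrix, costs: Dict[str, float]) -> float:
    return sum(cell_cost(s, w, costs)
               for srow, wrow in zip(stored, cells) for s, w in zip(srow, wrow))


def encode(stored: Matrix, data: Matrix,
           costs: Dict[str, float]) -> Tuple[List[int], List[int], Matrix]:
    """Greedy row/column inversion; returns (row_flags, col_flags, cells)."""
    rows, cols = len(data), len(data[0])
    rf, cf = [0] * rows, [0] * cols

    def written(i: int, j: int) -> int:
        return data[i][j] ^ rf[i] ^ cf[j]

    def local_gain(i: int, j: int) -> float:
        # Cost saved at cell (i, j) if its currently encoded value were inverted.
        w = written(i, j)
        return cell_cost(stored[i][j], w, costs) - cell_cost(stored[i][j], 1 - w, costs)

    changed = True
    while changed:
        changed = False
        # Steps 1-2: compute all row gains, then flip every row with a positive gain.
        row_gain = [sum(local_gain(i, j) for j in range(cols)) for i in range(rows)]
        for i in range(rows):
            if row_gain[i] > 0:
                rf[i] ^= 1
                changed = True
        # Steps 3-4: compute column gains after the row flips, then flip positive ones.
        col_gain = [sum(local_gain(i, j) for i in range(rows)) for j in range(cols)]
        for j in range(cols):
            if col_gain[j] > 0:
                cf[j] ^= 1
                changed = True
    # Step 5: the loop exits after a pass in which no row or column had a positive gain.
    return rf, cf, [[written(i, j) for j in range(cols)] for i in range(rows)]


def decode(rf: List[int], cf: List[int], cells: Matrix) -> Matrix:
    return [[cells[i][j] ^ rf[i] ^ cf[j] for j in range(len(cf))] for i in range(len(rf))]


random.seed(1)
stored = [[random.randint(0, 1) for _ in range(8)] for _ in range(8)]
data = [[random.randint(0, 1) for _ in range(8)] for _ in range(8)]
flips = {"a": 1, "b": 1, "c": 0, "d": 0}  # with these costs, cost == bit flips
rf, cf, cells = encode(stored, data, flips)
assert decode(rf, cf, cells) == data
print("differential write:", total_cost(stored, data, flips),
      "encoded:", total_cost(stored, cells, flips))
```

Flipping all positive-gain rows at once is safe because rows are disjoint, so their gains simply add; the same holds for columns within a column step.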
Encoding example
Costs: a = 1, b = 1, c = 0, d = 0 (“1” represents a cell that is to be flipped, “0” otherwise); with these costs, the write cost equals the number of bit flips
Figure: matrix of cells to be flipped, with auxiliary row and column bits and the gain of each row
Encoding example
Figure: flip rows with + gain; the gain of each column is then computed
Encoding example
Figure: flip columns with + gain; the gain of each row is then recomputed
Encoding example
Figure: after alternating row and column flips, the write needs 21 bit flips instead of 33
Encoding terminates as no row or column shows a positive gain
Row only Inversion
Figure: inverting rows only (as FNW does) needs 25 bit flips instead of 33
Can we do better?
Encoding Optimization
Write cost can be further reduced even if no row or column shows a positive gain
Figure: flipping a row and a column together reduces 5 bit flips to 3
Encoding Optimization
Flipping both a row and a column leaves their intersecting cell un-inverted
The local gain of the intersecting cell has to be subtracted from the total gains of both the corresponding row and column
Gain is achieved if $G_r + G_c - 2g_{r+c} > 0$
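A small check of the condition above, under the assumption that local gains are computed as the cost saved by inverting one cell. The 4 × 4 block is a hypothetical example with cost equal to bit flips: the row and the column each have zero gain on their own, yet flipping them together saves 2 flips because the intersecting cell (local gain -1) is left un-inverted. All names are illustrative.

```python
# (stored bit, written bit) -> a (0->1), b (1->0), c (0->0), d (1->1)
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}


def local_gains(stored, written, costs):
    """g[i][j]: cost saved at cell (i, j) if its written value were inverted."""
    return [[costs[LABEL[(s, w)]] - costs[LABEL[(s, 1 - w)]] for s, w in zip(srow, wrow)]
            for srow, wrow in zip(stored, written)]


def row_col_gain(g, r, c):
    """Gain of flipping row r and column c together: G_r + G_c - 2 * g_{r+c}."""
    G_r = sum(g[r])
    G_c = sum(row[c] for row in g)
    return G_r + G_c - 2 * g[r][c]


# Hypothetical block; stored data is all zeros, so a "1" is a cell to be flipped.
flips = {"a": 1, "b": 1, "c": 0, "d": 0}
stored = [[0] * 4 for _ in range(4)]
written = [[0, 1, 1, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]
g = local_gains(stored, written, flips)
print(sum(g[0]), sum(row[0] for row in g))  # 0 0: neither flip helps on its own
print(row_col_gain(g, 0, 0))                # 2: flipping both saves 2 bit flips
```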
Figure: row gain $G_r$, column gain $G_c$, and intersecting-cell gain $g_{r+c}$; flip row and column together
Encoding Optimization (cont.)
Generalize to flipping one column together with multiple rows (and vice versa)
Figure: flipping 2 rows and 2 columns together reduces 6 bit flips to 4
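The same bookkeeping extends to a set of rows R and a set of columns C: a cell ends up inverted exactly when it lies in a flipped row or a flipped column but not both, so every cell in R × C has its local gain subtracted twice. The sketch below, with hypothetical names and assumed local gains `g`, checks that closed form against a direct sum over the cells that actually get inverted.

```python
import random


def multi_flip_gain(g, rows, cols):
    """Gain of flipping a set of rows and a set of columns together.

    g[i][j] is the cost saved at cell (i, j) if it were inverted. A cell in both a
    flipped row and a flipped column is inverted twice (so it keeps its value);
    its local gain appears in both sums and is subtracted twice.
    """
    row_part = sum(sum(g[i]) for i in rows)
    col_part = sum(g[i][j] for j in cols for i in range(len(g)))
    overlap = sum(g[i][j] for i in rows for j in cols)
    return row_part + col_part - 2 * overlap


def brute_force_gain(g, rows, cols):
    """Same quantity, summed directly over the cells that end up inverted."""
    return sum(g[i][j] for i in range(len(g)) for j in range(len(g[0]))
               if (i in rows) != (j in cols))


random.seed(0)
g = [[random.choice([-2, -1, 0, 1, 2]) for _ in range(6)] for _ in range(6)]
for rows, cols in [({0}, {1}), ({0, 2}, {1, 3}), ({1, 4, 5}, {0})]:
    assert multi_flip_gain(g, rows, cols) == brute_force_gain(g, rows, cols)
```

Searching over all row and column subsets is exponential, so this only evaluates a given candidate; how CAFO picks candidates is not specified here.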
Aux. Bits Cost
The cost of updating the auxiliary bits can be easily incorporated in the gain calculation
Costs: a = 1, b = 2, c = 0, d = 0 (a: 01, b: 10, c: 00, d: 11)
$C = 2a + 3b + 2c + d = 8$
$C_{inverted} = 2a + 2b + 2c + 2d = 6$
Gain: $G = C - C_{inverted} = 8 - 6 = 2$
Figure: cost of writing the new data and the inverted data over the currently stored data; for the plain write, the old aux bit has to be flipped to “0”, while for the inverted write, it stays the same
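One way to fold the aux bit into the gain is to treat it as one more cell of the row, as in this sketch. It assumes an aux value of 1 means “stored inverted”. The function names and the 3-cell row are hypothetical, and the costs are the PCM values from the slide. The old aux value changes the gain for the same data (5 vs. 2 below).

```python
LABEL = {(0, 1): "a", (1, 0): "b", (0, 0): "c", (1, 1): "d"}
PCM_COSTS = {"a": 1, "b": 2, "c": 0, "d": 0}


def write_cost(stored, new, costs):
    return sum(costs[LABEL[(s, n)]] for s, n in zip(stored, new))


def inversion_gain_with_aux(stored_row, old_aux, data_row, costs):
    """Gain of writing a row inverted (aux bit = 1) rather than as-is (aux bit = 0).

    The aux bit is treated as one more cell, so its update cost
    (old_aux -> 0 or old_aux -> 1) enters both sides of the comparison.
    """
    as_is = write_cost(stored_row + [old_aux], data_row + [0], costs)
    inverted = write_cost(stored_row + [old_aux], [1 - b for b in data_row] + [1], costs)
    return as_is - inverted


stored_row, data_row = [1, 1, 0], [0, 0, 0]
print(inversion_gain_with_aux(stored_row, 1, data_row, PCM_COSTS))  # 5
print(inversion_gain_with_aux(stored_row, 0, data_row, PCM_COSTS))  # 2
```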
Decoding
Simple: XOR the corresponding vertical and horizontal aux bits
- Output of “1”: read the cell value inverted
- Output of “0”: read the cell value un-inverted
Figure: example of encoding and decoding a block
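The decoding rule above amounts to one XOR per cell. A minimal sketch, assuming the stored cells and the row and column aux bits are plain 0/1 lists (the function name and sample values are hypothetical):

```python
def decode(cells, row_aux, col_aux):
    """Recover the data by XOR-ing each cell with its row and column aux bits.

    row_aux[i] ^ col_aux[j] == 1: cell (i, j) is stored inverted.
    row_aux[i] ^ col_aux[j] == 0: cell (i, j) is stored as-is.
    """
    return [[cell ^ row_aux[i] ^ col_aux[j] for j, cell in enumerate(row)]
            for i, row in enumerate(cells)]


stored_cells = [[1, 0, 1],
                [0, 1, 0]]
print(decode(stored_cells, row_aux=[1, 0], col_aux=[0, 1, 0]))  # [[0, 0, 0], [0, 0, 0]]
```

Note that cell (0, 1) has both aux bits set, so it is read un-inverted, which matches the row-and-column case of the encoder.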
Evaluation
- Compare against Flip-Min and Flip-N-Write (FNW)
- Experiment with various block sizes at matching space overhead
- Compute the average cost reduction achieved by every scheme relative to differential write
- Experiment with a random input stream and with memory traces collected from various SPEC benchmark programs
- Model both PCM and STT-RAM by setting the cost labels to match the underlying technology
Cost Reduction vs. Cost oblivious FNW and Flip-Min
Figure panels: overhead of 3.125%, 6.25%, and 12.5%
Cost Reduction vs. Cost aware FNW and Flip-Min
Figure panels: overhead of 3.125%, 6.25%, and 12.5%
The cost model improves FNW and Flip-Min
Cost Model Improvement
Optimization Isolation
At least 15% cost reduction even without the encoding optimization
STT-RAM Cost Reduction
Costs: a = 1, b = 0, c = 0, d = 0
Figure panels: overhead of 3.125%, 6.25%, and 12.5%
Benchmark Data
Costs: a = 1, b = 2, c = 0, d = 0
Block size: 128B (6.25% overhead)
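A quick arithmetic sketch of how the overhead relates to block shape, assuming one inversion flag per row and one per column (the deck does not state the exact layout). Under that assumption, a 1024-bit (128B) block reaches 6.25% only as a 32 × 32 array, and every other power-of-two shape costs more. The function name is hypothetical.

```python
def aux_overhead(rows: int, cols: int) -> float:
    """Space overhead of one inversion flag per row plus one per column."""
    return (rows + cols) / (rows * cols)


BLOCK_BITS = 128 * 8  # 128B block = 1024 data bits
for rows in (4, 8, 16, 32, 64, 128, 256):
    cols = BLOCK_BITS // rows
    print(f"{rows:>3} x {cols:<3} -> {aux_overhead(rows, cols):.4%}")
```

For a square n × n block the overhead is 2/n, so 12.5%, 6.25%, and 3.125% would correspond to n = 16, 32, and 64 if the evaluated blocks were square.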
Conclusion
- Bit flip minimization techniques are oblivious to write asymmetries
- Move from the concept of bit flip minimization to cost reduction
- CAFO
  - Cost model that captures the asymmetry in the write cost
  - 2D encoder that minimizes the overall cost of write operations
Questions?