
Data Retention in MLC NAND Flash - PowerPoint Presentation

Uploaded On 2018-11-10


Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. Yu Cai, Yixin Luo, Erich F. Haratsch, Ken Mai, Onur Mutlu. Carnegie Mellon University, LSI Corporation. ID: 725750



Presentation Transcript

Slide1

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu
Carnegie Mellon University, *LSI Corporation

Slide2

You Probably Know

Many use cases:

+ High performance, low energy consumption

Slide3

NAND Flash Memory Challenges

– Requires erase before program (write)
– High raw bit error rate

[Diagram: CPU, Flash Controller, ECC Controller, Raw Flash Memory Chips]

Slide4

Limited Flash Memory Lifetime

[Figure: raw bit error rate (RBER) versus Program/Erase (P/E) cycles (or writes per cell). Labels: ECC-correctable RBER, Newer generation, ~3000, ~2000, P/E Cycle Lifetime.]

Goal: Extend flash memory lifetime at low cost

Slide5

Retention Loss

Charge leakage over time

One dominant source of flash memory errors [DATE '12, ICCD '12]

[Figure: flash cell with bit values 1 and 0 and a "Retention error" label]

Slide6

NAND Flash 101

Before I show you how we extend flash lifetime …

Slide7

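Threshold Voltage (Vth)

[Figure: two flash cells, labeled 0 and 1, on a normalized Vth axis]

Slide8

Threshold Voltage (Vth) Distribution

[Figure: probability density function (PDF) over normalized Vth, with distributions labeled 0 and 1]

Slide9

Read Reference Voltage (Vref)

[Figure: PDF over normalized Vth with the 0 and 1 distributions and a read reference voltage Vref marked]

Slide10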

Multi-Level Cell (MLC)

[Figure: PDF over normalized Vth with four states, Erased (11), P1 (10), P2 (00), and P3 (01), separated by the ER-P1, P1-P2, and P2-P3 read reference voltages (Vref)]
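As a rough illustration of how the three read reference voltages separate the four MLC states, the sketch below maps a cell's normalized Vth to a state and its 2-bit value. The state-to-bits mapping follows the slide; the numeric Vref values and the name `read_cell` are assumptions chosen only for this example.

```python
from bisect import bisect_right

# States in increasing-Vth order with the 2-bit values shown on the slide.
STATES = [("ER", "11"), ("P1", "10"), ("P2", "00"), ("P3", "01")]

# Hypothetical normalized read reference voltages: ER-P1, P1-P2, P2-P3.
VREFS = (1.0, 2.0, 3.0)


def read_cell(vth, vrefs=VREFS):
    """Return (state, bits) for a cell by comparing its Vth to each Vref."""
    # bisect_right counts how many reference voltages lie at or below vth,
    # which is the index of the state the cell is read as.
    return STATES[bisect_right(vrefs, vth)]


print(read_cell(0.4))   # ('ER', '11')
print(read_cell(2.6))   # ('P2', '00')
```

A cell whose Vth drifts below a reference voltage is read as the next lower state, which is how charge leakage turns into a raw bit error.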

Slide11

Threshold Voltage Reduces Over Time

[Figure: PDFs over normalized Vth for P1 (10), P2 (00), and P3 (01), shown before retention loss and after some retention loss]

Slide12

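Fixed Read Reference Voltage Becomes Suboptimal

[Figure: PDFs over normalized Vth for P1 (10), P2 (00), and P3 (01), before retention loss and after some retention loss, against the fixed P1-P2 and P2-P3 Vref. The regions where the shifted distributions cross the fixed Vref are labeled "Raw bit errors".]

Slide13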

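Optimal Read Reference Voltage (OPT)

[Figure: PDFs over normalized Vth for P1 (10), P2 (00), and P3 (01) after some retention loss, showing the original P1-P2 and P2-P3 Vref alongside P1-P2 OPT and P2-P3 OPT. Reading at OPT is labeled "Minimal raw bit errors".]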
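To make the idea of an optimal read reference voltage concrete, here is a minimal sketch that models two neighboring states as Gaussian Vth distributions and searches for the Vref with the fewest misread cells. The Gaussian model, the means and standard deviations, and the names `misread_fraction` and `find_opt` are all assumptions for illustration; the slides do not specify a distribution model.

```python
import math


def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def misread_fraction(vref, low, high):
    """Fraction of cells misread at vref; low/high are (mean, sigma) pairs.

    Assumes both states hold the same number of cells.
    """
    high_read_as_low = norm_cdf(vref, *high)       # high-state cells below vref
    low_read_as_high = 1.0 - norm_cdf(vref, *low)  # low-state cells above vref
    return 0.5 * (high_read_as_low + low_read_as_high)


def find_opt(low, high, steps=1000):
    """Grid-search the Vref between the two means that minimizes misreads."""
    lo, hi = low[0], high[0]
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: misread_fraction(v, low, high))


# Hypothetical P2 and P3 distributions right after programming ...
fresh_p2, fresh_p3 = (2.0, 0.15), (3.0, 0.15)
# ... and after some retention loss (lower means, wider spread).
aged_p2, aged_p3 = (1.9, 0.20), (2.7, 0.25)

fixed_vref = find_opt(fresh_p2, fresh_p3)
opt = find_opt(aged_p2, aged_p3)
print("fixed Vref:", fixed_vref, "misreads:", misread_fraction(fixed_vref, aged_p2, aged_p3))
print("aged OPT:  ", opt, "misreads:", misread_fraction(opt, aged_p2, aged_p3))
```

Under these assumed numbers the aged OPT sits below the original Vref, and reading at it misreads fewer cells than reading at the fixed value.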

Slide14

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Slide15

Retention Failure

[Figure: PDFs over normalized Vth for P1 (10), P2 (00), and P3 (01) against the P1-P2 and P2-P3 Vref. After some retention loss: correctable errors. After significant retention loss: uncorrectable errors.]

Slide16

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

Slide17

To understand the effects of retention loss:
- Characterize retention loss using real chips

Slide18

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:
- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Slide19

Characterization Methodology

FPGA-based flash memory testing platform [Cai+, FCCM '11]

Slide20

Characterization Methodology

- FPGA-based flash memory testing platform
- Real 20- to 24-nm MLC NAND flash chips
- 0 to 40 days' worth of retention loss
- Room temperature (20 °C)
- 0 to 50k P/E cycles

Slide21

Characterize the effects of retention loss

1. Threshold Voltage Distribution
2. Optimal Read Reference Voltage
3. RBER and P/E Cycle Lifetime

Slide22

1. Threshold Voltage (Vth) Distribution

[Figure: PDF over normalized Vth for the P1, P2, and P3 states]

Slide23

1. Threshold Voltage (Vth) Distribution

[Figure: 0-day and 40-day Vth distributions for the P1, P2, and P3 states]

Finding: Cell’s threshold voltage decreases over time

Slide24

2. Optimal Read Reference Voltage (OPT)

[Figure: Vth distributions for P1, P2, and P3 with the 0-day OPT and 40-day OPT marked between adjacent states]

Finding: OPT decreases over time

Slide25

3. RBER and P/E Cycle Lifetime

[Figure: RBER versus P/E cycles]

Slide26

3. RBER and P/E Cycle Lifetime

Reading data with 7 days' worth of retention loss.

[Figure: RBER versus P/E cycles for several read reference voltages, with the ECC-correctable RBER marked. Labels: Actual OPT, "Vref closer to actual OPT", Nominal Lifetime, Extended Lifetime.]

Finding: Using actual OPT achieves the longest lifetime

Slide27

Characterization Summary

Due to retention loss
- Cell’s threshold voltage (Vth) decreases over time
- Optimal read reference voltage (OPT) decreases over time

Using the actual OPT for reading
- Achieves the longest lifetime

Slide28

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:
- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Slide29

Naïve Solution: Sweeping Vref

Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC
- Finds the optimal read reference voltage
- Requires many read-retries → higher read latency

Slide30

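Comparison of Flash Read Techniques

[Table: columns are Flash Read Techniques, Lifetime (P/E Cycle), and Performance (Read Latency); rows are Fixed Vref, Sweeping Vref, and Our Goal. The cell contents were graphics and did not survive extraction.]

Slide31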

Observations

1. The optimal read reference voltage gradually decreases over time
Key idea: Record the old OPT as a prediction (Vpred) of the actual OPT
Benefit: Close to actual OPT → Fewer read retries

2. The amount of retention loss is similar across pages within a flash block
Key idea: Record only one Vpred for each block
Benefit: Small storage overhead (768KB out of 512GB)
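The storage overhead figure can be sanity-checked with a short calculation. The slide gives only the totals (768KB out of 512GB); the 2 MB block size and the 3 bytes of Vpred per block used below are assumptions chosen because they reproduce that total, not values stated on the slide.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

capacity_bytes = 512 * GIB       # drive capacity from the slide
block_size_bytes = 2 * MIB       # assumed flash block size
vpred_bytes_per_block = 3        # assumed: one byte per read reference voltage

num_blocks = capacity_bytes // block_size_bytes
overhead_bytes = num_blocks * vpred_bytes_per_block

print(num_blocks)                        # 262144
print(overhead_bytes // KIB, "KiB")      # 768 KiB
print(overhead_bytes / capacity_bytes)   # fraction of total capacity
```

Under these assumptions the per-block table occupies roughly 1.4 millionths of the drive, which is why recording one Vpred per block rather than per page keeps the cost negligible.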

Slide32

Retention Optimized Reading (ROR)

Components:

1. Online pre-optimization algorithm
- Periodically records a Vpred for each block

2. Improved read-retry technique
- Utilizes the recorded Vpred to minimize read-retry count

Slide33

1. Online Pre-Optimization Algorithm

- Triggered periodically (e.g., per day)
- Find and record an OPT as per-block Vpred
- Performed in background
- Small storage overhead

[Figure: PDF over normalized Vth showing the old Vpred and the new Vpred]

Slide34

2. Improved Read-Retry Technique

- Performed as normal read
- Vpred already close to actual OPT
- Decrease Vref if Vpred fails, and retry

[Figure: PDF over normalized Vth showing Vpred very close to OPT]
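The two ROR components, and the naive Vref sweep they improve on, can be sketched as follows. This is a toy model under stated assumptions, not the authors' implementation: Vref is an integer step value, a page read is simulated by a function returning the raw bit error count, and the names `pre_optimize`, `naive_sweep`, `ror_read`, the error model, and the ECC limit are all hypothetical.

```python
def pre_optimize(read_errors, vrefs):
    """Background step: record the Vref with the fewest raw bit errors as Vpred."""
    return min(vrefs, key=read_errors)


def naive_sweep(read_errors, ecc_limit, vrefs):
    """Try every candidate Vref in order until ECC can correct the page."""
    for reads, vref in enumerate(vrefs, start=1):
        if read_errors(vref) <= ecc_limit:
            return vref, reads
    return None, len(vrefs)


def ror_read(read_errors, ecc_limit, vpred, min_vref, step=1):
    """Start at the recorded Vpred; decrease Vref and retry while ECC fails."""
    vref, reads = vpred, 0
    while vref >= min_vref:
        reads += 1
        if read_errors(vref) <= ecc_limit:
            return vref, reads
        vref -= step
    return None, reads


ECC_LIMIT = 40                       # assumed correctable errors per page
CANDIDATES = range(130, 99, -1)      # assumed Vref steps, highest first


def errors_yesterday(vref):
    return 20 * abs(vref - 118)      # toy model: OPT was at step 118


def errors_today(vref):
    return 20 * abs(vref - 115)      # toy model: OPT has drifted down to 115


vpred = pre_optimize(errors_yesterday, CANDIDATES)
print(vpred)                                                   # 118
print(naive_sweep(errors_today, ECC_LIMIT, CANDIDATES))        # (117, 14)
print(ror_read(errors_today, ECC_LIMIT, vpred, min_vref=100))  # (117, 2)
```

Because OPT only moves downward and only by a little between pre-optimization runs, starting from Vpred and stepping down reaches a correctable Vref in far fewer reads than sweeping the whole range.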

Slide35

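Retention Optimized Reading: Summary

[Table: columns are Flash Read Techniques, Lifetime (P/E Cycle), and Performance (Read Latency). Cells that were graphics did not survive extraction.]

Fixed Vref: cell contents not recoverable
Sweeping Vref: lifetime 64% ↑ (read latency cell not recoverable)
ROR: lifetime 64% ↑; read latency, Nom. Life: 2.4% ↓, Ext. Life: 70.4% ↓

Slide36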

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:
- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Slide37

Retention Failure

[Figure: PDFs over normalized Vth for P1 (10), P2 (00), and P3 (01) against the P1-P2 and P2-P3 Vref. After some retention loss: correctable errors. After significant retention loss: uncorrectable errors.]

Slide38

Leakage Speed Variation

[Figure: PDF over normalized Vth with cells marked S (slow-leaking cell) or F (fast-leaking cell)]

Slide39

Initially, Right After Programming

[Figure: PDFs over normalized Vth for P2 and P3, each containing a mix of S and F cells]

Slide40

After Some Retention Loss

[Figure: PDFs over normalized Vth for P2 and P3 with S and F cells marked]

Fast-leaking cells have lower Vth
Slow-leaking cells have higher Vth

Slide41

Eventually: Retention Failure

[Figure: PDFs over normalized Vth for P2 and P3 with S and F cells marked and OPT drawn between the two states]

Slide42

Retention Failure Recovery (RFR)

Key idea: Guess original state of the cell from its leakage speed property

Three steps
1. Identify risky cells
2. Identify fast-/slow-leaking cells
3. Guess original states

Slide43

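1. Identify Risky Cells

[Figure: PDFs over normalized Vth for P2 and P3 with S and F cells marked. Cells with Vth between OPT–σ and OPT+σ are labeled "Risky cells". A "Key Formula" box shows "+ S =" and "+ F ="; its graphics did not survive extraction.]

Slide44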

2. Identifying Fast- vs. Slow-Leaking Cells

[Figure: the risky cells between OPT–σ and OPT+σ, each marked "?", with the P2 and P3 distributions and the Key Formula box ("+ S =", "+ F =")]

Slide45

2. Identifying Fast- vs. Slow-Leaking Cells

[Figure: the same plot, with some risky cells now labeled S or F and the rest still marked "?"]

Slide46

3. Guess Original States

[Figure: risky cells labeled S or F between the P2 and P3 distributions, with the Key Formula box ("+ S =", "+ F =")]
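The three RFR steps can be sketched roughly as below. Since the Key Formula graphics are lost, the guessing rule here is an assumption inferred from the earlier statement that fast-leaking cells end up at lower Vth: a fast-leaking risky cell is guessed to have come from the higher state (P3), a slow-leaking one from the lower state (P2). Measuring leakage speed as the Vth drop over some additional retention time, and using the median drop as the fast/slow cutoff, are also assumptions, as are the names `Cell` and `recover_states`.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Cell:
    vth_at_failure: float  # Vth when the uncorrectable error was detected
    vth_later: float       # Vth of the same cell after additional retention time

    @property
    def drop(self):
        return self.vth_at_failure - self.vth_later


def recover_states(cells, opt, sigma):
    """Guess whether each cell was originally programmed to P2 or P3."""
    def is_risky(cell):
        # Step 1: risky cells have Vth within sigma of OPT.
        return opt - sigma <= cell.vth_at_failure <= opt + sigma

    # Step 2: among risky cells, a larger-than-median Vth drop means fast-leaking.
    risky_drops = [c.drop for c in cells if is_risky(c)]
    cutoff = median(risky_drops) if risky_drops else 0.0

    guesses = []
    for cell in cells:
        if not is_risky(cell):
            # Far from OPT: trust the ordinary read against OPT.
            guesses.append("P3" if cell.vth_at_failure > opt else "P2")
        else:
            # Step 3 (assumed rule): fast-leaking -> P3, slow-leaking -> P2.
            guesses.append("P3" if cell.drop > cutoff else "P2")
    return guesses


cells = [
    Cell(1.80, 1.78),  # well below OPT
    Cell(2.45, 2.44),  # risky, small drop (slow)
    Cell(2.48, 2.38),  # risky, large drop (fast)
    Cell(2.55, 2.54),  # risky, small drop (slow)
    Cell(2.52, 2.40),  # risky, large drop (fast)
    Cell(3.00, 2.90),  # well above OPT
]
print(recover_states(cells, opt=2.5, sigma=0.1))
# ['P2', 'P2', 'P3', 'P2', 'P3', 'P3']
```

Note that the third cell reads below OPT yet is guessed as P3, and the fourth reads above OPT yet is guessed as P2: within the risky band the leakage speed, not the current Vth, decides the guess.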

Slide47

RFR Evaluation

[Timeline: Program with random data → 28 days → Detect failure, backup data → 12 additional days → Recover data]

- Expect to eliminate 50% of raw bit errors
- ECC can correct remaining errors

Slide48

Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors

To understand the effects of retention loss:
- Characterize retention loss using real chips

Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage

Slide49

Conclusion

Problem: Retention loss reduces flash lifetime

Overall Goal: Extend flash lifetime at low cost

Flash Characterization: Developed an understanding of the effects of retention loss in real chips

Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage
- 64% lifetime ↑, 70.4% read latency ↓

Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors
- Raw bit error rate 50% ↓, reduces data loss

Slide50

Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu
Carnegie Mellon University, *LSI Corporation

Slide51

Backup Slides

Slide52

RFR Motivation

Data loss can happen in many ways
- High P/E cycle
- High temperature → accelerates retention loss
- High retention age (lost power for a long time)

Slide53

What if there are other errors?

Key: RFR does not have to correct all errors

Example:
- ECC can correct 40 errors in a page
- Corrupted page has 20 retention errors, 25 other errors (45 total errors)
- After RFR: 10 retention errors, 30 other errors (40 total errors → ECC correctable)
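The arithmetic in this example can be checked with a few lines. The helper name `ecc_correctable` is hypothetical; the error counts and the 40-error ECC limit are the ones given on the slide.

```python
def ecc_correctable(retention_errors, other_errors, ecc_limit=40):
    """A page is correctable when its total raw bit errors fit within the ECC limit."""
    return retention_errors + other_errors <= ecc_limit


print(ecc_correctable(20, 25))  # False: 45 errors exceed the limit of 40
print(ecc_correctable(10, 30))  # True: 40 errors are within the limit
```

RFR only has to bring the total under the ECC limit; the errors that remain are left for ECC to correct.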