Vulnerabilities in MLC NAND

Uploaded On 2018-09-20


Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques. Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch. February 6, 2017.



Presentation Transcript

Slide1

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques

Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch

February 6, 2017

Slide2

Executive Summary

MLC (multi-level cell) NAND flash uses two-step programming
We find new reliability and security vulnerabilities
  In between the two steps, cells are in a partially-programmed state
  Program interference and read disturb are much worse for partially-programmed cells than for fully-programmed cells
We experimentally characterize the vulnerabilities using real state-of-the-art MLC NAND flash memory chips
We show that malicious programs can exploit the vulnerabilities to corrupt data of other programs and reduce flash memory lifetime
We propose three solutions that target the vulnerabilities
  One solution completely eliminates the vulnerabilities, at the expense of a 4.9% program latency increase
  Two solutions mitigate the vulnerabilities, increasing flash lifetime by 16%

Slide3

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
Protection and Mitigation Mechanisms
Conclusion

Slide4

Storing Data in NAND Flash Memory

A flash cell uses the threshold voltage of a floating-gate transistor to represent the data stored in the cell
Per-bit cost of NAND flash memory has greatly decreased
  Aggressive process technology scaling
  Multi-level cell (MLC) technology

[Figure: a NAND flash chip and a single flash cell holding two bits. MSB: most significant bit; LSB: least significant bit.]

Slide5

Programming Data to a Multi-Level Cell

A cell is programmed by pulsing a large voltage on the transistor gate
Cell-to-cell program interference
  Threshold voltage of a neighboring cell inadvertently increases
  Worsens as flash memory scales
Mitigation: two-step programming

[Figure: cell values while programming a cell and its neighbors, first in a single program operation and then split into Step 1 and Step 2.]

Slide6

Reading Data from a Multi-Level Cell

Threshold voltages are represented as a probability distribution
  Due to process variation
  Each two-bit value corresponds to a state (a range of threshold voltages)
Read reference voltages (Va, Vb, Vc)
  Identify the state a cell belongs to
  Applied to the transistor gate to see if a cell turns on
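
As a rough illustration of how the three read reference voltages pick out a state, the sketch below compares a cell's threshold voltage against Va, Vb, and Vc. The voltage values and the function name `read_cell` are invented for illustration; the bit encoding (ER = 11, P1 = 01, P2 = 00, P3 = 10, written MSB then LSB) follows the figure on this slide.

```python
from bisect import bisect_right

# States in order of increasing threshold voltage, as (name, MSB, LSB).
STATES = [("ER", 1, 1), ("P1", 0, 1), ("P2", 0, 0), ("P3", 1, 0)]

def read_cell(vth, va, vb, vc):
    """Return (state, msb, lsb) for a cell with threshold voltage vth."""
    # Count how many reference voltages the threshold voltage has reached.
    index = bisect_right([va, vb, vc], vth)
    return STATES[index]

# Hypothetical reference voltages in volts.
VA, VB, VC = 1.0, 2.0, 3.0
print(read_cell(0.2, VA, VB, VC))
print(read_cell(2.5, VA, VB, VC))
```

With this encoding, the LSB depends only on the comparison against Vb, while the MSB needs both Va and Vc.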

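[Figure: probability density versus threshold voltage (Vth) for the four states, separated by Va, Vb, and Vc. State encoding (MSB, LSB): ER = 11, P1 = 01, P2 = 00, P3 = 10.]

Slide7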

NAND Flash Memory Errors and Lifetime

During a read, raw bit errors occur when the cell threshold voltage has incorrectly shifted to a different state
The controller employs sophisticated ECC to correct errors
If errors exceed the ECC limit, the flash memory has exhausted its lifetime
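
The lifetime definition above can be sketched as a lookup: lifetime is the highest program/erase (P/E) cycle count at which the raw bit error rate (RBER) is still within what ECC can correct. The function name, the RBER samples, and the ECC limit below are all made up for illustration and are not measurements from the talk.

```python
def lifetime_pe_cycles(rber_by_cycle, ecc_limit):
    """Return the last P/E cycle count whose RBER is still within the ECC limit."""
    lifetime = 0
    for cycles, rber in sorted(rber_by_cycle.items()):
        if rber > ecc_limit:
            break  # errors now exceed what ECC can correct
        lifetime = cycles
    return lifetime

# Invented sample points: P/E cycles -> RBER.
samples = {1000: 1e-4, 2000: 3e-4, 3000: 8e-4, 4000: 2e-3}
print(lifetime_pe_cycles(samples, ecc_limit=1e-3))
```

Anything that raises RBER at a given P/E cycle count moves the crossing point earlier, which is why extra errors translate directly into reduced lifetime.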

[Figure: raw bit error rate (RBER) versus program/erase (P/E) cycles. Lifetime is the P/E cycle count at which RBER reaches the ECC correction capability.]

Our goal: understand how two-step programming affects flash memory errors and lifetime (and what potential vulnerabilities it causes)

Slide8

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
  How Can Two-Step Programming Introduce Errors?
  Program Interference
  Read Disturb
Example Sketches of Security Exploits
Protection and Mitigation Mechanisms
Conclusion

Slide9

How Can Two-Step Programming Introduce Errors?

The cell starts in the erased state
Step 1 (LSB): partially program the cell to a temporary state
  Errors are introduced into the partially-programmed LSB data
Step 2 (MSB): program the cell to its final state
  LSB data is read with errors into the internal LSB buffer, not corrected by ECC
  MSB data comes from the controller to the internal MSB buffer
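
To make the failure mode concrete, the sketch below models step 2 as a lookup from (MSB, LSB) to a final state, using the encoding from the deck's threshold-voltage figures (ER = 11, P1 = 01, P2 = 00, P3 = 10). The function name is hypothetical, and the model ignores everything about real programming except which state is selected.

```python
# Final state chosen in step 2, keyed by (MSB, LSB).
FINAL_STATE = {(1, 1): "ER", (0, 1): "P1", (0, 0): "P2", (1, 0): "P3"}

def program_msb(msb_from_controller, lsb_in_internal_buffer):
    """Pick the final state from the MSB and whatever LSB the chip read back."""
    return FINAL_STATE[(msb_from_controller, lsb_in_internal_buffer)]

intended = program_msb(0, 1)   # LSB read back correctly
corrupted = program_msb(0, 0)  # LSB flipped before step 2, never seen by ECC
print(intended, corrupted)
```

Because the internal read bypasses the controller's ECC engine, a single flipped LSB bit silently becomes a wrong final state.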

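[Figure: flash memory with LSB and MSB pages (0 to n) and internal LSB/MSB buffers, next to a controller with an ECC engine. MSB data arrives from the controller. Data read through the controller's ECC engine is read without errors; LSB data read into the internal buffer is read with errors.]

Errors in the internal LSB buffer data cause the cell to be programmed to an incorrect state

Slide10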

Cell-to-Cell Program Interference

Flash cells are grouped into multiple wordlines (rows)
Two-step programming interleaves the LSB and MSB steps of neighboring wordlines
  Steps are interleaved using shadow program sequencing
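
One interleaving consistent with the A to D labels on this slide (between the LSB and MSB steps of Wordline n, the MSB of Wordline n-1 and the LSB of Wordline n+1 are programmed) can be generated as follows. This is a sketch under that assumption; the function name is hypothetical and real chips may order pages differently.

```python
def shadow_sequence(num_wordlines):
    """Return (wordline, page) programming steps in an interleaved order."""
    order = []
    for wl in range(num_wordlines):
        order.append((wl, "LSB"))
        if wl >= 1:
            # The previous wordline's MSB follows this wordline's LSB.
            order.append((wl - 1, "MSB"))
    if num_wordlines > 0:
        order.append((num_wordlines - 1, "MSB"))
    return order

for step in shadow_sequence(3):
    print(step)
```

In this order every wordline sits in the partially-programmed state while two neighboring program steps take place, which is the window the rest of the talk examines.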

[Figure: Wordlines 0, 1, and 2 with their LSB and MSB pages, and the threshold voltage (Vth) distributions (ER and TP states, probability density) of the partially-programmed Wordline 1 relative to Vref at four points:
  A: LSB of Wordline 1 programmed: no interference
  B: After programming MSB of Wordline 0
  C: After programming LSB of Wordline 2
  D: Error when programming MSB of Wordline 1]

Steps for neighboring wordlines cause interference on partially-programmed cells
How bad is this interference?

Slide11

Characterizing Errors in Real NAND Flash Chips

We perform experiments on real state-of-the-art 1x-nm (i.e., 15-19nm) MLC NAND flash memory chips
More info: Cai et al., "FPGA-Based Solid-State Drive Prototyping Platform," FCCM 2011

[Figure: the experimental platform, with an FPGA flash controller and a NAND flash daughterboard.]

Slide12

Measuring Errors Induced by Program Interference

Error rate increases with each programming step
  A: Before interference (LSBs in Wordline n just programmed)
  B: After programming pseudo-random data to MSBs in Wordline n-1
  C: After programming pseudo-random data to MSBs in Wordline n-1 and LSBs in Wordline n+1
Interference depends on the data value being programmed
  Higher voltage → more programming pulses → more interference
  W: After programming worst-case data pattern to Wordlines n-1 and n+1

[Figure: raw bit error rate, normalized to A, for cases A, B, C, and W; W is marked 4.9x.]

Program interference with the worst-case data pattern increases the error rate of partially-programmed cells by 4.9x

Slide13

Read Disturb

Flash block: cells from multiple wordlines connected together on bitlines (columns)
Reading a cell from a bitline
  Apply a read reference voltage (Vref) to the cell
  Apply a pass-through voltage (Vpass) to turn on all unread cells
Pass-through voltage has a weak programming effect

[Figure: a bitline crossing Wordlines 0, 1, and 2, with Vref applied to the wordline being read and Vpass to the others. Threshold voltage (Vth) distributions are shown for unprogrammed (ER), partially-programmed (ER, TP), and fully-programmed (ER, P1, P2, P3) cells. Larger gap between Vth and Vpass, greater effect.]

Partially-programmed and unprogrammed cells are more susceptible to read disturb errors

Slide14

Measuring Errors Induced by Read Disturb

Induce read disturbs on:
  A: Fully-programmed cells
  B: Partially-programmed cells
  C: Unprogrammed cells
After read disturb, program the remaining data and check the error rate

[Figure: raw bit error rate of LSB data (log scale, 10^-4 to 10^-1) versus read disturb count for cases A, B, and C. Annotations: "Order of Magnitude Increase"; "Errors in Data Not Programmed When Read Disturb Occurs".]

LSB data in partially-programmed and unprogrammed cells is most susceptible to read disturb

Slide15

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
  Program Interference Based Exploit
  Read Disturb Based Exploit
Protection and Mitigation Mechanisms
Conclusion

Slide16

Sketch of Program Interference Based Exploit

A malicious program targets a piece of data that belongs to a victim program
Goal: maximize the program interference induced on the victim program's data
Write the worst-case data pattern to neighboring wordlines (WL)
  Wordlines 0/1: all 1s, to keep at the lowest possible threshold voltage
  Wordline 2: victim program writes data
  Wordlines 1 and 3: all 0s, to program to the highest possible threshold voltage
In the paper
  More details on why this works
  Procedure to work around data scrambling

[Figure: LSB and MSB pages of WL 0 to WL 3, with write steps labeled 1, 2, 3a, and 3b. WL 0: Malicious File A (all 1s). WL 1: Malicious File B (all 0s) and Malicious File A (all 1s). WL 2: Data Under Attack. WL 3: Malicious File B (all 0s).]

Slide17

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
  Program Interference Based Exploit
  Read Disturb Based Exploit: in the paper
Protection and Mitigation Mechanisms
Conclusion

Slide18

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
Protection and Mitigation Mechanisms
  Buffering LSB Data in the Controller
  Multiple Pass-Through Voltages
  Adaptive LSB Read Reference Voltage
Conclusion

Slide19

1. Buffering LSB Data in the Controller

Key observation: during MSB programming, LSB data is read from flash cells with uncorrected interference and read disturb errors
Key idea: keep a copy of the LSB data in the controller

[Figure: flash memory with LSB and MSB pages (0 to n) and internal buffers, next to a controller with an ECC engine. The controller supplies both the MSB data and its copy of the LSB data (read without errors) to the internal buffers, in place of the LSB data read with errors inside the flash memory.]

Completely eliminates vulnerabilities to program interference and read disturb
Typical case: 4.9% increase in programming latency

Slide20

2. Multiple Pass-Through Voltages

Key observation: a large gap between threshold voltage and pass-through voltage (Vpass) increases errors due to read disturb
Key idea: minimize the gap by using three pass-through voltages
  Reduces raw bit error rate by 72%
  Increases flash lifetime by 16%
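
A minimal sketch of the selection logic, assuming the controller tracks whether each wordline in a block is unprogrammed, partially programmed, or fully programmed. The type and function names and the voltage values are hypothetical; real pass-through voltages are device-specific and are not given in the slides.

```python
from enum import Enum

class WordlineStatus(Enum):
    UNPROGRAMMED = "unprogrammed"
    PARTIAL = "partially programmed"
    FULL = "fully programmed"

# Hypothetical voltages in volts. Each must still exceed the highest
# threshold voltage a wordline of that kind can hold, so its cells turn on.
VPASS = {
    WordlineStatus.UNPROGRAMMED: 2.0,
    WordlineStatus.PARTIAL: 4.0,
    WordlineStatus.FULL: 6.0,
}

def pass_through_voltages(statuses, read_index):
    """Return {wordline index: Vpass} for every unread wordline in a block."""
    return {i: VPASS[s] for i, s in enumerate(statuses) if i != read_index}

block = [WordlineStatus.FULL, WordlineStatus.PARTIAL, WordlineStatus.UNPROGRAMMED]
print(pass_through_voltages(block, read_index=0))
```

Lower voltages on the less-programmed wordlines shrink the gap that drives read disturb while still letting the unread cells conduct.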

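[Figure: threshold voltage (Vth) distributions for unprogrammed (ER), partially-programmed (ER, TP), and fully-programmed (ER, P1, P2, P3) cells. A single Vpass leaves a large gap above unprogrammed and partially-programmed cells; separate Vpass(erase) and Vpass(partial) voltages sit closer to those distributions.]

Mitigates vulnerabilities to read disturb
No increase in programming latency

Slide21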

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
Protection and Mitigation Mechanisms
  Buffering LSB Data in the Controller
  Multiple Pass-Through Voltages
  Adaptive LSB Read Reference Voltage: in the paper
Conclusion

Slide22

Presentation Outline

Executive Summary
NAND Flash Background
Characterizing New Vulnerabilities in Two-Step Programming
Example Sketches of Security Exploits
Protection and Mitigation Mechanisms
Conclusion

Slide23

Executive Summary

We find new reliability and security vulnerabilities in MLC NAND flash memory
  In between the two steps, cells are in a partially-programmed state
  Program interference and read disturb are much worse for partially-programmed cells than for fully-programmed cells
We experimentally characterize the vulnerabilities using real state-of-the-art MLC NAND flash memory chips
We show that malicious programs can exploit the vulnerabilities to corrupt data of other programs and reduce flash memory lifetime
We propose three solutions that target the vulnerabilities
  One solution completely eliminates the vulnerabilities, at the expense of a 4.9% program latency increase
  Two solutions mitigate the vulnerabilities, increasing flash lifetime by 16%

Slide24

Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques

Yu Cai, Saugata Ghose, Yixin Luo, Ken Mai, Onur Mutlu, Erich F. Haratsch

February 6, 2017

Slide25

Backup Slides

Slide26

NAND Flash Memory Scaling

SSDs use NAND flash memory chips, which contain billions of flash cells
Per-bit cost of NAND flash memory has greatly decreased thanks to scaling
Aggressive process technology scaling
  Flash cell size decreases
  Cells placed closer to each other
Multi-level cell (MLC) technology
  Each flash cell represents data using a threshold voltage
  MLC stores two bits of data in a single cell

[Figure: a 128GB NAND flash chip and a 256GB NAND flash chip, with cells each holding a two-bit value. MSB: most significant bit; LSB: least significant bit.]

Slide27

Two-Step Programming

Per-bit cost of NAND flash memory has greatly decreased
  Aggressive process technology scaling
  Multi-level cell (MLC) technology
A flash cell is programmed by pulsing a large voltage to the cell transistor
Cell-to-cell program interference
  Threshold voltage of a neighboring cell inadvertently increases
  Worsens as flash memory scales
Mitigation: two-step programming

[Figure: a NAND flash chip whose cells each hold a two-bit value (MSB: most significant bit; LSB: least significant bit), and cell values while programming in a single program operation versus in Step 1 and Step 2.]

Slide28

Representing Data in MLC NAND Flash Memory

A flash cell uses floating-gate transistor threshold voltage to represent the data stored in the cell
Threshold voltages are represented as a probability distribution
  Each two-bit value corresponds to a state (a range of threshold voltages)
Read reference voltages (Va, Vb, Vc) identify the state a cell belongs to

[Figure: probability density versus threshold voltage (Vth) for the four states, separated by Va, Vb, and Vc. State encoding (MSB, LSB): ER = 11, P1 = 01, P2 = 00, P3 = 10.]

Slide29

Threshold Voltage Distributions During Programming

[Figure: three probability density plots of threshold voltage (Vth), with (MSB, LSB) values per state; X marks a bit that has not been programmed yet.
  Unprogrammed (starting Vth): ER = XX
  1. Program LSB (temporary Vth): ER = X1, TP = X0
  2. Program MSB (final Vth): ER = 11, P1 = 01, P2 = 00, P3 = 10]

Slide30

Characterizing NAND Flash Memory Reliability

Raw bit errors occur when the cell threshold voltage incorrectly shifts to a different state

[Figure: raw bit error rate versus program/erase (P/E) cycles, with the ECC correction capability marking the lifetime.]

We experimentally characterize the RBER and lifetime of state-of-the-art 1x-nm (i.e., 15-19nm) MLC NAND flash memory chips

Slide31

Malicious Program Behavior

[Figure: raw bit error rate versus P/E cycles for normal usage and malicious usage, against the ECC error correction capability. Normal usage gives the normal lifetime; malicious usage gives a reduced lifetime.]

Slide32

How Can Two-Step Programming Introduce Errors?

Step 1: program only the LSB data
  Errors are introduced into the partially-programmed LSB data
Step 2: program the MSB data
  LSB data is read with errors directly into the internal LSB buffer, not corrected by ECC
  MSB data comes from the controller to the internal MSB buffer

[Figure: probability density of threshold voltage (Vth), with (MSB, LSB) values per state, for the erased Vth (ER = ??), the partially-programmed Vth (ER = ?1, TP = ?0), and the final Vth (ER = 11, P1 = 01, P2 = 00, P3 = 10). Alongside, the flash memory (LSB and MSB pages 0 to n) and the controller with its ECC engine, showing MSB data from the controller, data read without errors, and data read with errors.]

Errors in LSB data cause the cell to be programmed to an incorrect state

Slide33

Data Scrambler Workaround

Some flash controllers employ XOR-based data scrambling
Workaround to write the worst-case data pattern
  Recreate the scrambler logic in software
  Scramble data in software with the same seed
  Hardware scrambler descrambles the data using the same seed
  Descrambled data is written to flash memory

[Figure: a scrambler that XORs its input with the output of a linear feedback shift register driven by a seed and the logical block address. A four-step data flow: the malicious program's unscrambled worst-case data, the software scrambler (scrambled data), the SSD controller's hardware scrambler and ECC engine (descrambled data), and the flash memory (descrambled data).]

Slide34

Sketch of Read Disturb Based Exploit

A malicious program wants to induce errors into unprogrammed and partially-programmed wordlines in an open block
Rapidly issues a large number of reads to the open block
  Write data to the open block
  Issues ~10K reads per second directly to the SSD using syscalls
Induces errors in partially-programmed data
Induces errors in data not yet programmed
  Programming can only increase threshold voltage
  Exploit increases threshold voltage before programming, preventing the cell from storing some data values
In the paper: working around SSD caches

Slide35

1. Buffering LSB Data in the Controller

When LSB data is initially programmed, keep a copy in the controller DRAM
During MSB programming, send both LSB and MSB data from the controller to the internal LSB/MSB buffers in flash memory
Procedure to retrieve and correct data from flash memory if the DRAM loses data (e.g., after power loss)
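
The procedure can be sketched as a small controller model. Everything here is illustrative: `FakeFlash`, `toy_ecc`, and the `Controller` methods are hypothetical names, the "flash" flips one bit on every raw LSB read to stand in for interference and read disturb errors, and the toy ECC simply undoes that flip.

```python
class FakeFlash:
    """Toy stand-in for a flash chip; raw LSB reads return one flipped bit."""
    def __init__(self):
        self.lsb = {}
        self.final = {}

    def program_lsb(self, wordline, lsb):
        self.lsb[wordline] = lsb

    def read_lsb(self, wordline):
        return self.lsb[wordline] ^ 0b1  # simulated uncorrected read error

    def program_msb(self, wordline, lsb, msb):
        self.final[wordline] = (lsb, msb)

def toy_ecc(raw):
    """Toy ECC: undoes the single bit flip that FakeFlash introduces."""
    return raw ^ 0b1

class Controller:
    def __init__(self, flash, ecc_correct):
        self.flash = flash
        self.ecc_correct = ecc_correct
        self.dram_lsb = {}  # wordline -> buffered copy of the LSB page

    def program_lsb(self, wordline, lsb):
        self.dram_lsb[wordline] = lsb  # keep a copy in controller DRAM
        self.flash.program_lsb(wordline, lsb)

    def program_msb(self, wordline, msb):
        lsb = self.dram_lsb.pop(wordline, None)
        if lsb is None:
            # Copy lost (e.g., power loss): read from flash and correct it.
            lsb = self.ecc_correct(self.flash.read_lsb(wordline))
        # Send both LSB and MSB data to the chip's internal buffers.
        self.flash.program_msb(wordline, lsb, msb)

flash = FakeFlash()
ctrl = Controller(flash, toy_ecc)
ctrl.program_lsb(0, 0b1010)
ctrl.program_msb(0, 0b0110)   # LSB copy served from DRAM
ctrl.program_lsb(1, 0b1100)
ctrl.dram_lsb.clear()         # simulate losing the DRAM contents
ctrl.program_msb(1, 0b0011)   # LSB copy recovered through ECC
print(flash.final)
```

In both paths the LSB data used in step 2 has been either buffered or ECC-corrected, so the uncorrected internal read is never relied on.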

[Figure: flash memory with LSB and MSB pages (0 to n), next to a controller with an ECC engine. The controller supplies both the MSB data and its copy of the LSB data (read without errors), in place of the LSB data read with errors inside the flash memory.]

Completely eliminates vulnerabilities to interference and read disturb
Typical case: 4.9% increase in programming latency

Slide36

Algorithm for Buffering LSB Data

Step 1
  A: Send LSB data to internal LSB buffer
  B: Keep copy of LSB in DRAM buffer
  Program LSB page
Step 2
  C: Is LSB in DRAM buffer?
    YES: D: Retrieve LSB data from DRAM buffer
    NO: G: Retrieve LSB data from flash chip, then H: Correct LSB data using ECC engine
  E: Send LSB data to internal LSB buffer
  F: Send MSB data to internal MSB buffer
  Program MSB page

Slide37

Latency Impact of Buffering

Vary the speed of the interface between the controller and the flash memory
Assumes 8KB page size
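
For a feel of why interface speed matters, the sketch below computes how long one 8KB page takes to cross the controller-to-flash interface. It assumes the added cost of buffering is dominated by extra page transfers, which the slide does not state explicitly; the function name and the bandwidth values are hypothetical, and 1 MB is taken as 10^6 bytes.

```python
PAGE_BYTES = 8 * 1024  # 8KB page size, as assumed on the slide

def transfer_time_us(bandwidth_mb_per_s, pages=1):
    """Time in microseconds to move `pages` pages over the interface."""
    bytes_per_second = bandwidth_mb_per_s * 1e6
    return pages * PAGE_BYTES / bytes_per_second * 1e6

# Hypothetical interface speeds in MB/s.
for speed in (100, 200, 400):
    print(speed, round(transfer_time_us(speed), 2))
```

When the LSB page is still in DRAM, only one extra transfer of this kind is needed; when it is not, the page must also be read back from flash and corrected first, which is why that case is plotted separately.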

[Figure: programming latency as the interface speed varies, for three cases: baseline latency, LSB page in DRAM, and LSB page not in DRAM.]

Slide38

Error Rate with Multiple Pass-Through Voltages

[Figure: error rate with a single pass-through voltage versus multiple pass-through voltages, each shown against a limit. Series labels: "LSB: unprogrammed, partially programmed"; "MSB: fully programmed"; "MSB: unprogrammed, partially programmed"; "LSB: fully programmed".]

Slide39

3. Adaptive LSB Read Reference Voltage

Adapt the read reference voltage used to read partially-programmed LSB data
  Compensates for threshold voltage shifts caused by program interference and read disturb
Maintain one read reference voltage per die
  Relearn the voltage once a day by checking the error rate of test LSB data
Reduces error count, but does not completely eliminate errors

[Figure: error count with adaptive Vref compared with the baseline (fixed Vref), with reductions of 30% and 21%.]

Slide40

3. Adaptive LSB Read Reference Voltage

Adapt the read reference voltage for partially-programmed LSB data to compensate for voltage shifts
  Program a reference data value to the LSBs of test wordlines
  Relearn the voltage once a day by checking the error rate of the test data
Reduces error count by 21-30%, but does not completely eliminate errors
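
The relearning step can be sketched as a sweep over candidate reference voltages, keeping the one that misreads the fewest known test bits. The function names, the candidate list, and the cell data are invented for illustration; the only convention taken from the slides is that, in the partially-programmed state, the lower-voltage ER state stores LSB 1 and the higher-voltage TP state stores LSB 0.

```python
def count_lsb_errors(vref, cells):
    """Count misread LSBs. cells: list of (vth, stored_lsb) for test cells."""
    # A partially-programmed cell reads as LSB 1 when its Vth is below Vref.
    return sum((1 if vth < vref else 0) != lsb for vth, lsb in cells)

def relearn_vref(cells, candidates):
    """Return the candidate Vref with the fewest errors on the test data."""
    return min(candidates, key=lambda vref: count_lsb_errors(vref, cells))

# Invented test cells after interference: one ER cell has drifted up to 1.3 V.
test_cells = [(0.4, 1), (0.9, 1), (1.3, 1), (1.8, 0), (2.2, 0), (2.6, 0)]
fixed_vref = 1.0
adapted_vref = relearn_vref(test_cells, [1.0, 1.25, 1.5, 1.75])
print(count_lsb_errors(fixed_vref, test_cells), adapted_vref)
```

If the shifted distributions overlap, no single reference voltage separates them cleanly, which fits the slide's point that this approach reduces errors without eliminating them.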

[Figure: probability density of threshold voltage (Vth) for the ER and TP states before and after interference and read disturb, with Vref moved to follow the shifted distributions.]

Mitigates, but doesn't fully eliminate, vulnerabilities
No increase in programming latency

Slide41

Conclusion

Two-step programming is used in MLC NAND flash memory
  Introduces new reliability and security vulnerabilities
  Partially-programmed cells are susceptible to program interference and read disturb
We experimentally characterize the vulnerabilities using real NAND flash chips
Malicious programs can exploit the vulnerabilities to corrupt data belonging to other programs, and reduce flash memory lifetime

Solution                                  Protects Against                      Latency Overhead   Error Rate Reduction
1. Buffering LSB in the Controller        program interference, read disturb    4.9%               100%
2. Adaptive LSB Read Reference Voltage    program interference, read disturb    0.0%               21-33%
3. Multiple Pass-Through Voltages         read disturb                          0.0%               72% (16% lifetime increase)