Slide 1: Zettabyte Reliability with Flexible End-to-end Data Integrity
Yupu Zhang, Daniel Myers, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
University of Wisconsin - Madison
5/9/2013
Slide 2: Data Corruption
- Imperfect hardware: disk, memory, controllers [Bairavasundaram07, Schroeder09, Anderson03]
- Buggy software: kernel, file system, firmware [Engler01, Yang04, Weinberg04]
- Techniques to maintain data integrity
  - Detection: checksums [Stein01, Bartlett04]
  - Recovery: RAID [Patterson88, Corbett04]
Slide 3: In Reality
- Corruption still occurs and goes undetected
- Existing checks (e.g., disk ECC, memory ECC) are usually isolated
- High-level checks (e.g., ZFS) are limited
- Comprehensive protection is needed
[Figure: storage stack showing disk ECC and memory ECC as isolated protection and high-level checks as limited protection]
Slide 4: Previous State of the Art
- End-to-end data integrity: a checksum for each data block is generated and verified by the application
- The same checksum protects data throughout the entire stack
- A strong checksum is usually preferred
[Figure: write path and read path through the storage stack]
Slide 5: Two Drawbacks
- Performance: data is repeatedly accessed from the in-memory cache, and a strong checksum means high overhead
- Timeliness: it is too late to recover from corruption that occurs before a block is written to disk
[Figure: write and read paths; the checksum is generated at write() and verified only at read(), an unbounded time later, so verification may FAIL when recovery is no longer possible]
Slide 6: Flexible End-to-end Data Integrity
- Goal: balance performance and reliability by changing the checksum across components or over time
- Performance: a fast but weaker checksum for in-memory data; a slow but stronger checksum for on-disk data
- Timeliness: each component is aware of the checksum, so verification can catch corruption in time
Slide 7: Our Contribution
- Modeling: a framework to reason about the reliability of storage systems
  - Reliability goal: Zettabyte reliability, i.e. at most one undetected corruption per Zettabyte read
- Design and implementation: Zettabyte-Reliable ZFS (Z2FS), ZFS with flexible end-to-end data integrity
Slide 8: Results
- Reliability
  - Z2FS provides Zettabyte reliability; ZFS achieves Petabyte reliability at best
  - Z2FS detects and recovers from corruption in time
- Performance
  - Comparable to ZFS (less than 10% overhead)
  - Overall faster than the straightforward end-to-end approach (up to 17% in some cases)
Slide 9: Outline
- Introduction
- Analytical Framework
  - Overview
  - Example
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 10: Overview of the Framework
- Goal: analytically evaluate and compare the reliability of storage systems
- Silent data corruption (SDC): corruption that is undetected by existing checks
- Metric: P_sys-udc, the probability of undetected data corruption when reading a data block from the system (per I/O)
- Reliability Score = -log10(P_sys-udc); higher is better
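The score is just the negative base-10 logarithm of the per-I/O probability, so each additional point is ten times fewer undetected corruptions; a minimal sketch (the function name is ours):

```python
import math

def reliability_score(p_udc: float) -> float:
    """Reliability score = -log10(probability of undetected data
    corruption per block read); higher scores mean fewer silent errors."""
    return -math.log10(p_udc)

# On this scale, a device with an undetected error probability
# of 1e-12 sits at index 12.
index = reliability_score(1e-12)
```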
Slide 11: Models for the Framework
- Hard disk: characterized by its undetected bit error rate, which is stable and not time-dependent
  - Disk Reliability Index = -log10(undetected bit error rate)
- Memory: characterized by its Failures in Time (FIT) per Mbit; the longer a block's residency time, the more likely it is corrupted
  - Memory Reliability Index = -log10(per-bit corruption rate per second)
- Checksum: modeled by the probability of undetected corruption on a device protected by that checksum
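As a rough illustration of the memory model (our own sketch; the FIT value below is made up, not taken from the slides): FIT counts expected failures per 10^9 device-hours, so a block's corruption probability scales with its size and its residency time:

```python
def mem_corruption_prob(fit_per_mbit: float, block_bytes: int,
                        residency_sec: float) -> float:
    """Probability that a block is corrupted while resident in memory:
    FIT/Mbit (failures per 1e9 Mbit-hours) scaled by the block's size
    in Mbit and its residency time in hours."""
    mbits = block_bytes * 8 / 2**20
    hours = residency_sec / 3600.0
    return fit_per_mbit * mbits * hours / 1e9

# Hypothetical 1000 FIT/Mbit, a 4 KB block, 30 s vs 60 s residency:
p30 = mem_corruption_prob(1000.0, 4096, 30)
p60 = mem_corruption_prob(1000.0, 4096, 60)  # longer residency, higher risk
```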
Slide 12: Calculating P_sys-udc
- Focus on the lifetime of a block: from the moment it is generated to the moment it is read, across multiple components
- Find all silent-corruption scenarios
- P_sys-udc is the sum of the probabilities of each silent-corruption scenario during the lifetime of the block in the storage system
Slide 13: Reliability Goal
- Ideally P_sys-udc would be 0, but that is impossible
- Goal: Zettabyte reliability, at most one SDC when reading one Zettabyte of data from a storage system
- Assuming a 4 KB data block, the corresponding Reliability Score is 17.5 ("17 nines")
- Reading at 100 MB/s, this amounts to 2.8 x 10^-6 SDC/year
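The 17.5 figure is easy to check: one Zettabyte of 4 KB blocks fixes the per-read probability budget, and the 100 MB/s rate then gives the per-year figure (a back-of-the-envelope sketch, taking 1 ZB = 2^70 bytes):

```python
import math

ZETTABYTE = 2**70        # bytes
BLOCK = 4096             # 4 KB data block
blocks_per_zb = ZETTABYTE // BLOCK

# At most one undetected corruption per Zettabyte read:
p_goal = 1 / blocks_per_zb
score = -math.log10(p_goal)                 # ~17.46, i.e. "17 nines"

# Reading continuously at 100 MB/s for one year:
blocks_per_year = (100 * 2**20 / BLOCK) * 365 * 24 * 3600
sdc_per_year = blocks_per_year * p_goal     # ~2.8e-6
```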
Slide 14: Outline
- Introduction
- Analytical Framework
  - Overview
  - Example
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 15: Sample Systems

Name     | Memory Index | Disk Index | Description
Worst    | 13.4         | 10         | Worst memory & worst disk
Consumer | 14.2         | 12         | Non-ECC memory & regular disk
Server   | 18.8         | 12         | ECC memory & regular disk
Best     | 18.8         | 20         | ECC memory & best disk

(Disk Reliability Index: regular disk = 12. Memory Reliability Index: non-ECC memory = 14.2, ECC memory = 18.8.)
Slide 16: Example
[Figure: lifetime of a block; written by the application at t0, flushed to disk at t1, read back into memory at t2, returned by read() at t3]
- Assuming there is at most one corruption in each scenario, each time period of the lifetime is one scenario
- P_sys-udc = sum of the probabilities of each time period
- Assumes a fixed flushing interval (t0 to t1) and a residency time in memory before the read (t2 to t3)
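Schematically, the computation over a block's lifetime looks like this (a sketch with made-up numbers; each scenario pairs the chance that corruption strikes in one time period with the chance the checks in place fail to catch it):

```python
def p_sys_udc(scenarios):
    """Sum the probability of each silent-corruption scenario.
    Assuming at most one corruption per lifetime, the scenarios
    (one per time period) are disjoint, so probabilities add."""
    return sum(p_corrupt * p_undetected
               for p_corrupt, p_undetected in scenarios)

# Illustrative only: memory before the flush (unprotected), on disk
# (protected by a checksum that misses 1 in 1e5 corruptions),
# memory after the read (unprotected again).
p = p_sys_udc([(1e-14, 1.0), (1e-12, 1e-5), (1e-14, 1.0)])
```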
Slide 17: Example (cont.)
[Figure: reliability scores of the Worst, Consumer, Server, and Best systems against the Zettabyte-reliability goal of 17.5]
- None of the systems achieves the goal
- For Server and Consumer, disk corruption dominates: on-disk data needs protection
Slide 18: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 19: ZFS
[Figure: block lifetime from write() at t0 to read() at t3; the Fletcher checksum is generated when the block is flushed to disk (t1) and verified when it is read back into memory (t2)]
- Only on-disk blocks are protected
Slide 20: ZFS (cont.)
[Figure: reliability scores of the Worst, Consumer, Server, and Best systems under ZFS against the goal of 17.5]
- Even Best achieves only Petabyte reliability
- Now memory corruption dominates: end-to-end protection is needed
Slide 21: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 22: End-to-end ZFS
[Figure: block lifetime; the checksum is generated at write() (t0) and verified only at read() (t3)]
- The checksum is generated and verified only by the application
- Only one type of checksum is used throughout (Fletcher or xor)
Slide 23: End-to-end ZFS (cont.)
[Figure: reliability scores of the Worst, Consumer, Server, and Best systems with Fletcher and with xor]
- Fletcher provides the best reliability but still just falls short of the goal
Slide 24: Performance Issue
- End-to-end ZFS (Fletcher) is 15% slower than ZFS
- End-to-end ZFS (xor) has only 3% overhead; xor is optimized by the checksum-on-copy technique [Chu96]

Reading 1 GB of data from the page cache:

System                    | Throughput (MB/s) | Normalized
Original ZFS              | 656.67            | 100%
End-to-end ZFS (Fletcher) | 558.22            | 85%
End-to-end ZFS (xor)      | 639.89            | 97%
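Checksum-on-copy [Chu96] folds the checksum into a copy the system must perform anyway, so the data is traversed once instead of twice. A simplified sketch with a word-wise xor checksum (our own illustration, not the Z2FS code):

```python
def copy_with_xor_checksum(src: bytes) -> tuple:
    """Copy src while folding each 64-bit word into a running xor:
    a single pass over the data yields both the copy and the checksum."""
    dst = bytearray(len(src))
    csum = 0
    for i in range(0, len(src), 8):
        word = src[i:i + 8]
        dst[i:i + len(word)] = word                 # the copy
        csum ^= int.from_bytes(word, "little")      # the checksum
    return dst, csum

data = bytes(range(64))
copy, csum = copy_with_xor_checksum(data)
```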
Slide 25: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 26: Z2FS Overview
- Goals: reduce the performance overhead while still achieving Zettabyte reliability
- Implementation of flexible end-to-end integrity
  - Static mode: change the checksum across components, using xor as the memory checksum and Fletcher as the disk checksum
  - Dynamic mode: change the checksum over time, switching the memory checksum from xor to Fletcher after a certain period (longer residency means the data is more likely to be corrupted)
Slide 27: Static Mode
[Figure: block lifetime with checksum chaining; xor protects the block in memory and Fletcher protects it on disk; the xor checksum is generated at write(), verified and chained to a Fletcher checksum at the flush, verified again when the block is read back, and verified once more at read()]
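Checksum chaining at the flush boundary can be sketched as follows: verify the weak in-memory checksum first, and only then generate the strong on-disk checksum over the same bytes, so the block is never unprotected and memory corruption is caught before the write commits. The checksum functions here are stand-ins (adler32 for a Fletcher-style sum), not the Z2FS implementation:

```python
import zlib

def xor_csum(data: bytes) -> int:
    """Weak, fast in-memory checksum: xor of 64-bit words."""
    c = 0
    for i in range(0, len(data), 8):
        c ^= int.from_bytes(data[i:i + 8], "little")
    return c

def chain_to_disk(data: bytes, mem_csum: int) -> int:
    """Cross the memory->disk boundary: verify the xor checksum,
    then generate the stronger disk checksum (Fletcher-style)."""
    if xor_csum(data) != mem_csum:
        # Caught *before* the write: the application can still rewrite.
        raise IOError("memory corruption detected before flush")
    return zlib.adler32(data)

block = b"some file data" * 100
disk_csum = chain_to_disk(block, xor_csum(block))
```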
Slide 28: Static Mode (cont.)
[Figure: reliability scores of the Worst, Consumer, Server, and Best systems in static mode]
- Worst: must use Fletcher all the way
- Server and Best: xor is good enough as the memory checksum
- Consumer: may drop below the goal as residency time increases
Slide 29: Evolving to Dynamic Mode
[Figure: reliability score vs. residency time for the Consumer system; the static (xor) curve crosses below the goal at 92 sec, while the dynamic curve stays above it]
- Dynamic mode switches the memory checksum from xor to Fletcher after 92 sec
Slide 30: Dynamic Mode
[Figure: block lifetime; the block is written under an xor memory checksum and a Fletcher disk checksum as in static mode, but once its residency in memory exceeds t_switch, the xor checksum is verified and replaced by Fletcher; verification also occurs at each boundary crossing and at read()]
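Dynamic mode can be sketched as a timestamp check on each access: while a cached block is young it carries the cheap xor checksum; once its residency exceeds the switch point (92 s for the Consumer system in the analysis), the xor checksum is verified and replaced by the stronger one. A sketch under our own naming, again with adler32 standing in for Fletcher:

```python
import zlib

def xor_csum(data: bytes) -> int:
    c = 0
    for i in range(0, len(data), 8):
        c ^= int.from_bytes(data[i:i + 8], "little")
    return c

class CachedBlock:
    T_SWITCH = 92.0  # seconds; from the Consumer-system analysis

    def __init__(self, data: bytes, now: float):
        self.data = data
        self.born = now
        self.kind = "xor"          # young blocks carry the cheap checksum
        self.csum = xor_csum(data)

    def maybe_switch(self, now: float) -> None:
        """Past the switch point, verify the xor checksum and upgrade
        to the stronger one (adler32 as a Fletcher stand-in)."""
        if self.kind == "xor" and now - self.born >= self.T_SWITCH:
            if xor_csum(self.data) != self.csum:
                raise IOError("corruption detected at checksum switch")
            self.kind = "fletcher"
            self.csum = zlib.adler32(self.data)

blk = CachedBlock(b"cached data" * 10, now=0.0)
blk.maybe_switch(now=50.0)    # still young: keeps xor
blk.maybe_switch(now=100.0)   # past 92 s: upgraded to the stronger checksum
```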
Slide 31: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 32: Implementation
- Attach a checksum to all buffers: user buffers, data pages, and disk blocks
- Checksum handling: checksum chaining and checksum switching
- Interfaces: checksum-aware system calls (for better protection) and checksum-oblivious APIs (for compatibility)
- Lines of code: 6500
Slide 33: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Evaluation
- Conclusion
Slide 34: Evaluation
- Q1: How does Z2FS handle data corruption? Fault-injection experiments
- Q2: What is the overall performance of Z2FS? Micro- and macro-benchmarks
Slide 35: Fault Injection: Z2FS
[Figure: a block is corrupted in memory between write() (t0) and the flush (t1); verification of the xor checksum FAILs before the block reaches disk]
- The application is asked to rewrite the corrupted block
Slide 36: Overall Performance
[Figure: throughput reading a 1 GB file, warm and read-intensive cases]
- Better protection usually means higher overhead
- Z2FS helps reduce the overhead, especially for warm reads
- The read-intensive case is dominated by random I/Os
Slide 37: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Evaluation
- Conclusion
Slide 38: Summary
- Problems with straightforward end-to-end data integrity: slow performance, untimely detection and recovery
- Solution: flexible end-to-end data integrity, changing checksums across components or over time
- Analytical framework: provides insight into the reliability of storage systems
- Implementation of Z2FS: reduces overhead while still achieving Zettabyte reliability, and offers early detection and recovery
Slide 39: Conclusion
- End-to-end data integrity provides comprehensive data protection
- One "checksum" may not always fit all; a strong checksum, for example, brings high overhead
- Flexibility balances reliability and performance: every device is different, so choose the best checksum based on device reliability
Slide 40: Thank You! Questions?
Advanced Systems Lab (ADSL)
University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl
Wisconsin Institute on Software-defined Datacenters in Madison
http://wisdom.cs.wisc.edu/