Slide 1: Zettabyte Reliability with Flexible End-to-end Data Integrity
Yupu Zhang, Daniel Myers, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
University of Wisconsin - Madison
5/9/2013
Slide 2: Data Corruption
- Imperfect hardware: disks, memory, controllers [Bairavasundaram07, Schroeder09, Anderson03]
- Buggy software: kernel, file system, firmware [Engler01, Yang04, Weinberg04]
- Techniques to maintain data integrity:
  - Detection: checksums [Stein01, Bartlett04]
  - Recovery: RAID [Patterson88, Corbett04]
Slide 3: In Reality
- Corruption still occurs and goes undetected
- Existing checks are usually isolated (e.g., disk ECC and memory ECC each protect only one device)
- High-level checks are limited (e.g., ZFS)
- Comprehensive protection is needed
[Figure: disk ECC and memory ECC provide isolated protection; ZFS provides limited protection]
Slide 4: Previous State of the Art: End-to-end Data Integrity
- A checksum for each data block is generated and verified by the application
- The same checksum protects data throughout the entire stack
- A strong checksum is usually preferred
[Figure: write path and read path through the storage stack]
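The flow just described can be sketched as follows. This is a minimal illustration, not code from the paper: `fletcher32` is a simplified Fletcher variant standing in for the strong application-level checksum, and the dict-based `store` stands in for the whole storage stack.

```python
def fletcher32(data: bytes) -> int:
    """Simplified Fletcher-32 over 16-bit little-endian words."""
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        s1 = (s1 + int.from_bytes(data[i:i + 2], "little")) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1

def app_write(store: dict, key: str, data: bytes) -> None:
    # The application generates the checksum once, at the top of the stack.
    store[key] = (data, fletcher32(data))

def app_read(store: dict, key: str) -> bytes:
    # The application verifies the same checksum at read time; corruption
    # introduced anywhere in between is detected here.
    data, csum = store[key]
    if fletcher32(data) != csum:
        raise IOError("silent corruption detected")
    return data
```

Note that nothing between `app_write` and `app_read` looks at the checksum; that is exactly the property the next slide criticizes.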
Slide 5: Two Drawbacks
- Performance
  - Data is repeatedly accessed from the in-memory cache
  - A strong checksum means high overhead
- Timeliness
  - It is too late to recover from corruption that occurs before a block is written to disk
[Figure: the checksum is generated on the write path and verified on the read path; an unbounded time may pass before verification FAILs]
Slide 6: Flexible End-to-end Data Integrity
- Goal: balance performance and reliability by changing the checksum across components or over time
- Performance
  - A fast but weaker checksum for in-memory data
  - A slow but stronger checksum for on-disk data
- Timeliness
  - Each component is aware of the checksum
  - Verification can catch corruption in time
Slide 7: Our Contribution
- Modeling
  - A framework to reason about the reliability of storage systems
  - Reliability goal: Zettabyte Reliability, i.e., at most one undetected corruption per Zettabyte read
- Design and implementation
  - Zettabyte-Reliable ZFS (Z2FS): ZFS with flexible end-to-end data integrity
Slide 8: Results
- Reliability
  - Z2FS provides Zettabyte reliability; ZFS achieves Petabyte reliability at best
  - Z2FS detects and recovers from corruption in time
- Performance
  - Comparable to ZFS (less than 10% overhead)
  - Overall faster than the straightforward end-to-end approach (up to 17% in some cases)
Slide 9: Outline
- Introduction
- Analytical Framework
  - Overview
  - Example
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 10: Overview of the Framework
- Goal: analytically evaluate and compare the reliability of storage systems
- Silent data corruption (SDC): corruption that is undetected by existing checks
- Metric: P_sdc, the probability of undetected data corruption when reading a data block from the system (per I/O)
- Reliability Score = -log10(P_sdc)
Slide 11: Models for the Framework
- Hard disk
  - Undetected Bit Error Rate (UBER); stable, not related to time
  - Disk Reliability Index = -log10(UBER)
- Memory
  - Failure rate in FIT (Failures in Time) per Mbit; the longer the residency time, the more likely the data is corrupted
  - Memory Reliability Index = -log10(per-bit corruption probability per second)
- Checksum
  - Probability of undetected corruption on a device protected by a checksum
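As a worked example of the memory model: the index can be derived from a FIT rate, where 1 FIT is one failure per 10^9 device-hours. The FIT/Mbit values below are representative figures chosen to reproduce the 14.2 and 18.8 indices used later in the talk; they are assumptions, not numbers from the slides.

```python
import math

def memory_reliability_index(fit_per_mbit: float) -> float:
    # 1 FIT = 1 failure per 10^9 hours; convert FIT/Mbit to a
    # per-bit, per-second corruption probability (1 Mbit = 10^6 bits here).
    per_bit_per_second = fit_per_mbit / (1e9 * 3600 * 1e6)
    return -math.log10(per_bit_per_second)

print(round(memory_reliability_index(25000), 1))  # non-ECC DRAM -> 14.2
print(round(memory_reliability_index(0.6), 1))    # ECC DRAM     -> 18.8
```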
Slide 12: Calculating P_sdc
- Focus on the lifetime of a block: from the time it is generated to the time it is read, across multiple components
- Find all silent corruption scenarios
- P_sdc is the sum of the probabilities of all silent corruption scenarios during the lifetime of the block in the storage system
Slide 13: Reliability Goal
- Ideally, P_sdc would be 0, but that is impossible
- Goal: Zettabyte Reliability, i.e., at most one SDC when reading one Zettabyte of data from a storage system
- Assuming a 4KB data block, the target Reliability Score is 17.5 (about 17 nines)
- At 100 MB/s, this corresponds to 2.8 x 10^-6 SDC/year
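A quick check of the arithmetic behind these numbers, assuming binary units (1 MB = 2^20 bytes, 1 ZB = 2^70 bytes):

```python
import math

BLOCK = 4096   # 4 KB data block
ZB = 2**70     # one Zettabyte

# One SDC per Zettabyte read means one bad block out of ZB/BLOCK block reads.
p_sdc_goal = BLOCK / ZB
score = -math.log10(p_sdc_goal)
print(round(score, 1))          # -> 17.5

# Reading continuously at 100 MB/s for a year covers only a tiny
# fraction of a Zettabyte, hence very few expected SDCs per year.
bytes_per_year = 100 * 2**20 * 365 * 24 * 3600
sdc_per_year = bytes_per_year / ZB
print(f"{sdc_per_year:.1e}")    # -> 2.8e-06
```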
Slide 14: Outline
- Introduction
- Analytical Framework
  - Overview
  - Example
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 15: Sample Systems

  Name       Memory Index   Disk Index   Description
  Worst      13.4           10           Worst memory & worst disk
  Consumer   14.2           12           Non-ECC memory & regular disk
  Server     18.8           12           ECC memory & regular disk
  Best       18.8           20           ECC memory & best disk

(Disk Reliability Index: regular disk = 12. Memory Reliability Index: non-ECC memory = 14.2; ECC memory = 18.8.)
Slide 16: Example
[Figure: timeline of a block's lifetime: write() places the block in MEM at t0, it is flushed to DISK at t1, read back into MEM at t2, and returned by read() at t3]
- Assume there is only one corruption in each scenario
- Each time period is a scenario; P_sdc is the sum of the probabilities of each time period
- Residency time: assume t1 - t0 = ... seconds (the flushing interval)
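A minimal sketch of this per-period summation for a single write-then-read lifetime. The three periods, the 30-second flush interval, the 1-second read-side residency, and the use of raw (unchecksummed) device rates are illustrative assumptions, not the paper's exact model:

```python
import math

BITS = 4096 * 8  # 4 KB block

def p_mem(index: float, seconds: float) -> float:
    """Memory corruption probability: per-bit-per-second rate x bits x time."""
    return 10**(-index) * BITS * seconds

def p_disk(index: float) -> float:
    """Disk corruption probability: per-bit UBER x bits (time-independent)."""
    return 10**(-index) * BITS

def reliability_score(mem_index: float, disk_index: float,
                      mem_seconds: float) -> float:
    # Sum the probability of each period of the block's lifetime:
    # in memory before the flush, on disk, and briefly in memory after read.
    p = p_mem(mem_index, mem_seconds) + p_disk(disk_index) + p_mem(mem_index, 1)
    return -math.log10(p)

# Consumer-like system (indices 14.2 / 12), 30 s flush interval.
print(round(reliability_score(14.2, 12, 30), 1))  # -> 7.4 with these placeholders
```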
Slide 17: Example (cont.)
[Figure: Reliability Scores for the Worst, Consumer, Server, and Best systems against the Zettabyte Reliability goal of 17.5]
- None of the systems achieves the goal
- For Server & Consumer, disk corruption dominates: on-disk data needs protection
Slide 18: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 19: ZFS
[Figure: timeline: write() at t0; a Fletcher checksum is generated at t1 when the block is flushed to DISK and verified at t2 when it is read back into MEM; read() at t3]
- Only on-disk blocks are protected
Slide 20: ZFS (cont.)
[Figure: Reliability Scores for the Worst, Consumer, Server, and Best systems against the goal of 17.5]
- Even Best achieves only Petabyte reliability
- Now memory corruption dominates: end-to-end protection is needed
Slide 21: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 22: End-to-end ZFS
[Figure: timeline: the application generates the checksum at write() (t0) and verifies it at read() (t3); the block passes through MEM and DISK unverified in between]
- The checksum is generated and verified only by the application
- Only one type of checksum is used (Fletcher or xor)
Slide 23: End-to-end ZFS (cont.)
[Figure: Reliability Scores for the Worst, Consumer, Server, and Best systems, with Fletcher and with xor]
- Fletcher provides the best reliability
- xor just falls short of the goal
Slide 24: Performance Issue
- End-to-end ZFS (Fletcher) is 15% slower than ZFS
- End-to-end ZFS (xor) has only 3% overhead; xor is optimized by the checksum-on-copy technique [Chu96]

Reading 1 GB of data from the page cache:

  System                      Throughput (MB/s)   Normalized
  Original ZFS                656.67              100%
  End-to-end ZFS (Fletcher)   558.22              85%
  End-to-end ZFS (xor)        639.89              97%
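The checksum-on-copy technique [Chu96] folds the checksum computation into a data copy that must happen anyway. A hedged sketch of the idea (a real implementation would do this word-by-word inside the kernel's copy routine, not in Python):

```python
def copy_with_xor(src: bytes) -> tuple[bytearray, int]:
    """Copy src while folding a 64-bit xor checksum into the same loop."""
    dst = bytearray(len(src))
    csum = 0
    for i in range(0, len(src), 8):
        word = src[i:i + 8]           # the last word may be short; fine for xor
        dst[i:i + len(word)] = word   # the copy the system performs anyway
        csum ^= int.from_bytes(word, "little")
    return dst, csum
```

Since each word is already in hand during the copy, the extra xor costs almost nothing, which is consistent with the 3% overhead shown above.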
Slide 25: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion
Slide 26: Z2FS Overview
- Goals: reduce the performance overhead while still achieving Zettabyte reliability
- Implementation of flexible end-to-end integrity:
  - Static mode: change the checksum across components (xor as the memory checksum, Fletcher as the disk checksum)
  - Dynamic mode: change the checksum over time (switch the memory checksum from xor to Fletcher after a certain period, since the longer the residency time, the more likely the data has been corrupted)
Slide 27: Static Mode
[Figure: timeline: the application generates an xor checksum at write() (t0); when the block is flushed to DISK at t1, checksum chaining verifies the xor checksum and generates a Fletcher checksum; the Fletcher checksum is verified when the block is read back into MEM at t2; the application verifies the xor checksum at read() (t3)]
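Checksum chaining at the memory/disk boundary can be sketched as follows. `xor64` and `fletcher32` are simplified stand-ins for the real checksums, and the structure is illustrative rather than Z2FS's actual code:

```python
def xor64(data: bytes) -> int:
    """Weak, fast memory checksum: xor of 64-bit words."""
    csum = 0
    for i in range(0, len(data), 8):
        csum ^= int.from_bytes(data[i:i + 8], "little")
    return csum

def fletcher32(data: bytes) -> int:
    """Stronger disk checksum: simplified Fletcher over 16-bit words."""
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        s1 = (s1 + int.from_bytes(data[i:i + 2], "little")) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1

def chain_to_disk(data: bytes, mem_csum: int) -> int:
    # Generate the new (disk) checksum first, then verify the old (memory)
    # one, so the block is never covered by neither checksum.
    disk_csum = fletcher32(data)
    if xor64(data) != mem_csum:
        raise IOError("corruption caught at the memory/disk boundary")
    return disk_csum
```

Generating the new checksum before verifying the old one means the two checksums' coverage windows overlap, leaving no unprotected gap during the handoff.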
Slide 28: Static Mode (cont.)
[Figure: Reliability Scores for the Worst, Consumer, Server, and Best systems]
- Worst: must use Fletcher all the way
- Server & Best: xor is good enough as the memory checksum
- Consumer: may drop below the goal as the residency time increases
Slide 29: Evolving to Dynamic Mode
[Figure: Reliability Score vs. residency time for the Consumer system; the Static curve drops below the goal at 92 sec, while the Dynamic curve stays above it]
- Dynamic mode: switch the memory checksum from xor to Fletcher after 92 sec
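The dynamic-mode policy on this slide is essentially a threshold check. A sketch, reusing a simplified `xor64`/`fletcher32` pair; the checksum functions and structure are illustrative, and the 92-second threshold is the Consumer figure from this slide (it would differ per system):

```python
def xor64(data: bytes) -> int:
    csum = 0
    for i in range(0, len(data), 8):
        csum ^= int.from_bytes(data[i:i + 8], "little")
    return csum

def fletcher32(data: bytes) -> int:
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        s1 = (s1 + int.from_bytes(data[i:i + 2], "little")) % 65535
        s2 = (s2 + s1) % 65535
    return (s2 << 16) | s1

SWITCH_AFTER_S = 92  # residency time at which xor no longer meets the goal

def memory_checksum(data: bytes, residency_s: float) -> tuple[str, int]:
    # Recently written pages keep the cheap xor; pages resident longer
    # than the threshold are re-checksummed with the stronger Fletcher.
    if residency_s < SWITCH_AFTER_S:
        return ("xor", xor64(data))
    return ("fletcher", fletcher32(data))
```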
Slide 30: Dynamic Mode
[Figure: timeline: the application generates an xor checksum at write() (t0); when the block is flushed to DISK at t1, the xor checksum is verified and a Fletcher checksum is generated; when the block is read back into MEM at t2, the Fletcher checksum is verified and an xor checksum is regenerated; at t_switch the memory checksum switches from xor to Fletcher; the application verifies the checksum at read() (t4)]
Slide 31: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion
Slide 32: Implementation
- Attach a checksum to all buffers: user buffer, data page, and disk block
- Checksum handling: checksum chaining and checksum switching
- Interfaces
  - Checksum-aware system calls (for better protection)
  - Checksum-oblivious APIs (for compatibility)
- LOC: 6500
Slide 33: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Evaluation
- Conclusion
Slide 34: Evaluation
- Q1: How does Z2FS handle data corruption? Fault injection experiments.
- Q2: What is the overall performance of Z2FS? Micro- and macro-benchmarks.
Slide 35: Fault Injection: Z2FS
[Figure: timeline: the application generates an xor checksum at write() (t0); when the block is flushed to DISK at t1, the xor verification FAILs, catching the injected corruption before the block reaches disk]
- On a verification failure during the write path, Z2FS asks the application to rewrite the block
Slide 36: Overall Performance
[Figure: throughput of reading a 1 GB file for a warm, read-intensive workload, and a workload dominated by random I/Os]
- Better protection usually means higher overhead
- Z2FS helps reduce the overhead, especially for warm reads
Slide 37: Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
- Evaluation
- Conclusion
Slide 38: Summary
- Problems with straightforward end-to-end data integrity: slow performance; untimely detection and recovery
- Solution: flexible end-to-end data integrity, changing checksums across components or over time
- Analytical framework: provides insight into the reliability of storage systems
- Implementation of Z2FS: reduces overhead while still achieving Zettabyte reliability, and offers early detection and recovery
Slide 39: Conclusion
- End-to-end data integrity provides comprehensive data protection
- One checksum may not always fit all (e.g., a strong checksum implies high overhead)
- Flexibility balances reliability and performance: every device is different, so choose the best checksum based on device reliability
Slide 40: Thank You!
Questions?

Advanced Systems Lab (ADSL)
University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl

Wisconsin Institute on Software-defined Datacenters in Madison
http://wisdom.cs.wisc.edu/