Zettabyte Reliability with Flexible End-to-end Data Integrity
Yupu Zhang, Daniel Myers, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
University of Wisconsin - Madison
5/9/2013

Data Corruption
- Imperfect hardware
  - Disk, memory, controllers [Bairavasundaram07, Schroeder09, Anderson03]
- Buggy software
  - Kernel, file system, firmware [Engler01, Yang04, Weinberg04]
- Techniques to maintain data integrity
  - Detection: Checksums [Stein01, Bartlett04]
  - Recovery: RAID [Patterson88, Corbett04]

In Reality
- Corruption still occurs and goes undetected
  - Existing checks are usually isolated
  - High-level checks are limited (e.g., ZFS)
- Comprehensive protection is needed
[Figure labels: Disk ECC, Memory ECC, Isolated Protection, Limited Protection]

Previous State of the Art
- End-to-end Data Integrity
  - Checksum for each data block is generated and verified by the application
  - The same checksum protects data throughout the entire stack
  - A strong checksum is usually preferred
[Figure labels: Write Path, Read Path]

Two Drawbacks
- Performance
  - Repeatedly accessing data from in-memory cache
  - Strong checksum means high overhead
- Timeliness
  - It is too late to recover from corruption that occurs before a block is written to disk
[Figure labels: Write Path, Read Path, unbounded time, Generate Checksum, Verify Checksum, FAIL]

Flexible End-to-end Data Integrity
- Goal: balance performance and reliability
  - Change checksum across components or over time
- Performance
  - Fast but weaker checksum for in-memory data
  - Slow but stronger checksum for on-disk data
- Timeliness
  - Each component is aware of the checksum
  - Verification can catch corruption in time

Our Contribution
- Modeling
  - Framework to reason about reliability of storage systems
  - Reliability goal: Zettabyte Reliability
    - At most one undetected corruption per Zettabyte read
- Design and implementation
  - Zettabyte-Reliable ZFS (Z2FS)
    - ZFS with flexible end-to-end data integrity

Results
- Reliability
  - Z2FS is able to provide Zettabyte reliability
    - ZFS: Petabyte at best
  - Z2FS detects and recovers from corruption in time
- Performance
  - Comparable to ZFS (less than 10% overhead)
  - Overall faster than the straightforward end-to-end approach (up to 17% in some cases)

Outline
- Introduction
- Analytical Framework
  - Overview
  - Example
- From ZFS to Z2FS
- Implementation
- Evaluation
- Conclusion

Overview of the Framework
- Goal
  - Analytically evaluate and compare reliability of storage systems
- Silent Data Corruption
  - Corruption that is undetected by existing checks
- Metric: probability of undetected data corruption when reading a data block from the system (per I/O)
- Reliability Score = -log10(probability of undetected data corruption)

Models for the Framework
- Hard disk
  - Undetected Bit Error Rate (UBER)
  - Stable, not related to time
  - Disk Reliability Index = -log10(UBER)
- Memory
  - Failure in Time (FIT) / Mbit
  - Longer residency time, more likely corrupted
  - Memory Reliability Index = -log10(memory error rate derived from FIT / Mbit)
- Checksum
  - Probability of undetected corruption on a device with a checksum

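The two device indices can be sketched as short conversions. This is a minimal sketch under stated assumptions, not the authors' exact model: it assumes each index is the negative base-10 logarithm of a per-bit error rate, that the memory rate is expressed per bit per second, and that 1 Mbit is 10^6 bits. The function names and the 25,000 FIT/Mbit input are illustrative only.

```python
import math

FIT_DEVICE_HOURS = 1e9    # FIT counts failures per 10^9 device-hours
SECONDS_PER_HOUR = 3600
BITS_PER_MBIT = 1e6       # assumption: 1 Mbit = 10^6 bits


def disk_reliability_index(uber: float) -> float:
    """Assumed form: -log10 of the undetected bit error rate."""
    return -math.log10(uber)


def memory_error_rate(fit_per_mbit: float) -> float:
    """Convert FIT/Mbit into errors per bit per second."""
    return fit_per_mbit / (FIT_DEVICE_HOURS * SECONDS_PER_HOUR * BITS_PER_MBIT)


def memory_reliability_index(fit_per_mbit: float) -> float:
    """Assumed form: -log10 of the per-bit, per-second error rate."""
    return -math.log10(memory_error_rate(fit_per_mbit))


if __name__ == "__main__":
    # Illustrative inputs, not values quoted from the slides.
    print(round(disk_reliability_index(1e-12), 1))
    print(round(memory_reliability_index(25_000), 1))
```

Under these assumptions an undetected bit error rate of 10^-12 maps to a disk index of 12, and an input of 25,000 FIT/Mbit lands close to a memory index of 14.2.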
Calculating the Probability of Undetected Corruption
- Focus on the lifetime of a block
  - From it being generated to it being read
  - Across multiple components
- Find all silent corruption scenarios
- The probability of undetected corruption is the sum of the probabilities of each silent corruption scenario during the lifetime of the block in the storage system

Reliability Goal
- Ideally, the probability of undetected corruption should be 0
  - It's impossible
- Goal: Zettabyte Reliability
  - At most one SDC when reading one Zettabyte of data from a storage system
- Assuming a data block is 4KB
  - Reliability Score is 17.5
  - 100MB/s => 2.8 x 10^-6 SDC/year
  - 17 nines

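The numbers on this slide can be reproduced with a few lines of arithmetic. The sketch below assumes binary units (1 ZB = 2^70 bytes, 1 MB = 2^20 bytes), a 365-day year, and a score defined as the negative base-10 logarithm of the per-block probability; those choices are assumptions that happen to match the quoted figures.

```python
import math

BLOCK_BYTES = 4 * 2**10   # 4KB block, as stated on the slide
ZETTABYTE = 2**70         # assumption: binary zettabyte
MEGABYTE = 2**20          # assumption: binary megabyte
SECONDS_PER_YEAR = 365 * 24 * 3600

# One silent corruption per zettabyte read, expressed per 4KB block read.
p_goal = BLOCK_BYTES / ZETTABYTE
score_goal = -math.log10(p_goal)

# Expected silent corruptions per year when reading at 100MB/s nonstop.
bytes_per_year = 100 * MEGABYTE * SECONDS_PER_YEAR
sdc_per_year = bytes_per_year / ZETTABYTE

print(f"{score_goal:.1f}")    # 17.5
print(f"{sdc_per_year:.1e}")  # 2.8e-06
```

A score of about 17.5 corresponds to a per-block success probability with 17 leading nines.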
Sample Systems

Name      Memory Reliability Index  Disk Reliability Index  Description
Worst     13.4                      10                      Worst memory & worst disk
Consumer  14.2                      12                      Non-ECC memory & regular disk
Server    18.8                      12                      ECC memory & regular disk
Best      18.8                      20                      ECC memory & best disk

- Regular disk: Disk Reliability Index = 12
- Non-ECC memory: Memory Reliability Index = 14.2
- ECC memory: Memory Reliability Index = 18.8

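Example
[Figure: timeline of a block across MEM and DISK with time points t0, t1, t2, t3, starting at write() and ending at read()]
- Assuming there is only one corruption in each scenario
  - Each time period is a scenario
  - The probability of undetected corruption is the sum of the probabilities of each time period
- Assuming a fixed flushing interval, in seconds
- Residency time: [expression missing]
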
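The per-period sum can be sketched for a system with no checksums at all. This is an illustrative model, not the authors' exact calculation: it assumes each index is -log10 of a per-bit error rate, that memory errors accumulate per bit per second of residency, that disk errors are independent of time, and that a block spends a hypothetical 60 seconds in memory overall (before being flushed plus after being read back). The index values come from the sample systems table.

```python
import math

BLOCK_BITS = 4096 * 8            # 4KB block
MEM_RESIDENCY_SECONDS = 60.0     # hypothetical total time in memory

# (memory index, disk index) for each sample system.
SYSTEMS = {
    "Worst": (13.4, 10),
    "Consumer": (14.2, 12),
    "Server": (18.8, 12),
    "Best": (18.8, 20),
}


def p_any_error(per_bit_rate: float, exposures: float) -> float:
    """1 - (1 - p)^n, computed stably for very small p."""
    return -math.expm1(exposures * math.log1p(-per_bit_rate))


def score_without_checksums(mem_index: float, disk_index: float):
    """Return (score, p_mem, p_disk) for one block lifetime."""
    p_mem = p_any_error(10.0 ** -mem_index, BLOCK_BITS * MEM_RESIDENCY_SECONDS)
    p_disk = p_any_error(10.0 ** -disk_index, BLOCK_BITS)
    # With no checksum, every corruption is silent; sum the scenarios.
    return -math.log10(p_mem + p_disk), p_mem, p_disk


if __name__ == "__main__":
    for name, (mem_index, disk_index) in SYSTEMS.items():
        score, p_mem, p_disk = score_without_checksums(mem_index, disk_index)
        dominant = "disk" if p_disk > p_mem else "memory"
        print(f"{name:9s} score={score:5.2f} dominated by {dominant}")
```

With these assumed inputs, every system scores far below the goal of 17.5, and the disk term is the larger one for the Consumer and Server configurations.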
Example (cont.)
[Chart: Reliability Score for Worst, Consumer, Server, and Best]
- Goal: Zettabyte Reliability (score: 17.5)
  - None achieves the goal
- Server & Consumer
  - Disk corruption dominates
  - Need to protect disk data

Outline
- Introduction
- Analytical Framework
- From ZFS to Z2FS
  - Original ZFS
  - End-to-end ZFS
  - Z2FS: ZFS with flexible end-to-end data integrity
- Implementation
- Evaluation
- Conclusion

ZFS
[Figure: timeline t0, t1, t2, t3 across MEM and DISK, from write() to read(); a Fletcher checksum is generated when the block goes to disk and verified when it is read back]
- Only on-disk blocks are protected

ZFS (cont.)
[Chart: Reliability Score for Worst, Consumer, Server, and Best]
- Goal: Zettabyte Reliability (score: 17.5)
- Best: only Petabyte
- Now memory corruption dominates
  - Need end-to-end protection

End-to-end ZFS
[Figure: timeline t0, t1, t2, t3 across MEM and DISK, from write() to read(); one checksum (Fletcher / xor) is generated at write() and verified at read()]
- Checksum is generated and verified only by the application
- Only one type of checksum is used (Fletcher or xor)

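The difference in strength between the two checksum families can be illustrated with small reference versions. The sketch below is an assumption-laden illustration, not the Z2FS code: `xor_checksum` folds 64-bit little-endian words (the word size is assumed), and `fletcher4` follows the general shape of a Fletcher-style checksum with four running sums over 32-bit words. Both expect a buffer whose length is a multiple of 8 bytes.

```python
import struct

MASK64 = (1 << 64) - 1


def xor_checksum(data: bytes) -> int:
    """XOR of all 64-bit little-endian words: cheap, order-insensitive."""
    acc = 0
    for (word,) in struct.iter_unpack("<Q", data):
        acc ^= word
    return acc


def fletcher4(data: bytes) -> tuple:
    """Fletcher-style checksum: four cascaded 64-bit sums of 32-bit words."""
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


if __name__ == "__main__":
    block = struct.pack("<4Q", 1, 2, 3, 4)
    swapped = struct.pack("<4Q", 2, 1, 3, 4)   # two words exchanged
    print(xor_checksum(block) == xor_checksum(swapped))  # True
    print(fletcher4(block) == fletcher4(swapped))        # False
```

Swapping two words leaves the xor result unchanged, while the cascaded sums of the Fletcher-style checksum depend on position and therefore differ. That extra sensitivity is what the heavier computation buys.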
End-to-end ZFS (cont.)
[Charts: Reliability Score for Worst, Consumer, Server, and Best; one panel for Fletcher, one for xor]
- Fletcher: provides the best reliability
- xor: just falls short of the goal

Performance Issue
- End-to-end ZFS (Fletcher) is 15% slower than ZFS
- End-to-end ZFS (xor) has only 3% overhead
  - xor is optimized by the checksum-on-copy technique [Chu96]

Read 1GB Data from Page Cache

System                     Throughput (MB/s)  Normalized
Original ZFS               656.67             100%
End-to-end ZFS (Fletcher)  558.22             85%
End-to-end ZFS (xor)       639.89             97%

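Checksum-on-copy folds the checksum computation into a copy that has to happen anyway, so the data is touched only once. The following is a minimal sketch of the idea, assuming 64-bit words and a buffer length that is a multiple of 8 bytes; `copy_with_xor` is a hypothetical helper name, and a real kernel implementation would do this in optimized native code rather than a Python loop.

```python
import struct


def copy_with_xor(src: bytes, dst: bytearray) -> int:
    """Copy src into dst word by word, accumulating an xor checksum."""
    acc = 0
    for offset in range(0, len(src), 8):
        (word,) = struct.unpack_from("<Q", src, offset)
        struct.pack_into("<Q", dst, offset, word)   # the copy itself
        acc ^= word                                 # checksum on the same pass
    return acc


if __name__ == "__main__":
    page = bytes(range(64))
    user_buffer = bytearray(len(page))
    checksum = copy_with_xor(page, user_buffer)
    print(bytes(user_buffer) == page, hex(checksum))
```

Because each word is already loaded for the copy, the extra xor adds very little work, which is consistent with the small overhead reported for the xor variant.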
Z2FS Overview
- Goal
  - Reduce performance overhead
  - Still achieve Zettabyte reliability
- Implementation of flexible end-to-end
  - Static mode: change checksum across components
    - xor as memory checksum and Fletcher as disk checksum
  - Dynamic mode: change checksum over time
    - For memory checksum, switch from xor to Fletcher after a certain period of time
    - Longer residency time => data is more likely to be corrupted

Static Mode
[Figure: timeline t0, t1, t2, t3 across MEM and DISK, from write() to read(); xor protects the block in memory and Fletcher protects it on disk; Generate and Verify steps mark each checksum hand-off]
- Checksum Chaining

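Checksum chaining can be sketched as a hand-off in which the next checksum is generated before the previous one is verified, so the block is never left uncovered. The ordering shown here is an assumption based on that goal, and every name below (`chain_to_disk_checksum`, `CorruptionDetected`) is hypothetical; the checksum helpers are simplified stand-ins rather than the Z2FS routines.

```python
import struct

MASK64 = (1 << 64) - 1


class CorruptionDetected(Exception):
    """Raised when a checksum no longer matches its data."""


def xor_checksum(data: bytes) -> int:
    acc = 0
    for (word,) in struct.iter_unpack("<Q", data):
        acc ^= word
    return acc


def fletcher4(data: bytes) -> tuple:
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


def chain_to_disk_checksum(page: bytes, mem_xor: int) -> tuple:
    """Hand a page from the memory checksum (xor) to the disk checksum."""
    disk_checksum = fletcher4(page)        # 1. generate the new checksum
    if xor_checksum(page) != mem_xor:      # 2. then verify the old one
        raise CorruptionDetected("page changed while cached in memory")
    # Corruption before step 1 fails the xor check above; corruption after
    # step 1 is left for the Fletcher checksum to catch later.
    return disk_checksum


if __name__ == "__main__":
    page = bytes(range(256)) * 16          # one 4KB page
    mem_xor = xor_checksum(page)
    print(chain_to_disk_checksum(page, mem_xor) == fletcher4(page))

    damaged = bytearray(page)
    damaged[100] ^= 0x01                   # single bit flip in memory
    try:
        chain_to_disk_checksum(bytes(damaged), mem_xor)
    except CorruptionDetected as exc:
        print("detected:", exc)
```

Verifying the old checksum first and generating the new one afterwards would open a window in which a corrupted page could receive a valid-looking new checksum.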
Static Mode (cont.)
[Chart: Reliability Score for Worst, Consumer, Server, and Best]
- Worst
  - Use Fletcher all the way
- Server & Best
  - xor is good enough as memory checksum
- Consumer
  - May drop below the goal as residency time increases

Evolving to Dynamic Mode
[Chart: Reliability Score vs. residency time for Consumer, with Static and Dynamic curves and a marker at 92 sec]
- Dynamic mode: switching the memory checksum from xor to Fletcher after 92 sec

Dynamic Mode
[Figure: timeline t0, t1, t2, t3, t4 across MEM and DISK, from write() to read(); the in-memory checksum starts as xor and is switched to Fletcher at t_switch; Generate and Verify steps mark each checksum hand-off]

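The time-based switch can be sketched as a check applied to each cached page. This is an illustrative sketch only: `CachedPage`, `maybe_switch`, and the way time is passed in are hypothetical, the checksum helpers are simplified stand-ins, and the 92-second threshold is the value the slides give for the Consumer configuration.

```python
import struct
from dataclasses import dataclass

MASK64 = (1 << 64) - 1
T_SWITCH_SECONDS = 92.0   # Consumer threshold quoted on the slides


def xor_checksum(data: bytes) -> int:
    acc = 0
    for (word,) in struct.iter_unpack("<Q", data):
        acc ^= word
    return acc


def fletcher4(data: bytes) -> tuple:
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


@dataclass
class CachedPage:
    data: bytes
    kind: str          # "xor" or "fletcher"
    checksum: object
    cached_at: float   # seconds, on any monotonic clock


def maybe_switch(page: CachedPage, now: float) -> CachedPage:
    """Upgrade a long-resident page from xor to the stronger checksum."""
    if page.kind == "xor" and now - page.cached_at >= T_SWITCH_SECONDS:
        stronger = fletcher4(page.data)              # generate new first
        if xor_checksum(page.data) != page.checksum:  # then verify old
            raise ValueError("corruption detected before switching")
        page.kind, page.checksum = "fletcher", stronger
    return page


if __name__ == "__main__":
    data = bytes(range(256)) * 16
    page = CachedPage(data, "xor", xor_checksum(data), cached_at=0.0)
    print(maybe_switch(page, now=10.0).kind)   # xor
    print(maybe_switch(page, now=92.0).kind)   # fletcher
```

Pages that leave the cache quickly only ever pay for the cheap checksum, while pages that stay long enough to accumulate risk are upgraded to the stronger one.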
Implementation
- Attach checksum to all buffers
  - User buffer, data page and disk block
- Checksum handling
  - Checksum chaining & checksum switching
- Interfaces
  - Checksum-aware system calls (for better protection)
  - Checksum-oblivious APIs (for compatibility)
- LOC: ~6500

Evaluation
- Q1: How does Z2FS handle data corruption?
  - Fault injection experiment
- Q2: What's the overall performance of Z2FS?
  - Micro and macro benchmarks

Fault Injection: Z2FS
[Figure: timeline t0, t1 across MEM and DISK after write(); xor and Fletcher checksums with Generate and Verify steps; verification result: FAIL]
- Ask the application to rewrite

Overall Performance
[Chart labels: read a 1 GB file; Warm; Read-intensive]
- Better protection usually means higher overhead
- Z2FS helps to reduce the overhead, especially for warm reads
- Dominated by random I/Os

Summary
- Problem of straightforward end-to-end data integrity
  - Slow performance
  - Untimely detection and recovery
- Solution: Flexible end-to-end data integrity
  - Change checksums across components or over time
- Analytical Framework
  - Provides insight about reliability of storage systems
- Implementation of Z2FS
  - Reduces overhead while still achieving Zettabyte reliability
  - Offers early detection and recovery

Conclusion
- End-to-end data integrity provides comprehensive data protection
- One "checksum" may not always fit all
  - e.g., strong checksum => high overhead
- Flexibility balances reliability and performance
  - Every device is different
  - Choose the best checksum based on device reliability

Thank you! Questions?
Advanced Systems Lab (ADSL), University of Wisconsin-Madison
http://www.cs.wisc.edu/adsl
Wisconsin Institute on Software-defined Datacenters in Madison
http://wisdom.cs.wisc.edu/