Availability in Globally Distributed Storage Systems
Daniel Ford, François Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan. Presented by Nabeel.
Distributed Storage System
Exponential increase in storage needs
Uses a shared-nothing architecture
Uses low-cost commodity hardware
A software layer provides fault tolerance
Suitable for data-parallel, I/O-bound computations
Highly scalable and cost effective
Data Center
Data Center Components
Server components
Racks
Interconnects
Clusters of racks
All of these components can fail.
Google File System
Cell, Stripe and Chunk
[Diagram: two cells, each a separate GFS instance (GFS Instance 1, GFS Instance 2); each cell holds stripes (Stripe 1, Stripe 2), and each stripe is made up of chunks]
Failure Sources and Events
Failure sources:
Hardware: disks, memory, etc.
Software: e.g., the chunk server process
Network interconnect
Power distribution unit
Failure events:
Node restart
Planned reboot
Unplanned reboot
Fault Tolerance Mechanisms
Replication (R = n):
'n' identical chunks (the replication factor) are placed on storage nodes in different racks/cells/data centers
Erasure coding (RS(n, m)):
'n' distinct data blocks plus 'm' code blocks
Can tolerate the loss of up to 'm' blocks; the data can be reconstructed from any 'n' of the remaining blocks
Replication
[Diagram: one chunk copied into 5 replicas]
Fast encoding / decoding
Very space inefficient
Erasure Coding
[Diagram: 'n' data blocks are encoded into 'n + m' blocks: the original 'n' data blocks plus 'm' code blocks]
Erasure Coding
[Diagram: any 'n' of the 'n + m' blocks can be decoded back into the original 'n' data blocks]
Highly space efficient
Slow encoding / decoding
Goal of the Paper
Characterizes the availability properties of cloud storage systems
Proposes an availability model that informs data placement and replication strategies
Agenda
Introduction
Findings from the fleet
Correlated failures
Modeling availability data
Conclusion
CDF of Node Unavailability
[Figure: cumulative distribution of node unavailability]
Average Availability and MTTF
Average availability of N nodes:
$A_N = \frac{\sum_i \text{uptime}_i}{\sum_i (\text{uptime}_i + \text{downtime}_i)}$

Mean time to failure:
$\text{MTTF} = \frac{\text{uptime}}{\text{number of failures}}$

[Timeline diagram: uptime intervals separated by Failure 1 and Failure 2]
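A short numeric sketch of how the two formulas above combine over a fleet. The per-node uptime, downtime, and failure counts below are made-up illustrative values, not measurements from the paper.

```python
# Hypothetical per-node records: total uptime, total downtime, and failure count (hours).
nodes = [
    {"uptime_h": 700.0, "downtime_h": 2.0,  "failures": 1},
    {"uptime_h": 690.0, "downtime_h": 12.0, "failures": 3},
    {"uptime_h": 702.0, "downtime_h": 0.0,  "failures": 0},
]

total_up = sum(n["uptime_h"] for n in nodes)
total_down = sum(n["downtime_h"] for n in nodes)
total_failures = sum(n["failures"] for n in nodes)

availability = total_up / (total_up + total_down)                  # A_N
mttf_h = total_up / total_failures if total_failures else float("inf")

print(f"A_N  = {availability:.4%}")   # fraction of node-time the fleet was up
print(f"MTTF = {mttf_h:.1f} hours")   # mean uptime per failure
```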
CDF of Node Unavailability by Cause
[Figure: CDF of node unavailability broken down by cause]
Node Unavailability
[Figure: node unavailability]
Correlated Failures
Failure domain: a set of machines that fail simultaneously from a common source of failure
Failure burst: a sequence of node failures, each occurring within a time window 'w' of the next
37% of all failures are part of a burst of at least 2 nodes
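A minimal sketch of the burst definition above: consecutive failures are grouped into one burst whenever each failure occurs within a window 'w' of the previous one. The timestamps and the window value (120 seconds) below are only example inputs.

```python
# Group sorted failure timestamps (seconds) into bursts, per the window definition above.

def group_bursts(failure_times: list[float], w: float = 120.0) -> list[list[float]]:
    bursts: list[list[float]] = []
    for t in sorted(failure_times):
        if bursts and t - bursts[-1][-1] <= w:
            bursts[-1].append(t)      # within w of the previous failure: same burst
        else:
            bursts.append([t])        # otherwise start a new burst
    return bursts

failures = [0, 30, 50, 400, 460, 2000]        # hypothetical failure times in seconds
print(group_bursts(failures))                  # [[0, 30, 50], [400, 460], [2000]]
```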
Failure Burst
[Several figure slides illustrating failure bursts]
Domain Related Failures
Domain-related issues are causes of correlated failures
A metric is devised:
To determine whether a failure burst is domain-related or random
To evaluate the importance of domain diversity in cell design and data placement
Rack Affinity
Rack affinity score: measures the rack concentration of a burst; it is the number of ways of choosing 2 nodes of the burst that lie in the same rack:
score $= \sum_i \binom{k_i}{2}$, where $k_i$ is the number of nodes affected in the i-th affected rack
Rack affinity: the probability that a burst of the same size affecting randomly chosen nodes has a smaller affinity score
E.g., a burst of 5 nodes spread over racks as (1, 4) versus (1, 1, 1, 2)
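A small worked example of the rack-affinity score defined above, applied to the slide's two example rack distributions.

```python
# Rack-affinity score of a burst: sum over racks of C(k_i, 2),
# where k_i is the number of burst nodes in rack i.
from math import comb

def rack_affinity_score(rack_counts: list[int]) -> int:
    """Number of ways to pick 2 nodes of the burst from the same rack."""
    return sum(comb(k, 2) for k in rack_counts)

print(rack_affinity_score([1, 4]))        # 6 -> rack-concentrated burst
print(rack_affinity_score([1, 1, 1, 2]))  # 1 -> well-spread burst
```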
Stripe MTTF vs. Burst Size
[Figure slides: stripe MTTF as a function of failure burst size]
Trace Simulation
[Figure slide]
Markov Model
To model & understand the impact of hardware and software changes in availabilityFocused on the availability of a stripeState : No. of available chunks (in the stripe)Transition : Rates by which a stripe moves to the next state due to:Chunk Failure ( reduces available chunks)Chunk Recoveries ( increases available chunks)36Slide37
Markov Chain
[Figure: Markov chain over the number of available chunks in a stripe]
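A minimal sketch of the kind of chain described above, not the paper's full model: a birth-death Markov chain for an R = 3 replicated stripe, where the state is the number of available chunks, failures move the stripe down one state, and recoveries move it up one state. The failure and recovery rates are illustrative assumptions, not measurements from the paper.

```python
# Mean time to stripe loss for a simple birth-death chain over available chunks.
import numpy as np

lam = 1.0 / (4 * 365 * 24)   # assumed per-chunk failure rate: 1 per 4 years (per hour)
rho = 1.0 / 12               # assumed recovery rate: one re-replication per 12 hours
R = 3                        # replication factor; state 0 (stripe lost) is absorbing

# Sub-generator over the transient states 1..R.
Q = np.zeros((R, R))
for s in range(1, R + 1):
    i = s - 1
    down = s * lam                     # any of the s live chunks can fail
    up = rho if s < R else 0.0         # recovery only when below full replication
    if s > 1:
        Q[i, i - 1] = down             # s -> s-1
    Q[i, i] = -(down + up)
    if s < R:
        Q[i, i + 1] = up               # s -> s+1

# Expected time to absorption (stripe loss) from each transient state: t = (-Q)^-1 * 1
t = np.linalg.solve(-Q, np.ones(R))
print(f"Stripe MTTF starting from {R} available chunks: {t[-1]:.3e} hours")
```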
Markov Model Findings
RS(6, 3):
With no correlated failures:
A 10% reduction in recovery time results in a 19% reduction in unavailability
With correlated failures:
MTTF is reduced when correlated failures are modeled
A 90% reduction in recovery time results in only a 6% reduction in unavailability
Correlation reduces the benefit of increased data redundancy
Markov Model Findings (cont.)
[Figure slide]
Markov Model Findings (cont.)
For R = 3, increase in availability:
A 10% reduction in disk latency errors has a negative effect (?)
A 10% reduction in disk failure rate gives a 1.5% improvement
A 10% reduction in node failure rate gives an 18% improvement
Improvements below the node layer of the storage stack do not significantly improve data availability
Single-Cell vs. Multi-Cell
Trade-off between availability and inter-cell recovery bandwidth
Multi-cell replication gives higher MTTF
Single-Cell vs. Multi-Cell (cont.)
[Diagram: single-cell placement keeps all replicas R1-R4 in Cell A; multi-cell placement spreads replicas R1-R4 across Cell A and Cell B]
Conclusion
Correlation among node failures is important
Correlated failures share common failure domains
Most unavailability periods are transient and differ significantly by cause
Reduce reboot times for kernel upgrades
The findings provide feedback for improving:
Replication and encoding schemes
Data placement strategies
The primary causes of data unavailability