Space Bounds for Reliable Storage: Fundamental Limits of Coding
Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar
Storage Blow-Up
Demands are growing exponentially
Replication is costly
Erasure coding can help [Goodson et al. DSN 2004] [Aguilera et al. DSN 2005] [Cachin and Tessaro DSN 2006] [Hendricks et al. SOSP 2007] [Dutta et al. DISC 2008] [Cadambe et al. NCA 2014]
But … within limits
k-of-n Erasure Codes
[Figure: encode turns a value into n blocks; decode restores the value from any k of them]
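To make the k-of-n idea concrete, here is a minimal sketch of one such code, built from polynomial interpolation over a prime field (a Reed-Solomon-style construction). The slides do not say which code is used; the prime `P`, the function names, and the sample data are all assumptions chosen for illustration.

```python
from itertools import combinations

P = 2**61 - 1  # a prime; all arithmetic is modulo P (illustrative choice)

def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (x, y) points, modulo P."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (x - xm) % P
                den = den * (xj - xm) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Encode k data symbols into n blocks (index, symbol).
    The first k blocks carry the data symbols themselves."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    return [(i, _interpolate_at(points, i)) for i in range(1, n + 1)]

def decode(blocks, k):
    """Recover the k data symbols from any k distinct blocks."""
    if len(blocks) < k:
        raise ValueError("need at least k blocks")
    points = blocks[:k]
    return [_interpolate_at(points, i) for i in range(1, k + 1)]

data = [104, 105]               # k = 2 symbols
blocks = encode(data, n=4)      # n = 4 blocks, e.g. one per server
# Every pair of blocks is enough to get the data back.
for subset in combinations(blocks, 2):
    assert decode(list(subset), k=2) == data
```

Each block here is as large as one data symbol, so n blocks take n/k times the size of the value, compared with n times for n full replicas.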
Motivation for Using Codes
Suppose we want to tolerate one failure
[Figure: storage used with replication vs. with erasure codes]
Fault Tolerant Distributed Storage Model
n servers, f can fail (crash)
clients (all can fail)
Asynchronous
Distributed Storage: Space Bounds
Replication: O(Df) bits
Coding: O(Dc) with c concurrent writes
Lower bound: Ω(D min(f,c))
Best-of-both algorithm: O(D min(f,c))
Register Emulation Example
Write [Animation over servers S1, S2, S3, S4]
Generate timestamp
encode
Wait for n-f replies
Read [Animation over servers S1, S2, S3, S4]
Wait for n-f replies
decode
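The write and read steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: n = 4, f = 1 and a 2-of-4 code are picked for the example, the timestamp is passed in by the caller rather than generated, and a coded block is modeled abstractly as a `(value, index)` pair instead of real coded bytes. The names `Server`, `write` and `read` are hypothetical.

```python
from collections import Counter

N, F, K = 4, 1, 2   # assumed: 4 servers, 1 crash tolerated, 2-of-4 code

class Server:
    def __init__(self):
        self.ts, self.block = 0, None    # one stored block and its timestamp

    def store(self, ts, block):
        if ts > self.ts:                  # keep only the newest block
            self.ts, self.block = ts, block
        return "ack"

def write(servers, ts, value, reachable):
    """Send block i of the value to each reachable server i.
    The write may return once n-f servers have replied."""
    acks = [servers[i].store(ts, (value, i)) for i in reachable]
    return len(acks) >= N - F

def read(servers, reachable):
    """Collect n-f replies; return the newest value for which at least
    k blocks were received, or None if no value can be decoded."""
    assert len(reachable) >= N - F
    replies = [(servers[i].ts, servers[i].block) for i in reachable]
    count = Counter(ts for ts, block in replies if block is not None)
    restorable = [ts for ts, c in count.items() if c >= K]
    if not restorable:
        return None
    newest = max(restorable)
    return next(b[0] for ts, b in replies if ts == newest)

servers = [Server() for _ in range(N)]
ok = write(servers, ts=1, value="v1", reachable=[0, 1, 2])   # S4 is slow
result = read(servers, reachable=[1, 2, 3])                  # S1 is slow
```

The read still succeeds although it misses S1 and the write missed S4, because the two sets of n-f servers share S2 and S3, which is enough for k = 2.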
What about concurrent read and write?
[Animation over S1, S2, S3, S4: concurrent writes reach the servers]
Overwrite? Suppose yes, if TS is bigger
[Animation: a read runs concurrently with the writes]
No written value can be restored!
What About Replication?
[Animation: the same concurrent write and read, with full replicas]
No problem!
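The contrast between the two scenarios can be replayed in a few lines. The concrete schedule below (two delayed writes, a reader that hears from S1, S2 and S3) is an assumed example, not the one drawn on the slides, and blocks are again modeled abstractly as `(timestamp, block index)` pairs with a 2-of-4 code.

```python
from collections import Counter

N, F, K = 4, 1, 2   # assumed: 4 servers, 1 crash tolerated, 2-of-4 code

def restorable(replies):
    """Timestamps for which the reader holds at least K blocks."""
    count = Counter(ts for ts, _ in replies)
    return sorted(ts for ts, c in count.items() if c >= K)

# Each server holds a single (timestamp, block index) pair.
servers = [(1, i) for i in range(N)]      # write ts=1 completed everywhere
before = restorable(servers[:N - F])      # ts=1 is readable

# Two concurrent writes each reach one server and are then delayed.
# Each overwrites the older block there because its timestamp is bigger.
servers[0] = (2, 0)                       # write ts=2 reached S1 only
servers[1] = (3, 1)                       # write ts=3 reached S2 only

# The reader gets n-f = 3 replies, from S1, S2 and S3.
replies = servers[:N - F]
coded_result = restorable(replies)        # one block each of ts=2, 3, 1

# With replication each reply is a full copy of a value,
# so the reader can simply return the newest one it received.
replicated_result = max(ts for ts, _ in replies)
```

In the coded case the reader holds three blocks of three different values and can decode none of them; in the replicated case any single reply is already a complete value.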
[Animation over S1, S2, S3, S4: blocks of concurrent writes accumulate at the servers]
What can be overwritten?
Can Yellow be overwritten?
[Animation over S1, S2, S3, S4]
Read cannot be restored! Consistency violation
What can be overwritten? Nothing!
OK, so the standard algorithm can use loads of storage
But…
Inherent
For asynchronous algorithms that store coded data
Distributed Storage: Space Bounds
Replication: O(Df) bits
Coding: O(Dc) with c concurrent writes
Lower bound: Ω(D min(f,c))
Best-of-both algorithm: O(D min(f,c))
But First: More About The Model
Black-box encoding
Arbitrary encoding scheme
[Figure: encode, independent of other values]
Storage holds:
Coded blocks
Unbounded data-independent meta-data
Observation
Every data bit in the storage can be associated with a unique write operation
Given our storage model (black-box encoding)
Formal definition in the paper
Storage Complexity
Storage is measured in bits
Coded blocks count; meta-data do not count
Theorem
Every regular lock-free MWMR register emulation
Tolerating f failures
Allowing c concurrent writes
Storing values from domain of size 2^D
Needs to store Ω(D min(f,c)) bits
At some point
Proof Steps
Pigeonhole: need D = log2|V| bits associated with some write operation to read a value
Dynamically track sets of clients/servers contributing many bits to the storage
Adversary Ad
Blows up the storage, does not allow any write to complete
Prove the bound for fair adversary
Tracking Sets
C+(t): writes that store more than D-l bits at time t
F(t): servers that store at least l bits at time t
[Figure labels: D, D-l+1, l/2, l/2]
Storage Size
Writes in C+(t): at least (D-l+1)|C+(t)| bits
Each server in F(t): at least l bits
[Figure labels: D, D-l+1, l/2, l/2]
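The two tracking sets and the storage they account for can be computed directly from a snapshot of the storage. The sketch below uses the definitions as stated on the slides (the paper tracks the sets dynamically, which this does not model); the snapshot `bits`, the values of D and l, and the function names are made up for illustration.

```python
D, L = 8, 4          # illustrative: 8-bit values, threshold l = 4

# bits[s][w] = number of bits of write w stored at server s (made-up state)
bits = {
    "S1": {"w1": 3, "w2": 2},
    "S2": {"w1": 3},
    "S3": {"w2": 1},
    "S4": {},
}

def per_write_totals(bits):
    """Total number of stored bits associated with each write."""
    totals = {}
    for stored in bits.values():
        for w, b in stored.items():
            totals[w] = totals.get(w, 0) + b
    return totals

def tracking_sets(bits, D, L):
    """C+ = writes storing more than D-l bits; F = servers storing >= l bits."""
    c_plus = {w for w, b in per_write_totals(bits).items() if b > D - L}
    f_set = {s for s, stored in bits.items() if sum(stored.values()) >= L}
    return c_plus, f_set

c_plus, f_set = tracking_sets(bits, D, L)
total = sum(sum(stored.values()) for stored in bits.values())
bound_from_writes = (D - L + 1) * len(c_plus)    # (D-l+1)|C+(t)|
bound_from_servers = L * len(f_set)              # l bits per server in F(t)
```

Either quantity alone is a lower bound on the total storage. The sketch does not add them, since bits of a write in C+ may sit on a server in F.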
Adversary Ad
We'll define a particular adversary structure
Controls scheduling
Prevents progress
Blows up the storage
Defining Adversary Ad
freeze F(t)
delay C+(t)
[Figure labels: C+(t), F(t), D, D-l+1, l/2, l/2]
Implications of Ad
F only grows
Servers in F are "frozen"
Writes can move out of C+
If their blocks are overwritten
We will next show that no client can complete a write operation
Observation
Every set of n-f servers must store D bits of some pending write for a write to return
Otherwise: [Figure: a read] No value can be restored! Consistency violation
Lemma
With Ad, if |F(t)| ≤ f, servers store less than D bits for each write
Proof in the paper
Lemma Proof
[Timeline: write(v), then an RMW to server S responds]
Assume v can be read
If the RMW writes l bits to S, S is added to F and v cannot be restored without servers in F
write(v) is in C-: l bits are missing
Otherwise, at least one bit is still missing
Contradiction!
With Ad, if |F(t)| ≤ f, then there are n-f servers from which no value of a pending write can be read
Corollary: Lemma + Observation
With Ad, for any time t, if |F(t)| ≤ f, then no write completes
Storage Size Recalled
Writes in C+(t): at least (D-l+1)|C+(t)| bits
Each server in F(t): at least l bits
[Figure labels: D, D-l+1, l/2, l/2]
Theorem Proof
Build run r using Ad
Have c clients invoke writes
Three possible cases:
|F(t)| = f+1 at some point in r: storage cost l(f+1) bits
|C+(t)| = c at some point in r: storage cost (D-l+1)c bits
None of the above: by corollary, no write returns; we will show this is impossible
By setting l = D/2, we get Ω(D min(f,c))
Ad is Not Fair!
Operations on servers in F(t) never take effect from time t onward
Operations by clients that remain in C+ from some point onward never take effect
Constructing a Fair Run (Sketch)
Run r with Ad: assume |F(t)| ≤ f and |C+(t)| < c
No write completes in r
Build r': kill all the servers in F and clients permanently in C+
r' is fair with at least one correct process
By lock-freedom, some write completes in r'
r and r' are indistinguishable to all correct clients
Therefore, some write completes in r
Contradiction
Details in the paper
Theorem Proof
Build run r using Ad
Have c clients invoke writes
Two possible cases (the third, "none of the above", has been ruled out):
|F(t)| = f+1 at some point in r: storage cost l(f+1) bits
|C+(t)| = c at some point in r: storage cost (D-l+1)c bits
By setting l = D/2, we get Ω(D min(f,c))
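The last step can be written out as a short calculation. It reads the first case as |F(t)| reaching f+1, which is what the l(f+1) cost suggests, and substitutes l = D/2 into both costs; it is a worked sketch of the arithmetic, not the paper's full argument.

```latex
\begin{align*}
\text{Case } |F(t)| = f+1:&\quad \ell\,(f+1) = \tfrac{D}{2}\,(f+1) \\
\text{Case } |C^{+}(t)| = c:&\quad (D-\ell+1)\,c = \left(\tfrac{D}{2}+1\right)c \\
\text{Either way:}&\quad \text{storage} \;\ge\; \tfrac{D}{2}\,\min(f+1,\,c) \;=\; \Omega\bigl(D\min(f,c)\bigr)
\end{align*}
```

Choosing l = D/2 balances the two cases: a smaller l weakens the first bound, a larger l weakens the second.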
Wrap Up
We proved a fundamental limit of coding
Ω(D min(f,c)) bits for regular lock-free register
Replication is the best solution under high concurrency
Why not enjoy both worlds?
Adaptive Algorithm
Replication storage cost: nD
Coding with k, n = O(f) storage cost: (c+1)(D/k)n
We combine both approaches
Storage cost: min(2nD, (c+1)(D/k)n) = O(D min(f,c))
Details in the paper
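Plugging numbers into the cost formulas above shows how the combined cost switches between the two regimes. The parameter values (D = 1000 bits, f = 10, k = 10, n = 2f + k = 30) and the function names are assumptions for illustration; the slides only state k, n = O(f).

```python
def replication_cost(n, D):
    """Every one of the n servers stores a full D-bit copy."""
    return n * D

def coding_cost(c, D, k, n):
    """(c+1) versions, each stored as n blocks of D/k bits (assumes k divides D)."""
    return (c + 1) * (D // k) * n

def adaptive_cost(c, D, k, n):
    """Cost of the combined approach as given on the slide."""
    return min(2 * replication_cost(n, D), coding_cost(c, D, k, n))

D, f = 1000, 10          # value size in bits, tolerated failures
k, n = 10, 30            # assumed code parameters, n = 2f + k

low_concurrency = adaptive_cost(1, D, k, n)      # coding wins
high_concurrency = adaptive_cost(100, D, k, n)   # capped at 2nD
```

With one concurrent write the coded cost (6,000 bits) is a tenth of 2nD; with a hundred concurrent writes the coded cost would be 303,000 bits, so the combined cost stays at the 2nD cap of 60,000 bits.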
Related Work
Tomorrow morning [Cadambe, Wang, Lynch]
Similar bound
Different (incomparable) set of assumptions
Different proof technique
Future: find unique minimal set of assumptions