Space Bounds for Reliable Storage

Uploaded On 2019-12-17

Presentation Transcript

Space Bounds for Reliable Storage: Fundamental Limits of Coding. Alexander Spiegelman, Yuval Cassuto, Gregory Chockler, Idit Keidar.

Replication

Storage Blow-Up: Demands are growing exponentially. Replication is costly. Erasure coding can help [Goodson et al. DSN 2004] [Aguilera et al. DSN 2005] [Cachin and Tessaro DSN 2006] [Hendricks et al. SOSP 2007] [Dutta et al. DISC 2008] [Cadambe et al. NCA 2014]. But … within limits.

k-of-n Erasure Codes: encode a value into n coded blocks; decode the value from any k of them.
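To make the encode/decode picture concrete, here is a minimal sketch of a 2-of-3 code built from XOR parity. The slides do not say which code they use; the XOR construction, the function names, and the even-length restriction are assumptions made for illustration only.

```python
def _xor(u, v):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(u, v))

def encode(data):
    """Encode data into n = 3 blocks: two halves plus their XOR parity."""
    if len(data) % 2:
        raise ValueError("this sketch expects an even number of bytes")
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, _xor(a, b)]

def decode(blocks):
    """Decode from any k = 2 blocks, given as {block_index: block}."""
    if len(blocks) < 2:
        raise ValueError("need at least 2 of the 3 blocks")
    if 0 in blocks and 1 in blocks:
        a, b = blocks[0], blocks[1]
    elif 0 in blocks:            # blocks 0 and 2: b = a XOR parity
        a = blocks[0]
        b = _xor(a, blocks[2])
    else:                        # blocks 1 and 2: a = b XOR parity
        b = blocks[1]
        a = _xor(b, blocks[2])
    return a + b

blocks = encode(b"abcd")
recovered = decode({0: blocks[0], 2: blocks[2]})  # block 1 is lost
```

Each block is half the size of the value, so the three blocks together take 1.5 times the value size while surviving the loss of any one block.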

Motivation for Using Codes: Suppose we want to tolerate one failure. Compare the storage needed with replication and with erasure codes.
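The figures for this comparison did not survive extraction, so the following is only a rough sketch of the usual counting argument. It assumes that tolerating f lost servers takes f+1 full copies under replication and n = k+f blocks of D/k bits each under a k-of-n code, and it ignores concurrency and asynchrony. The function names are hypothetical.

```python
from fractions import Fraction

def replication_cost(D, f):
    """Bits stored when keeping f + 1 full copies of a D-bit value."""
    return (f + 1) * D

def coding_cost(D, f, k):
    """Bits stored with a k-of-(k+f) code: k + f blocks of D/k bits each."""
    n = k + f
    return Fraction(n * D, k)

D = 1024
one_failure_replicated = replication_cost(D, f=1)   # 2 * D bits
one_failure_coded = coding_cost(D, f=1, k=2)        # 1.5 * D bits
```

Under these assumptions the coded layout stores 1.5D bits instead of 2D, and the gap widens as k grows.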

Fault-Tolerant Distributed Storage Model: n servers, of which f can fail (crash); clients (all can fail); asynchronous.

Distributed Storage: Space Bounds. Replication: O(D·f) bits. Coding: O(D·c) bits with c concurrent writes. Lower bound: Ω(D·min(f, c)). Best-of-both algorithm: O(D·min(f, c)).

Register Emulation Example

Write (servers S1, S2, S3, S4): generate a timestamp, encode the value, send the coded blocks to the servers, and wait for n-f replies.

Read (servers S1, S2, S3, S4): query the servers, wait for n-f replies, and decode the value.
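The write and read steps above can be sketched as a small sequential simulation. Only the four servers come from the slides. The parameters k = 2 and f = 1, the toy line-based code modulo a prime, the `reachable` argument that stands in for "the n-f servers that reply", and all names are assumptions for illustration. Timestamps are passed in by the caller rather than generated.

```python
P = 2**31 - 1          # prime modulus of the toy code (assumption)
N, K, F = 4, 2, 1      # 4 servers as in the slides; K and F are assumed

def encode(value):
    """Value (a, b) -> N blocks; block i is the line a + b*x at x = i + 1."""
    a, b = value
    return [(a + b * (i + 1)) % P for i in range(N)]

def decode(blocks):
    """Recover (a, b) from any K = 2 blocks given as {server_index: block}."""
    (i, yi), (j, yj) = list(blocks.items())[:2]
    xi, xj = i + 1, j + 1
    b = (yi - yj) * pow((xi - xj) % P, -1, P) % P
    a = (yi - b * xi) % P
    return (a, b)

class Server:
    def __init__(self):
        self.ts, self.block = 0, None

def write(servers, value, ts, reachable):
    """Encode, send block i to server i, and return after n-f replies."""
    assert len(reachable) >= N - F
    blocks = encode(value)
    for i in reachable:
        if ts > servers[i].ts:          # keep only the newest block
            servers[i].ts, servers[i].block = ts, blocks[i]

def read(servers, reachable):
    """Collect n-f replies and decode the newest value with K blocks."""
    assert len(reachable) >= N - F
    by_ts = {}
    for i in reachable:
        if servers[i].block is not None:
            by_ts.setdefault(servers[i].ts, {})[i] = servers[i].block
    for ts in sorted(by_ts, reverse=True):
        if len(by_ts[ts]) >= K:
            return decode(by_ts[ts])
    return None                         # nothing can be restored

servers = [Server() for _ in range(N)]
write(servers, (7, 11), ts=1, reachable=[0, 1, 2])   # S4 is slow
result = read(servers, reachable=[1, 2, 3])          # S1 is slow
```

With no concurrency, any two sets of n-f = 3 servers share at least K = 2 servers, which is why the read can always decode the last completed write in this setting.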

What about concurrent read and write?

Write (servers S1, S2, S3, S4): should a new block overwrite the one already stored? Suppose yes, if its timestamp (TS) is bigger. Then, with a Write and a concurrent Read, the Read may find that no written value can be restored!

What About Replication? With a Write and a concurrent Read: no problem!
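The difference between coding and replication under concurrent writes can be illustrated with an abstract model in which a server remembers only the timestamp of the block it holds. The schedule below (one completed write followed by two pending writes that have each reached a single server) is a hypothetical one chosen for illustration, not necessarily the one drawn on the slides. k = 2 is assumed for coding and k = 1 models replication.

```python
from collections import Counter

def overwrite_if_newer(storage, server, ts):
    """Each server keeps only the block with the largest timestamp."""
    if ts > storage.get(server, 0):
        storage[server] = ts

def restorable(storage, contacted, k):
    """Timestamps of writes with at least k blocks on the contacted servers."""
    counts = Counter(storage[s] for s in contacted if s in storage)
    return sorted(ts for ts, count in counts.items() if count >= k)

storage = {}
for s in ("S1", "S2", "S3"):           # write A (ts=1) completes on n-f = 3 servers
    overwrite_if_newer(storage, s, 1)
before = restorable(storage, ["S1", "S2", "S3"], k=2)

overwrite_if_newer(storage, "S1", 2)   # pending write B (ts=2) reached only S1
overwrite_if_newer(storage, "S2", 3)   # pending write C (ts=3) reached only S2

coded = restorable(storage, ["S1", "S2", "S3"], k=2)       # coding: k = 2
replicated = restorable(storage, ["S1", "S2", "S3"], k=1)  # replication: k = 1
```

In this schedule the coded read sees one block each of A, B, and C and can decode none of them, while with full copies every contacted server holds a complete value.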

What can be overwritten? (Servers S1, S2, S3, S4; values are color-coded in the slide diagram.) Can Yellow be overwritten? If it is, a read cannot be restored: consistency violation. So what can be overwritten? Nothing!

OK, so the standard algorithm can use loads of storage. But…

Inherent: for asynchronous algorithms that store coded data.

Distributed Storage: Space Bounds. Replication: O(D·f) bits. Coding: O(D·c) bits with c concurrent writes. Lower bound: Ω(D·min(f, c)). Best-of-both algorithm: O(D·min(f, c)).

But First: More About the Model. Black-box encoding: an arbitrary encoding scheme, where each value is encoded independently of other values. Storage holds coded blocks and unbounded data-independent meta-data.

Observation: Every data bit in the storage can be associated with a unique write operation, given our storage model (black-box encoding). Formal definition in the paper.

Storage Complexity: Storage is measured in bits. Bits of coded blocks count; meta-data does not count.

Theorem: Every regular lock-free MWMR register emulation tolerating f failures, allowing c concurrent writes, and storing values from a domain of size 2^D needs to store Ω(D·min(f, c)) bits at some point.

Proof Steps. Pigeonhole: need D = log2|V| bits associated with some write operation to read a value. Dynamically track sets of clients/servers contributing many bits to the storage. Adversary Ad: blows up the storage and does not allow any write to complete. Prove the bound for a fair adversary.

Tracking Sets (with a threshold ℓ between 1 and D): C+(t) is the set of writes that store more than D-ℓ bits, that is, at least D-ℓ+1 bits, at time t. F(t) is the set of servers that store at least ℓ bits at time t.

Storage Size: at least (D-ℓ+1)·|C+(t)| bits are stored for the writes in C+(t), and at least ℓ·|F(t)| bits are stored on the servers in F(t).
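The two counting bounds on the storage size can be checked on a small snapshot. This is a sketch under assumptions: the storage is modeled as a hypothetical table `bits[w][s]` giving how many bits of write `w` sit on server `s`, F(t) is read as "servers holding at least l bits in total", and the threshold ℓ is spelled `l` in the code.

```python
def tracking_sets(bits, D, l):
    """Return (C_plus, F) for a snapshot bits[write][server] -> bit count."""
    c_plus = {w for w, per_server in bits.items()
              if sum(per_server.values()) > D - l}
    per_server_total = {}
    for per_server in bits.values():
        for s, b in per_server.items():
            per_server_total[s] = per_server_total.get(s, 0) + b
    F = {s for s, total in per_server_total.items() if total >= l}
    return c_plus, F

def storage_lower_bound(c_plus, F, D, l):
    """Larger of the two bounds: (D-l+1)*|C+| and l*|F|."""
    return max((D - l + 1) * len(c_plus), l * len(F))

def total_storage(bits):
    """Actual number of data bits in the snapshot."""
    return sum(sum(per_server.values()) for per_server in bits.values())

snapshot = {"w1": {"S1": 4, "S2": 4}, "w2": {"S1": 2}}
c_plus, F = tracking_sets(snapshot, D=8, l=4)
bound = storage_lower_bound(c_plus, F, D=8, l=4)
```

Here `w1` stores 8 > D-l = 4 bits, so it is in C+, and both servers hold at least 4 bits, so both are in F. The bound evaluates to max(5, 8) = 8, below the 10 bits actually stored.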

Adversary Ad: We'll define a particular adversary structure. It controls scheduling, prevents progress, and blows up the storage.

Defining Adversary Ad: freeze the servers in F(t); delay the writes in C+(t).

Implications of Ad: F only grows. Servers in F are "frozen". Writes can move out of C+ if their blocks are overwritten. We will next show that no client can complete a write operation.

Observation: For a write to return, every set of n-f servers must store D bits of some pending write. Otherwise, a read that hears only from such a set can restore no value: consistency violation.

Lemma: With Ad, if |F(t)| ≤ f, the servers outside F(t) store less than D bits for each write. Proof in the paper.

Lemma Proof (sketch): Assume v can be read. Consider the time at which an RMW of write(v) to a server S responds. If the RMW writes ℓ bits to S, then S is added to F, and v cannot be restored without servers in F. If write(v) is in C-, ℓ bits are missing; otherwise, at least one bit is still missing. So with Ad, if |F(t)| ≤ f, there are servers from which no value of a pending write can be read. Contradiction!

Corollary (Lemma + Observation): With Ad, for any time t, if |F(t)| ≤ f, then no write completes.

Storage Size Recalled: at least (D-ℓ+1)·|C+(t)| bits are stored for the writes in C+(t), and at least ℓ·|F(t)| bits are stored on the servers in F(t).

Theorem Proof: Build a run r using Ad. Have c clients invoke writes. Three possible cases: (1) |F(t)| = f+1 at some point in r, so the storage cost is ℓ(f+1) bits; (2) |C+(t)| = c at some point in r, so the storage cost is (D-ℓ+1)c bits; (3) none of the above, so by the corollary no write returns, and we will show this is impossible. By setting ℓ = D/2, we get Ω(D·min(f, c)).

Ad Is Not Fair! Operations on servers in F(t) never take effect from time t onward. Operations by clients that remain in C+ from some point onward never take effect.

Constructing a Fair Run (Sketch): Take a run r with Ad and assume |F(t)| ≤ f and |C+(t)| < c throughout. No write completes in r. Build r': kill all the servers in F and the clients permanently in C+. r' is fair with at least one correct process. By lock-freedom, some write completes in r'. r and r' are indistinguishable to all correct clients. Therefore, some write completes in r. Contradiction. Details in the paper.

Theorem Proof: Build a run r using Ad. Have c clients invoke writes. With the third case ruled out, two possible cases remain: (1) |F(t)| = f+1 at some point in r, so the storage cost is ℓ(f+1) bits; (2) |C+(t)| = c at some point in r, so the storage cost is (D-ℓ+1)c bits. By setting ℓ = D/2, we get Ω(D·min(f, c)).
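The last step, substituting the threshold l = D/2 (written ℓ below) into the two remaining cases, can be spelled out as follows. This is a worked sketch that assumes D is even, not the paper's exact presentation.

```latex
\begin{align*}
\text{Case 1: } \ell\,(f+1) &= \tfrac{D}{2}\,(f+1) \;\ge\; \tfrac{D}{2}\,\min(f,c),\\
\text{Case 2: } (D-\ell+1)\,c &= \left(\tfrac{D}{2}+1\right) c \;\ge\; \tfrac{D}{2}\,\min(f,c).
\end{align*}
% In both cases the storage holds at least (D/2) * min(f, c) bits at some point,
% which is \Omega(D \cdot \min(f, c)).
```

Choosing ℓ = D/2 balances the two cases: a larger ℓ strengthens the first bound and weakens the second, and a smaller ℓ does the opposite.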

Wrap Up: We proved a fundamental limit of coding: Ω(D·min(f, c)) bits for a regular lock-free register. Replication is the best solution under high concurrency. Why not enjoy both worlds?

Adaptive Algorithm: Replication has storage cost nD. Coding with k, n = O(f) has storage cost (c+1)(D/k)n. We combine both approaches. Storage cost: min(2nD, (c+1)(D/k)n) = O(D·min(f, c)). Details in the paper.
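The combined cost formula can be evaluated directly to see where the adaptive algorithm switches from the coding regime to the replication regime. The formulas are the ones on the slide. The function names and the sample parameters (D = 1024, n = 6, k = 2) are assumptions for illustration.

```python
from fractions import Fraction

def replication_cost(n, D):
    """Slide formula for replication: n full copies of D bits."""
    return n * D

def coding_cost(n, k, D, c):
    """Slide formula for coding: (c + 1) * (D / k) * n bits."""
    return (c + 1) * Fraction(D, k) * n

def adaptive_cost(n, k, D, c):
    """Slide formula for the combined algorithm: min(2nD, (c+1)(D/k)n)."""
    return min(2 * replication_cost(n, D), coding_cost(n, k, D, c))

low_concurrency = adaptive_cost(n=6, k=2, D=1024, c=1)    # coding term is smaller
high_concurrency = adaptive_cost(n=6, k=2, D=1024, c=10)  # capped at 2nD
```

With these sample values the coding term wins for c = 1 (6144 bits) and the cost is capped at 2nD = 12288 bits for c = 10, matching the O(D·min(f, c)) shape when n and k are both proportional to f.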

Related Work: Tomorrow morning: [Cadambe, Wang, Lynch]. Similar bound; different (incomparable) set of assumptions; different proof technique. Future: find a unique minimal set of assumptions.