I Can’t Believe It’s Not Causal! Scalable Causal Consistency with No Slowdown Cascades
Syed Akbar Mehdi, Cody Littley, Natacha Crooks, Lorenzo Alvisi, Nathan Bronson, Wyatt Lloyd

Presentation Transcript

1. I Can’t Believe It’s Not Causal! Scalable Causal Consistency with No Slowdown Cascades. Syed Akbar Mehdi (1), Cody Littley (1), Natacha Crooks (1), Lorenzo Alvisi (1,4), Nathan Bronson (2), Wyatt Lloyd (3). Affiliations: 1 UT Austin, 2 Facebook, 3 USC, 4 Cornell University.

2. Causal Consistency: Great in Theory. Lots of exciting research on building scalable causal data stores, e.g., COPS [SOSP '11], Bolt-On [SIGMOD '13], ChainReaction [EuroSys '13], Eiger [NSDI '13], Orbe [SOCC '13], GentleRain [SOCC '14], Cure [ICDCS '16], TARDiS [SIGMOD '16]. [Diagram: consistency spectrum from eventual consistency (higher performance) to strong consistency (stronger guarantees), with causal consistency in between.]

3. Causal Consistency: But in Practice... The middle child of consistency models. Reality: the largest web apps use eventually consistent stores, e.g., Espresso, TAO, and Manhattan.

4. Key Hurdle: Slowdown Cascades. [Diagram contrasting the implicit assumption of current causal systems with the reality at scale.]

5. Key Hurdle: Slowdown Cascades. Current causal systems implicitly assume they can enforce consistency by making replicas wait for dependencies; at scale, one slow node turns those waits into a slowdown cascade.

6. Replicated and sharded storage for a social network. [Diagram: shards replicated across Datacenter A and Datacenter B.]

7. Example: W1, Read(W1), W2, and W3 issued across Datacenter A and Datacenter B; the writes are causally ordered as W1 → W2 → W3.

8. Current causal systems enforce consistency as a datastore invariant: the remote datacenter buffers replicated writes (here W2 and W3) and checks whether their dependencies (W1, W2) have been applied before applying them.

9. If W1's replication is delayed, every buffered write behind it stalls: Alice's advisor unnecessarily waits for Justin Bieber's update despite not reading it. Slowdown cascades affect all previous causal systems because they enforce consistency inside the data store.

10. Slowdown Cascades in Eiger (NSDI '13): replicated write buffers grow arbitrarily because Eiger enforces consistency inside the datastore. [Graph.]

11. OCCULT: Observable Causal Consistency Using Lossy Timestamps.

12. Observable Causal Consistency. Causal consistency guarantees that each client observes a monotonically non-decreasing set of updates (including its own) in an order that respects potential causality between operations. Key idea: don't implement a causally consistent data store; let clients observe a causally consistent data store.

13. How do clients observe a causally consistent datastore? [Diagram: Datacenter A and Datacenter B.]

14. Writes are accepted only by master shards and then replicated asynchronously, in order, to slaves. [Diagram: master and slave shards in Datacenter A and Datacenter B.]

15. Each shard keeps track of a shardstamp, which counts the writes it has applied.

16. Causal Timestamp: a vector of shardstamps that identifies a global state across all shards. Each client (Client 1, Client 2, Client 3) carries its own causal timestamp.
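
To make the data structure concrete, here is a minimal Python sketch of a causal timestamp as a vector of shardstamps with an elementwise-max merge. The class and method names are illustrative, not from the Occult codebase.

```python
# Minimal sketch (illustrative names, not the Occult codebase): a causal
# timestamp is one shardstamp per shard, merged by elementwise max.
from typing import List

class CausalTimestamp:
    def __init__(self, num_shards: int):
        # One shardstamp per shard; 0 means "no writes observed yet".
        self.stamps: List[int] = [0] * num_shards

    def merge(self, other: "CausalTimestamp") -> None:
        # Elementwise max: the merged timestamp dominates both inputs,
        # so it captures every dependency either side has observed.
        self.stamps = [max(a, b) for a, b in zip(self.stamps, other.stamps)]

    def dominates(self, other: "CausalTimestamp") -> bool:
        # True if this timestamp has observed at least as much as `other`.
        return all(a >= b for a, b in zip(self.stamps, other.stamps))
```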

17. Write Protocol: causal timestamps are stored with objects to propagate dependencies.

18. Write Protocol: on a write, the master's shardstamp is incremented and merged into the causal timestamps (the object's and the client's).
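
A hedged sketch of the write path described on slides 17-18, reusing the CausalTimestamp class above: the master increments its shardstamp, folds it into the client's dependencies, and stores the resulting timestamp with the object. The names (StoredObject, MasterShard, write) are illustrative, not the paper's API.

```python
# Sketch of the write protocol (slides 17-18). Illustrative names only.
from dataclasses import dataclass

@dataclass
class StoredObject:
    value: object
    deps: CausalTimestamp  # causal timestamp stored with the object

class MasterShard:
    def __init__(self, shard_id: int, num_shards: int):
        self.shard_id = shard_id
        self.num_shards = num_shards
        self.shardstamp = 0          # counts writes applied by this shard
        self.store = {}              # key -> StoredObject

    def write(self, key, value, client_ts: CausalTimestamp) -> CausalTimestamp:
        # Increment the shardstamp: this write is the shard's next event.
        self.shardstamp += 1
        # The write's causal timestamp = client's dependencies + this write.
        write_ts = CausalTimestamp(self.num_shards)
        write_ts.merge(client_ts)
        write_ts.stamps[self.shard_id] = max(
            write_ts.stamps[self.shard_id], self.shardstamp)
        self.store[key] = StoredObject(value, write_ts)
        # The client merges the returned timestamp into its own (slide 18).
        return write_ts
```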

19. Read Protocol: it is always safe to read from the master.

20. Read Protocol: the object's causal timestamp is merged into the client's causal timestamp.
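
A sketch of the read-from-master path (slides 19-20): the object's stored causal timestamp is merged into the client's, so anything the client does next depends on what it read. Again, the Client class and its methods are illustrative names.

```python
# Sketch of reads from a master (slides 19-20): always safe, and the
# object's causal timestamp is merged into the client's. Illustrative only.
class Client:
    def __init__(self, num_shards: int):
        self.ts = CausalTimestamp(num_shards)

    def read_from_master(self, master: MasterShard, key):
        obj = master.store[key]
        # Merging the object's timestamp records the read as a dependency
        # of everything this client does next (slide 21).
        self.ts.merge(obj.deps)
        return obj.value
```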

21. Read Protocol: causal timestamp merging tracks causal ordering for writes that follow reads.

22. Replication: like eventual consistency, it is asynchronous and unordered, and writes are applied immediately.

23. Replication: slaves increment their shardstamps using the causal timestamp of a replicated write; here the replication of a to Datacenter B is delayed.
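
A sketch of the replication step on slides 22-23, under the assumption that a slave advances its shardstamp from the replicated write's causal-timestamp entry for its own shard; writes are applied immediately even if writes on other shards are still delayed. The mechanics shown are my reading of the slide, not a verbatim rendering of the paper's pseudocode.

```python
# Sketch of asynchronous replication (slides 22-23): slaves apply replicated
# writes immediately and advance their shardstamp from the write's causal
# timestamp entry for their own shard. Assumed mechanics, illustrative names.
class SlaveShard:
    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.shardstamp = 0
        self.store = {}

    def apply_replicated_write(self, key, obj: StoredObject) -> None:
        # No buffering and no dependency checks: apply immediately,
        # exactly as an eventually consistent store would.
        self.store[key] = obj
        # Advance the shardstamp using the write's own causal timestamp.
        self.shardstamp = max(self.shardstamp, obj.deps.stamps[self.shard_id])
```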

24. Read Protocol: clients do a consistency check when reading from slaves.

25. Read Protocol: clients do a consistency check when reading from slaves. Here b's dependencies are delayed, but the client can read b anyway because its own causal timestamp does not require them.

26. Read Protocol: after reading b, the client's causal timestamp includes b's dependencies, so a subsequent r(a) finds the local slave stale (its shardstamp is behind what the client requires).

27. Read Protocol: resolving stale reads. Options: retry locally, or read from the master.
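
A sketch of the client-side consistency check on slides 24-27: a slave read is safe only if the slave's shardstamp has caught up with the client's causal timestamp entry for that shard; otherwise the client retries locally for a while and then falls back to the master, which is always safe. The retry policy and names are illustrative assumptions.

```python
# Sketch of slave reads with the consistency check (slides 24-27).
import time

class OccultClient(Client):
    def read(self, slave: SlaveShard, master: MasterShard, key,
             max_retries: int = 3, backoff_s: float = 0.01):
        for _ in range(max_retries):
            obj = slave.store.get(key)
            required = self.ts.stamps[slave.shard_id]
            # Consistency check (slide 24): the slave must have applied at
            # least as many writes as the client has already observed.
            if obj is not None and slave.shardstamp >= required:
                self.ts.merge(obj.deps)   # slide 20: track the new dependency
                return obj.value
            time.sleep(backoff_s)         # option 1 (slide 27): retry locally
        # Option 2 (slide 27): the master is always safe to read.
        return self.read_from_master(master, key)
```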

28. Causal Timestamp Compression: what happens at scale when the number of shards is, say, 100,000? Must Size(Causal Timestamp) == 100,000?

29. Causal Timestamp Compression (strawman): to compress down to n entries, conflate shardstamps whose shard ids are equal modulo n. Problem: false dependencies.

30. Strawman, continued. Solution: use the system clock as the next value of the shardstamp on a write, which decouples the shardstamp value from the number of writes on each shard.

31. Strawman, continued: to compress from N to n entries, conflate shardstamps whose shard ids are equal modulo n. Problem: modulo arithmetic still conflates unrelated shardstamps.
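
To illustrate the strawman on slides 29-31, here is a small sketch that compresses a full vector of shardstamps into n slots by conflating shard ids modulo n with a max. The false-dependency problem is visible in the example: unrelated shards land in the same slot and inherit each other's shardstamps. The function name and example values are illustrative.

```python
# Strawman compression (slides 29-31): conflate shard ids modulo n, keeping
# the max shardstamp per slot. Unrelated shards that share a slot inherit
# each other's shardstamps, creating false dependencies.
def compress_modulo(stamps, n):
    compressed = [0] * n
    for shard_id, stamp in enumerate(stamps):
        slot = shard_id % n
        compressed[slot] = max(compressed[slot], stamp)
    return compressed

# Example: shards 1 and 4 share slot 1 when n = 3, so a reader of shard 1
# appears to depend on shard 4's latest write as well.
print(compress_modulo([1000, 8, 9, 13, 209], 3))  # -> [1000, 209, 9]
```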

32. Causal Timestamp Compression. Insight: recent shardstamps are more likely to create false dependencies, so keep recent shardstamps at high resolution (with their shard ids) and conflate the rest into a catch-all shardstamp.

33. Causal Timestamp Compression: with this scheme, just 4 shardstamps per causal timestamp and 16K logical shards give only 0.01% false dependencies.
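
A sketch of the scheme on slides 32-33, assuming clock-based shardstamps so that "largest" approximates "most recent": keep the k most recent (shard id, shardstamp) pairs explicitly and fold everything older into a single catch-all shardstamp that conservatively covers all other shards. The eviction policy and data layout are assumptions for illustration.

```python
# Sketch of temporal compression (slides 32-33): keep the k largest (i.e.,
# most recent, with clock-based shardstamps) entries explicitly and conflate
# the rest into one catch-all shardstamp. Illustrative layout and names.
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class CompressedTimestamp:
    k: int                                                   # explicit entries to keep
    explicit: Dict[int, int] = field(default_factory=dict)   # shard id -> shardstamp
    catch_all: int = 0                                        # covers every other shard

    def get(self, shard_id: int) -> int:
        # Any shard without an explicit entry is covered by the catch-all,
        # which is where the remaining false dependencies come from.
        return self.explicit.get(shard_id, self.catch_all)

    def update(self, shard_id: int, shardstamp: int) -> None:
        self.explicit[shard_id] = max(self.get(shard_id), shardstamp)
        if len(self.explicit) > self.k:
            # Demote the oldest (smallest) explicit entry into the catch-all.
            oldest = min(self.explicit, key=self.explicit.get)
            self.catch_all = max(self.catch_all, self.explicit.pop(oldest))
```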

34. Transactions in OCCULT: scalable, causally consistent, general-purpose transactions.

35. Properties of Transactions: (A) atomicity, (B) reads from a causally consistent snapshot, (C) no concurrent conflicting writes.

36. Properties of Transactions, revised: (A) observable atomicity, (B) observably read from a causally consistent snapshot, (C) no concurrent conflicting writes.

37. Properties of Transactions: (A) observable atomicity, (B) observably read from a causally consistent snapshot, (C) no concurrent conflicting writes. Properties of Protocol: no centralized timestamp authorities (e.g., per-datacenter); transactions ordered using causal timestamps.

38. Properties of Transactions as on the previous slide. Properties of Protocol: no centralized timestamp authority (e.g., per-datacenter); transactions ordered using causal timestamps; transaction commit latency is independent of the number of replicas.

39. Three-Phase Protocol: (1) Read phase: buffer writes at the client. (2) Validation phase: the client validates A, B, and C using causal timestamps. (3) Commit phase: buffered writes are committed in an observably atomic way.

40. Example: Alice and her advisor manage lists of students for three courses, stored as a = [], b = [Bob], and c = [Cal], sharded and replicated across Datacenter A and Datacenter B.

41. Observable atomicity and causally consistent snapshot reads are enforced by the same mechanism.

42. Transaction T1 (Alice adding Abe to course a): Start T1; r(a) = []; w(a = [Abe]).

43. Transaction T1 after commit: a = [Abe] is applied at its master and Alice's causal timestamp is advanced.

44. Transaction T2 (Alice moving Bob from course b to course c): Start T2; r(b) = [Bob]; r(c) = [Cal]; w(b = []); w(c = [Bob, Cal]).

45. Observable Atomicity through causality: make a transaction's writes causally dependent on each other.

46. Observable Atomicity: giving all of a transaction's writes the same commit timestamp makes them causally dependent on each other.

47. After Commit T2: b = [] and c = [Bob, Cal] are applied at their masters, both carrying the same commit timestamp.
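
A sketch of the idea on slides 45-47, under the assumption that the client builds a single commit timestamp by merging the transaction's running causal timestamp with the new shardstamps of every shard it writes, and stores that same timestamp with each buffered write. Conflict validation (property C) is omitted here; the commit() helper and its parameters are hypothetical, not the paper's commit protocol verbatim.

```python
# Sketch of observable atomicity via a shared commit timestamp (slides 45-47).
# Simplified and illustrative: conflict checks and failure handling omitted.
def commit(client: OccultClient, buffered_writes: dict, masters: dict) -> None:
    # buffered_writes: key -> value; masters: key -> MasterShard holding that key.
    commit_ts = CausalTimestamp(len(client.ts.stamps))
    commit_ts.merge(client.ts)
    # Reserve one new shardstamp per written shard and fold it in.
    for key in buffered_writes:
        m = masters[key]
        m.shardstamp += 1
        commit_ts.stamps[m.shard_id] = max(
            commit_ts.stamps[m.shard_id], m.shardstamp)
    # Every write carries the same commit timestamp, so observing any one of
    # them makes a reader's causal timestamp require all of the others.
    for key, value in buffered_writes.items():
        masters[key].store[key] = StoredObject(value, commit_ts)
    client.ts.merge(commit_ts)
```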

48. Transaction writes replicate asynchronously.

49. Transaction writes replicate asynchronously: c = [Bob, Cal] reaches Datacenter B while the replication of a and b is delayed.

50. Alice's advisor reads the lists in a transaction T3: Start T3.

51. T3 reads b from Datacenter B and still sees the old value: r(b) = [Bob].

52. Transactions maintain a Read Set to validate atomicity and causal snapshot reads; T3's Read Set now records b = [Bob] together with its causal timestamp.

53. T3 then reads c and sees the new value: r(c) = [Bob, Cal], which is added to the Read Set with its causal timestamp.

54. Validation failure: c's causal timestamp knows of more writes from b's shard than had been applied at the time b was read, so T3 did not observe an atomic, causally consistent snapshot.

55. T3's read r(a) = [] exhibits an ordering violation, detected in the usual way: the shard holding a is stale relative to T3's causal timestamp.
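
A sketch of the Read Set validation on slides 52-55, assuming each Read Set entry stores the object's causal timestamp plus the shardstamp of the shard it was read from: the reads form a causally consistent (and observably atomic) snapshot only if no read's causal timestamp knows of more writes on another entry's shard than that shard had applied when the entry was read. The ReadSetEntry layout is an illustrative assumption.

```python
# Sketch of Read Set validation (slides 52-55). Validation fails if any read
# depends on more writes from some shard than that shard had applied when
# another entry was read there (exactly T3's failure on slide 54).
from typing import Iterable, NamedTuple

class ReadSetEntry(NamedTuple):
    shard_id: int                 # shard the object was read from
    shardstamp_at_read: int       # that shard's shardstamp at read time
    deps: CausalTimestamp         # causal timestamp stored with the object

def validate_read_set(read_set: Iterable[ReadSetEntry]) -> bool:
    entries = list(read_set)
    for observed in entries:
        for other in entries:
            # 'other' knows of more writes on 'observed's shard than had
            # been applied when 'observed' was read: not a causal snapshot.
            if other.deps.stamps[observed.shard_id] > observed.shardstamp_at_read:
                return False
    return True
```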

56. Recap of the Three-Phase Protocol: (1) Read phase: buffer writes at the client. (2) Validation phase: the client validates A, B, and C using causal timestamps. (3) Commit phase: buffered writes are committed in an observably atomic way.

57. Validation phase in detail: validate the Read Set to verify A and B; validate the Overwrite Set to verify C.

58. Evaluation

59. Evaluation Setup: Occult implemented by modifying Redis Cluster (the baseline); evaluated on CloudLab with two datacenters, Wisconsin (WI) and South Carolina (SC), 20 server machines (4 server processes per machine), and 16K logical shards; YCSB used as the benchmark; the graphs shown here use a read-heavy (95% reads) workload with a Zipfian distribution.

60. Evaluation Setup, continued: same setup as above; the results show the cost of providing consistency guarantees.

61. Goodput Comparison

62. Goodput Comparison, with 4 shardstamps per causal timestamp. [Graph annotated with 8.7%, 31%, and 39.6%.]

63. Effect of slow nodes on Occult Latency

64. Conclusions: Enforcing causal consistency in the data store is vulnerable to slowdown cascades. It is sufficient to ensure that clients observe causal consistency: use lossy timestamps to provide the guarantee and avoid slowdown cascades. Observable enforcement can be extended to causally consistent transactions: making writes causally dependent on each other provides observable atomicity and also avoids slowdown cascades.
