Slide1
Life after CAP
Ali Ghodsi
alig@cs.berkeley.edu
Slide2
CAP conjecture [reminder]
Can only have two of:
Consistency
Availability
Partition-tolerance
Examples
Databases, 2PC, centralized algo (C & A)
Distributed databases, majority protocols (C & P)
DNS, Bayou (A & P)
Slide3
CAP theorem
Formalization by Gilbert & Lynch
What does impossible mean?
There exists an execution which violates one of C, A, or P
Not possible to guarantee that an algorithm has all three at all times
Shard data with different CAP tradeoffs
Detect partitions and weaken consistency
Slide4
Partition-tolerance & availability
What is partition-tolerance?
Consistency and Availability are provided by the algo
Partitions are external events (scheduler/oracle)
Partition-tolerance is really a failure model
Partition-tolerance is equivalent to omissions
In the CAP theorem
Proof rests on partitions that never heal
Datacenters can guarantee recovery of partitions!
Can guarantee that conflict resolution eventually happens
Slide5
Availability
In CAP theorem
"Eventually get a response"
Too strong: availability is typically probabilistic, e.g. 99.999%
Too weak: availability typically comes with SLOs, e.g. response within 1 sec
Slide6
How do we ensure consistency
Main technique to be consistent
Quorum principle
Example: Majority quorums
Always write to and read from a majority of nodes
At least one node knows most recent value
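A minimal sketch of why majority quorums work, assuming nodes are numbered 0..N-1 (the helper names `majority` and `all_majorities_intersect` are illustrative, not from the talk). Any two majority-sized sets of nodes share at least one node, so a read quorum always contains a node that saw the most recent write.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def all_majorities_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority-sized node sets overlaps."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


print(majority(9))                  # 5
print(all_majorities_intersect(5))  # True
```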
[Figure: WRITE(v) and READ of v on 9 nodes, majority(9)=5]
Slide7
Quorum Principle
Majority Quorum
Pro: tolerate up to ⌈N/2⌉ - 1 crashes
Con: Have to read/write ⌊N/2⌋ + 1 values
Read/write quorums (Dynamo, ZooKeeper, Chain Repl)
Read R nodes, Write W nodes, s.t. R + W > N (W > N/2)
Pro: adjust performance of reads/writes
Con: availability can suffer
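The read/write quorum conditions can be sketched as a small check. The function names are hypothetical, and `crashes_tolerated` rests on an assumption not stated on the slide: an operation needs its full quorum of live nodes, so both reads and writes stay possible only while at least max(R, W) nodes are up.

```python
def is_valid_rw_quorum(n: int, r: int, w: int) -> bool:
    """Conditions from the slide: R + W > N and W > N/2."""
    return r + w > n and 2 * w > n


def crashes_tolerated(n: int, r: int, w: int) -> int:
    """Assumption: reads and writes both need their full quorum of live nodes."""
    return n - max(r, w)


n = 5
for r, w in [(1, 5), (2, 4), (3, 3)]:
    print(r, w, is_valid_rw_quorum(n, r, w), crashes_tolerated(n, r, w))
```

With N=5, choosing R=1 makes reads cheap but forces W=5, so a single crash blocks writes. This is one way availability can suffer when quorum sizes are tuned for performance.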
Maekawa Quorum
Arrange nodes in an MxM grid
Write to row+col, read cols (always overlap)
Pro: Only need to read/write O(sqrt(N)) nodes
Con: Tolerate at most O(sqrt(N)) crashes (reconfiguration)
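A rough sketch of the grid construction, assuming nodes are addressed as (row, column) pairs, that a write covers one full row plus one full column, and that a read covers one full column. The function names are illustrative only.

```python
def write_quorum(m: int, row: int, col: int) -> set:
    """All nodes in the given row plus all nodes in the given column."""
    return {(row, c) for c in range(m)} | {(r, col) for r in range(m)}


def read_quorum(m: int, col: int) -> set:
    """All nodes in the given column."""
    return {(r, col) for r in range(m)}


def always_overlap(m: int) -> bool:
    """Check every write quorum against every read quorum."""
    return all(
        write_quorum(m, row, col) & read_quorum(m, rcol)
        for row in range(m)
        for col in range(m)
        for rcol in range(m)
    )


print(len(write_quorum(4, 0, 0)))  # 7 nodes out of N=16
print(always_overlap(4))           # True
```

The overlap holds because the written row crosses every column, so whichever column a reader picks contains one node of that row.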
[Figure: nodes P1 to P9]
Slide8
Probabilistic Quorums
Quorum size α√N (α > 1) intersects with probability 1 - exp(-α²)
Example: N=16 nodes, quorum size 7, intersects 95%, tolerates 9 failures
Maekawa: N=16 nodes, quorum size 7, intersects 100%, tolerates 4 failures
Pro: Small quorums, high fault-tolerance
Con: Could fail to intersect, N usually large
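The numbers in the N=16 example can be reproduced with a short calculation, taking the intersection probability as 1 - exp(-α²) with α = 7/√16. The second function is an addition for comparison, under the assumption that both quorums are independent, uniformly random subsets; the function names are hypothetical.

```python
import math


def intersection_bound(alpha: float) -> float:
    """Intersection probability used on the slide: 1 - exp(-alpha^2)."""
    return 1.0 - math.exp(-alpha ** 2)


def exact_intersection(n: int, q: int) -> float:
    """Exact overlap probability of two independent, uniformly random q-subsets."""
    return 1.0 - math.comb(n - q, q) / math.comb(n, q)


n, q = 16, 7
alpha = q / math.sqrt(n)                    # 1.75
print(round(intersection_bound(alpha), 3))  # 0.953
print(n - q)                                # 9 nodes can fail and 7 still remain
```

Under the uniform-random assumption the exact overlap probability is higher than 95%, so the exponential expression acts as a conservative estimate here.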
Slide9
Quorums and CAP
With quorums we can get
C & P: partition can make quorum unavailable
C & A: no-partition ensures availability and atomicity
Faced with a decision when failing to get a quorum [Brewer'11]
Sacrifice availability by waiting for the partitions to merge
Sacrifice atomicity by ignoring the quorum
Can we get CAP for weaker consistency?
Slide10
What does atomicity really mean?
Linearization Points
Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response
[Figure: timelines of P1, P2, P3 with operations W(5), W(6), R, R; invocation and response marked]
Slide11
Definition of Atomicity
Linearization Points
Read ops appear as if they happened instantaneously at all nodes at some time between invocation and response
Write ops appear as if they happened instantaneously at all nodes at some time between invocation and response
[Figure: timelines of P1, P2, P3 with W(5), W(6), R:5, R:6; labeled atomic]
Slide12
Definition of Atomicity
[Figure: timelines of P1, P2, P3 with W(5), W(6), R:6, R:6; labeled atomic]
[Figure: timelines of P1, P2, P3 with W(5), W(6), R:6, R:5; labeled not atomic]
Slide13
Atomicity too strong?
[Figure: timelines of P1, P2, P3 with W(5), W(6), R:6, R:5; labeled not atomic]
Linearization points too strong?
Why not just have R:5 appear atomically right after W(5)?
Lamport: "If P2's operator phones P1 and tells her I just read 6"
Slide14
Atomicity too strong?
[Figure: timelines of P1, P2, P3 with W(5), W(6), R:6, R:5; labeled not atomic, sequentially consistent]
Sequential consistency
Weaker than atomicity
Sequential consistency removes this "real-time" requirement
Any global ordering is OK as long as it respects each process's local ordering
Does Gilbert’s proof fall apart for sequential consistency?
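The gap between atomicity and sequential consistency can be checked by brute force on a tiny history. This is a sketch under assumptions: a single register, and a hypothetical assignment of operations and timestamps to processes (P3 writes 5 then 6, P2 reads 6, and only afterwards P1 reads 5). All names are illustrative.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    proc: str
    kind: str     # "W" or "R"
    value: int
    start: float  # invocation time
    end: float    # response time


def legal(order) -> bool:
    """Every read returns the value of the latest preceding write."""
    current: Optional[int] = None
    for op in order:
        if op.kind == "W":
            current = op.value
        elif op.value != current:
            return False
    return True


def respects(order, must_precede) -> bool:
    """No pair is placed in the opposite order to what must_precede demands."""
    return not any(
        must_precede(b, a)
        for i, a in enumerate(order)
        for b in order[i + 1:]
    )


def program_order(a: Op, b: Op) -> bool:
    """Local ordering: a precedes b on the same process."""
    return a.proc == b.proc and a.end < b.start


def real_time_order(a: Op, b: Op) -> bool:
    """Real-time ordering: a responded before b was invoked, on any process."""
    return a.end < b.start


def exists_order(history, must_precede) -> bool:
    """Search all global orderings for a legal one that respects the constraint."""
    return any(
        legal(p) and respects(p, must_precede) for p in permutations(history)
    )


# Hypothetical timing, not taken from the slide.
history = [
    Op("P3", "W", 5, 0, 1),
    Op("P3", "W", 6, 2, 3),
    Op("P2", "R", 6, 4, 5),
    Op("P1", "R", 5, 6, 7),
]
print(exists_order(history, real_time_order))  # False: not atomic
print(exists_order(history, program_order))    # True: sequentially consistent
```

Dropping the real-time constraint lets the ordering W(5), R:5, W(6), R:6 through, since P1 and P2 each issue only one operation and impose no local ordering on each other.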
Causal memory
Weaker than sequential
No need to have a global view, each process can have a different view
Local: reads/writes immediately return to caller
CAP theorem does not apply to causal memory
[Figure: timelines of P1 and P2 with W(1), R:0, W(0), R:1; labeled causally consistent]
Slide15
Going really weak
Eventual consistency
When the network is not partitioned, all nodes eventually have the same value
I.e. don't be "consistent" at all times, but only after partitions heal!
Based on powerful technique: gossiping
Periodically exchange "logs" with one random node
Exchange must be constant-sized packets
Set reconciliation, Merkle trees, etc
Use (clock, node_id) to break ties of events in log
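A toy simulation of the gossip steps, with several simplifying assumptions: each node holds a single (clock, node_id, value) entry rather than a full log, exchanges are push-pull, and every node starts with its own write at the same clock so that node_id alone decides the winner. The names `newer` and `gossip` are hypothetical.

```python
import random


def newer(a, b):
    """Return the winning entry: higher clock wins, node_id breaks ties."""
    return a if a[:2] >= b[:2] else b


def gossip(n, seed=0):
    """Push-pull gossip until all nodes agree; returns (rounds, winning entry)."""
    rng = random.Random(seed)
    state = [(1, node, f"value-from-{node}") for node in range(n)]
    rounds = 0
    while len(set(state)) > 1:
        rounds += 1
        for node in range(n):
            peer = rng.randrange(n)                  # one random node per exchange
            merged = newer(state[node], state[peer])
            state[node] = state[peer] = merged       # both sides keep the winner
    return rounds, state[0]


rounds, winner = gossip(64)
print(winner)  # (1, 63, 'value-from-63')
```

Each exchange carries one entry, so packet size stays constant. The entry with the largest (clock, node_id) pair is never overwritten, which is why every node ends up with the same value.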
Properties of gossiping
All nodes will have the same value in O(log N) time
No positive-feedback cycles that congest the network
Slide16
BASE
Catch-all for any consistency model C’ that enables C’-A-P
Eventual consistency
PRAM consistency
Causal consistency
Main ingredients
Stale data
Soft-state (regenerable state)
Approximate answers
Slide17
Summary
No need to ensure CAP at all times
Switch between algorithms or satisfy subset at different times
Weaken consistency model
Choose weaker consistency:
Causal memory (relatively strong) works around CAP
Only be consistent when network isn't partitioned:
Eventual consistency (very weak) works around CAP
Weaken partition-tolerance
Some environments never partition, e.g. datacenters
Tolerate unavailability in small quorums
Some env. have recovery guarantees (partitions heal within X hours), perform conflict resolution
Slide18
Related Work (ignored in talk)
PRAM consistency (Pipelined RAM)
Weaker than causal and non-blocking
Eventual Linearizability (PODC’10)
Becomes atomic after quiescent periods
Gossiping & set reconciliation
Lots of related work