Slide 1
Geo-replication
a journey from the simple to the optimal
Faisal Nawab
In collaboration with:
Divy Agrawal, Amr El Abbadi, Vaibhav Arora, Hatem Mahmoud, Alex Pucher (UC Santa Barbara)
Aaron Elmore (U. of Chicago)
Stacy Patterson (Rensselaer Polytechnic Institute)
Ken Salem (U. of Waterloo)

Slide 2
Geo-replication
[Diagram: data items A, B, C replicated across Datacenter California and Datacenter Virginia]

Slide 3
Slide 4
Geo-replication has challenges
Wide-area latency
Coordination is expensive
Consistency guarantees
Transaction support

Slide 5
~10 years ago – NoSQL
No transaction support
Weaker consistency guarantees
BUT transactions and guarantees are often needed
The developer builds their own (transactional) solution
An error-prone process
Reinventing the wheel

Slide 6
~4 years ago – bringing back transactions
We want our transactions!
Megastore [CIDR’2011], Spanner [OSDI’12], MDCC [EuroSys’13]

A journey from the simple to the optimal (2012-2015):
Paxos-CP [VLDB’12]
Replicated Commit [VLDB’13]
Message Futures [CIDR’13]
Helios [SIGMOD’15]
Chariots [EDBT’15]

Slide 7
Transactions

Slide 8
Transactions
A collection of read and write operations
Atomicity, Consistency, Isolation, Durability
Abstract a transaction as a set of reads and writes
Guarantee the "illusion" of a serial execution

Example: buy N tickets
  Remaining = Read(T)
  If (Remaining > N)
    Write(T, Remaining - N)
    Process payment
  Else
    fail

Abstracted: Purchase(N) = Read(T), Write(T, new value)

Slide 9
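The buy-N-tickets pseudocode above can be made concrete (a minimal sketch; the dictionary-backed `db` and the `purchase` helper are illustrative stand-ins, not part of any real system):

```python
def purchase(db, n):
    """Buy n tickets: Remaining = Read(T); conditionally Write(T, Remaining - N)."""
    remaining = db["tickets"]          # Remaining = Read(T)
    if remaining > n:                  # If (Remaining > N), as on the slide
        db["tickets"] = remaining - n  # Write(T, Remaining - N)
        return True                    # ...then process payment
    return False                       # Else fail

db = {"tickets": 4}
assert purchase(db, 2)      # 4 > 2: succeeds, 2 tickets remain
assert db["tickets"] == 2
assert not purchase(db, 3)  # 2 > 3 is false: purchase fails
```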
Transactions
Why “Guarantee the illusion of a serial execution”?
Starting with #Tickets = 4:

Buy 2 tickets:                     Buy 3 tickets:
  Remaining = Read(#Tickets)         Remaining = Read(#Tickets)
  Write(#Tickets, 2)                 Write(#Tickets, 1)
  Process payment                    Process payment

Both transactions read 4, so one of the updates is lost: 5 tickets are sold out of 4, and the final count (1) reflects only the second write.

Slide 10
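The anomaly on this slide can be replayed deterministically (a sketch of the interleaving the slide depicts, with both buyers reading before either writes):

```python
# Start as on the slide: 4 tickets available.
tickets = {"#Tickets": 4}

# Both transactions read before either writes (no isolation).
t1_read = tickets["#Tickets"]  # buyer of 2 tickets reads 4
t2_read = tickets["#Tickets"]  # buyer of 3 tickets also reads 4

tickets["#Tickets"] = t1_read - 2  # T1: Write(#Tickets, 2)
tickets["#Tickets"] = t2_read - 3  # T2: Write(#Tickets, 1) clobbers T1's write

# 5 tickets were "sold" out of 4: T1's update was lost.
assert tickets["#Tickets"] == 1
```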
Spanner [OSDI’12]

Slide 11
Spanner [OSDI’12]
Google’s solution for geo-replication
Commit protocol (2PC/Paxos):
Each partition has a leader
Two-Phase Commit (2PC) across partition leaders
Paxos to replicate each step of 2PC

[Diagram: partitions A, B, C replicated across two datacenters]

Slide 12
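The commit structure described above can be sketched as 2PC across partition leaders (this is not Google's code; in Spanner each prepare/commit step would additionally be Paxos-replicated before a leader answers, which is elided here):

```python
class Leader:
    """Stand-in for a partition leader; `vote` fakes its prepare decision."""
    def __init__(self, vote):
        self.vote, self.state = vote, "init"
    def prepare(self):
        self.state = "prepared" if self.vote else "aborted"
        return self.vote
    def commit(self):
        self.state = "committed"  # in Spanner: Paxos-replicated first
    def abort(self):
        self.state = "aborted"

def two_phase_commit(leaders):
    # Phase 1: ask every partition leader to prepare (and vote).
    if all(leader.prepare() for leader in leaders):
        # Phase 2: unanimous yes -> commit at every leader.
        for leader in leaders:
            leader.commit()
        return "commit"
    # Any "no" vote aborts the transaction everywhere.
    for leader in leaders:
        leader.abort()
    return "abort"

assert two_phase_commit([Leader(True), Leader(True)]) == "commit"
assert two_phase_commit([Leader(True), Leader(False)]) == "abort"
```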
Transaction latency
[Diagram: a transaction spanning Datacenter California and Datacenter Virginia over partitions A, B, C. Each step is annotated with its count of wide-area message rounds: Read (2), Send prepare (1), Receive prepare (1), Replicate (2), Commit (0), Replicate (2). Legend: read requests, 2PC messages, Paxos messages.]

Slide 13
Effective geo-replication
Spanner proved an effective geo-scale model
High throughput
Fault-tolerance
Serializable transactions

It also illuminated a challenge of geo-replication:
Wide-area latency leads to high transaction latency

Slide 14
[VLDB’13]
Wide-area latency awareness

Slide 15
Wide-area latency awareness
Inter-datacenter latency is much higher than intra-datacenter latency
Intra-datacenter latency: ~1-2 milliseconds
Inter-datacenter latency: 10s to 100s of milliseconds
Inter-datacenter rounds of communication are expensive: avoid them!

Replicated Commit [VLDB’13]:
A majority voting algorithm for the geo-replication framework
"The datacenter is the machine"

Slide 16
Replicated Commit [VLDB’13]
[Diagram: committing a transaction across Datacenter California and Datacenter Virginia. Each step is annotated with its count of wide-area message rounds: Read (2), Voting request (1), Locks (0), Voting (1), Commit (0). Legend: read requests, voting messages, locking messages.]

Slide 17
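The datacenter-level majority rule can be sketched as follows (an illustrative reduction, not the full protocol; lock acquisition and the read path are elided):

```python
def replicated_commit(votes):
    """votes: one accept/reject decision per datacenter, cast after
    each datacenter acquires its local locks for the transaction."""
    # Commit once a strict majority of datacenters accept: one
    # wide-area round from the coordinator, no per-step replication.
    return "commit" if sum(votes) > len(votes) // 2 else "abort"

# Five datacenters: three accepts form a majority.
assert replicated_commit([True, True, True, False, False]) == "commit"
assert replicated_commit([True, True, False, False, False]) == "abort"
```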
Latency performance
Latency depends on the network topology
Read operations:
  Replicated Commit: read from a majority
  2PC/Paxos: read from the leader
Commit phase:
  Replicated Commit: 1 round to a majority
  Spanner: 1 round to leaders + a majority round from leaders

Slide 18
Performance
Setup: five data centers, data split into 3 partitions, YCSB clients at each data center.

[Chart: average commit latency (ms) for Replicated Commit vs 2PC/Paxos; values shown include 92, 167, 337, and 403 ms]

Slide 19
[CIDR’2013]
"Can we break the RTT barrier?"

Slide 20
Decoupling consistency and fault-tolerance
Decouple consistency and fault-tolerance:
Protocols to ensure consistency only
Augment with fault-tolerance later

Message Futures [CIDR’13]:
A causally ordered log is leveraged

Slide 21
Message Futures
Simple case: ping-ponging log propagations between datacenters A and B.
Commit rule: (1) wait until the next log is received; (2) detect conflicts with the incoming log.
Latency is less than one RTT.

Slide 22
Message Futures
General case: continuous log propagations.
Commit rule: (1) wait until the previous log transmission is acknowledged; (2) detect conflicts with the incoming log.
Works for arbitrary log propagation; control relative performance by controlling propagation rates.

Slide 23
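The conflict-detection half of the commit rule can be sketched as a key-overlap check against the incoming log (an assumption-laden sketch: the real protocol orders transactions through the causally ordered log; `conflicts` below shows only the overlap test, with hypothetical names):

```python
def conflicts(txn_keys, incoming_log):
    """txn_keys: keys this transaction read or wrote locally.
    incoming_log: write-sets received from the remote datacenter's log."""
    # Abort if any remote transaction touched a key we depend on.
    return any(txn_keys & remote_writes for remote_writes in incoming_log)

local = {"x", "y"}
assert conflicts(local, [{"z"}, {"x"}])      # remote wrote x: abort
assert not conflicts(local, [{"z"}, {"w"}])  # disjoint: safe to commit
```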
[SIGMOD’15]
"Is there a lower bound on transaction latency?"

Slide 24
[Diagram: transaction T1 at datacenter A and transaction T2 at datacenter B; events that can affect T1's outcome vs events that can be affected by T1, between "T1 requests to commit" and "T1 commits", with T1 and T2 latencies marked]

The commit latency of A plus the commit latency of B must be greater than or equal to the round-trip time between them.

Slide 25
Optimal latency
Lower bound: Latency(A) + Latency(B) >= RTT(A,B)

Minimize the sum of latencies
subject to:
(1) Latency(A) + Latency(B) >= RTT(A,B), for all A, B
(2) Latency(A) >= 0, for all A

Example topology: RTT(A,B) = 30, RTT(A,C) = 20, RTT(B,C) = 40

Slide 26
Optimal latency
[Diagram: datacenters A, B, C with RTTs A-B = 30, A-C = 20, B-C = 40]

Protocol                  Latency(A)  Latency(B)  Latency(C)  Average
Leader-based (Leader A)   0           30          20          16.67
Leader-based (Leader C)   20          40          0           20
Majority                  20          30          20          23.33
Optimal                   5           25          15          15

Slide 27
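The "Optimal" row can be reproduced by brute-forcing the minimization over integer latencies (a verification sketch; the assignment of the RTTs 30/20/40 to datacenter pairs is inferred from the leader-based rows):

```python
from itertools import product

# Pairwise RTTs from the slide's topology.
rtt = {("A", "B"): 30, ("A", "C"): 20, ("B", "C"): 40}

# Minimize Latency(A) + Latency(B) + Latency(C) subject to
# Latency(X) + Latency(Y) >= RTT(X, Y) for every pair.
best = None
for a, b, c in product(range(41), repeat=3):
    lat = {"A": a, "B": b, "C": c}
    if all(lat[x] + lat[y] >= d for (x, y), d in rtt.items()):
        if best is None or sum(lat.values()) < sum(best.values()):
            best = lat

# Every constraint is tight at the optimum: 5+25=30, 5+15=20, 25+15=40.
assert best == {"A": 5, "B": 25, "C": 15}  # matches the Optimal row
assert sum(best.values()) / 3 == 15        # average latency 15
```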
[SIGMOD’15]
Achieving the lower bound

Slide 28
Insight from the lower bound

[Diagram: as in the lower-bound argument, transactions T1 at A and T2 at B, with the window between "T1 requests to commit" and "T1 commits" split into events that can affect T1's outcome and events that can be affected by T1. Helios exploits this window.]

Slide 29
Helios commit protocol

[Diagram: datacenters A and B with RTT = 16; Helios assigns Latency(A) = 10 and Latency(B) = 6; timeline events at Time = 5, 7, 13, 15]

Slide 30
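The assignment on this slide meets the lower bound with equality (a quick check using only the numbers shown):

```python
rtt_ab = 16
latency = {"A": 10, "B": 6}

# Lower bound: Latency(A) + Latency(B) >= RTT(A, B).
# Helios picks latencies that make the bound tight, so the combined
# commit latency cannot be improved for this pair.
assert latency["A"] + latency["B"] >= rtt_ab
assert latency["A"] + latency["B"] == rtt_ab
```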
Systems are more than transaction latency numbers
Thinking outside the box: cross-layer solutions
Wide-area latency awareness
Lower bound on transaction latency