Measuring and Understanding Consistency at Facebook: Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus
1. Existential Consistency: Measuring and Understanding Consistency at Facebook. Haonan Lu*†, Kaushik Veeraraghavan†, Philippe Ajoux†, Jim Hunt†, Yee Jiun Song†, Wendy Tobagus†, Sanjeev Kumar†, Wyatt Lloyd*†. *University of Southern California, †Facebook
4. Consistency and Performance
5. Fundamental Tension
Consistency: eliminates anomalies (Oculus example); makes systems easier to program; difficult to quantify.
Performance: lower latency; higher throughput; simple to quantify.
This talk: the first study of consistency in a large-scale production system, Facebook TAO.
6. Anomaly: Unexpected Behavior. Post example: a friend says, “Hey, I mentioned you in a post.” The new post reads “@Wyatt, you should check out this game!”, but reading the friend’s timeline shows only old posts.
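7. Anomaly: Unexpected Behavior. Oculus example: the same two comments appear in conflicting orders. One view lists “Mine! yeah~ lucky!” as comment 1, while other views list “I wouldn’t mind…” as comment 1 and “Mine! yeah~ lucky!” as comment 2.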
8. Does Facebook have consistency anomalies? How many? What type?
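9. TAO: Eventually Consistent Cache. Diagram: A, B, C, and master M. A new post is written and acknowledged (done), but a later read still returns the old post value. Vulnerability window: the time during asynchronous replication when anomalies can happen.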
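To make the vulnerability window concrete, here is a toy Python model of asynchronous replication. It is a minimal sketch, not TAO's implementation: the class EventuallyConsistentStore, its methods, and the single explicit propagate() step are hypothetical simplifications. A write is acknowledged as soon as the master applies it, so a read at a replica keeps returning the old post until replication catches up.

```python
class EventuallyConsistentStore:
    """Hypothetical toy model: one master, several replicas, async replication.

    Writes are applied at the master and acknowledged immediately; replicas
    only see them when propagate() runs. The gap in between is the
    vulnerability window.
    """

    def __init__(self, replicas):
        self.master = {}
        self.replicas = {name: {} for name in replicas}
        self.pending = []  # (key, value) writes not yet replicated

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))
        return "done"  # acknowledged before replication finishes

    def read(self, replica, key):
        return self.replicas[replica].get(key)

    def propagate(self):
        # Asynchronous replication, modeled here as one explicit step.
        for key, value in self.pending:
            for store in self.replicas.values():
                store[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("timeline", "old post")
store.propagate()                            # old post reaches every replica
ack = store.write("timeline", "new post")    # acknowledged as "done"
stale = store.read("B", "timeline")          # inside the window: "old post"
store.propagate()                            # replication catches up
fresh = store.read("B", "timeline")          # now "new post"
```

Modeling replication as an explicit step keeps the window visible and deterministic; a real system replicates continuously, so the window's length varies per write.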
10. Quantifying Anomalies. How often do anomalies occur? Collect a trace of requests to TAO. What consistency would prevent them? Run anomaly checkers on the trace.
11. Trace Collection. Collect the trace on web servers. Challenges in tracing a production system: the volume of requests, time skew between web servers, and missing requests.
12. Challenge: Volume of Requests. Billions of requests per second [ATC ’13] are too many to log, so sample on objects (an object is a vertex in the social graph) and log all requests to the objects in the sample. This is sufficient for local consistency models.
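One way to sample on objects while still logging every request to a sampled object is to hash the object ID, so each web server reaches the same decision independently. This is a sketch under that assumption: the talk does not say how sampling is implemented, and is_sampled, maybe_log, and the SHA-256 choice are hypothetical.

```python
import hashlib

SAMPLE_ONE_IN = 1_000_000  # sample 1 out of 1 million objects

def is_sampled(object_id: int, one_in: int = SAMPLE_ONE_IN) -> bool:
    """Deterministically decide whether an object belongs to the sample.

    Every web server hashes the same object ID to the same digest, so all
    requests to a sampled object are logged regardless of which server
    handles them.
    """
    digest = hashlib.sha256(str(object_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0

def maybe_log(request: dict, log: list, one_in: int = SAMPLE_ONE_IN) -> None:
    """Append the request to the log only if its object is sampled."""
    if is_sampled(request["object_id"], one_in):
        log.append(request)
```

Sampling by object rather than by request is what keeps each sampled object's history complete, which the per-object checkers rely on.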
13. Local Property Enables Sampling. “… the system as a whole satisfies P whenever each individual object satisfies P.” [1] Local models: linearizability, per-object sequential, and read-after-write. Local consistency models can be checked on a per-object basis.
[1] M. P. Herlihy and J. M. Wing, “Linearizability: A Correctness Condition for Concurrent Objects,” ACM TOPLAS, 1990.
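Locality is what lets each object's history be checked on its own. The sketch below (function and field names are hypothetical) groups a trace by object ID and applies a per-object predicate to each history independently; for a local property P, the system satisfies P whenever every object's history does.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

def check_local_property(
    trace: Iterable[dict],
    object_satisfies: Callable[[List[dict]], bool],
) -> bool:
    """Check a local consistency property object by object.

    Requests are grouped by object ID, and each object's history is checked
    independently of every other object's.
    """
    per_object: Dict[int, List[dict]] = defaultdict(list)
    for request in trace:
        per_object[request["object_id"]].append(request)
    return all(object_satisfies(history) for history in per_object.values())
```

Because no check ever looks across objects, the same function works on a trace that contains only the sampled objects.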
14. Challenge: Time Skew. Time skew across web servers is 35 ms at the 99.9th percentile over one week. Adding the time skew to each request’s duration makes more requests overlap, which eliminates false positives.
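A minimal sketch of the skew adjustment, assuming the 35 ms figure bounds the clock difference between any two web servers: extending each request's finish time by the skew means one request is ordered before another only if it finishes more than 35 ms (by the logging clocks) before the other starts. Otherwise the two are treated as overlapping, so clock skew alone cannot produce an ordering the checker would misread as an anomaly. Interval, widen, definitely_precedes, and overlaps are hypothetical names.

```python
from dataclasses import dataclass

MAX_SKEW_MS = 35.0  # assumed bound on pairwise clock difference (99.9th pct.)

@dataclass(frozen=True)
class Interval:
    start_ms: float
    finish_ms: float

def widen(req: Interval, skew_ms: float = MAX_SKEW_MS) -> Interval:
    """Add the clock skew to the request's duration."""
    return Interval(req.start_ms, req.finish_ms + skew_ms)

def definitely_precedes(a: Interval, b: Interval, skew_ms: float = MAX_SKEW_MS) -> bool:
    """True only if a finishes before b starts even after widening a."""
    return widen(a, skew_ms).finish_ms < b.start_ms

def overlaps(a: Interval, b: Interval, skew_ms: float = MAX_SKEW_MS) -> bool:
    """Requests with no certain real-time order are treated as concurrent."""
    return not definitely_precedes(a, b, skew_ms) and not definitely_precedes(b, a, skew_ms)
```

For example, requests logged at 0 to 10 ms and 30 to 40 ms are ordered with no skew adjustment but overlap once 35 ms is added.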
15. Logging Details. Logged information: start time and finish time (to determine the real-time ordering of requests); read or write; and value (to match each read with its write). Sampling rate: 1 out of 1 million objects, logging ~100% of requests to sampled objects.
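The logged fields map naturally onto a record type. The sketch below is hypothetical (LoggedRequest and match_reads_to_writes are illustrative names, not Facebook's schema) and assumes every write to an object stores a distinct value, so a read can be matched to its write by value alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class LoggedRequest:
    object_id: int
    start_ms: float   # start time: determines real-time ordering
    finish_ms: float  # finish time: determines real-time ordering
    is_write: bool    # read or write
    value: str        # matches a read with the write it returned

def match_reads_to_writes(
    requests: List[LoggedRequest],
) -> Dict[LoggedRequest, Optional[LoggedRequest]]:
    """Map each read to the logged write of the same object with the same value.

    Assumes writes to an object store distinct values; a read whose value has
    no logged write (e.g. written before tracing began) maps to None.
    """
    writes = {(r.object_id, r.value): r for r in requests if r.is_write}
    return {r: writes.get((r.object_id, r.value)) for r in requests if not r.is_write}
```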
16. Trace Statistics: 12 days (8/20 – 8/31), 17 million objects, and 3 billion requests.
17. Check Trace for Anomalies. Linearizability checker: Paxos provides linearizability. Per-object sequential checker: PNUTS provides per-object sequential consistency. Read-after-write checker: TAO provides read-after-write within a cluster.
18. Linearizability. The strongest non-transactional consistency model, with two constraints. Real-time constraint (post example): Haonan writes Post (old) and then Post (new); Wyatt’s later Read returns old, but it should return “new”. Total order constraint (Oculus example).
19. Linearizability Checker. A graph captures state transitions: vertices are write operations, and edges encode real-time order. Merge each read with its write, which captures the state transitions seen by users. An anomaly is reported if a merge causes a cycle; a cycle indicates that the user’s view ≠ the system’s view.
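The graph construction can be sketched in Python for a single object's history. This is a simplified, hypothetical version that checks only the real-time constraint: writes become vertices keyed by their (assumed unique) values, a read is merged into the vertex of the write it returned, and an edge a -> b is added when a finishes before b starts. It omits the extra reasoning a full checker needs, for example for total-order violations or reads of values written outside the trace.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    kind: str      # "w" (write) or "r" (read)
    value: str     # assumes every write stores a unique value
    start: float   # start time in ms (already widened for clock skew)
    finish: float  # finish time in ms

def has_cycle(graph):
    """Depth-first search with three colors; True if the graph has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every vertex starts WHITE (0)

    def visit(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in list(graph))

def real_time_anomaly(ops):
    """Simplified real-time check of one object's history.

    Keying vertices by value merges each read with the write it returned.
    A cycle means the states users observed contradict real-time order.
    """
    graph = defaultdict(set)
    for op in ops:
        graph.setdefault(op.value, set())  # every write is a vertex
    for a in ops:
        for b in ops:
            if a.finish < b.start and a.value != b.value:
                graph[a.value].add(b.value)  # a finished before b started
    return has_cycle(graph)

post_example = [
    Op("w", "old", 0, 1),  # Post (old)
    Op("w", "new", 2, 3),  # Post (new) finishes...
    Op("r", "old", 4, 5),  # ...before this Read starts, yet it returns old
]
print(real_time_anomaly(post_example))  # True
```

In the post example the edges old -> new (write order) and new -> old (the stale read merged into Post (old)) form the cycle the slide describes.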
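20. Linearizability Checker. The checker captures the real-time constraint. In the post example, Haonan’s Post (new) follows Post (old), and Wyatt’s Read starts after Post (new) finishes, so the read should return the new post; instead it returns old. Merging Read (old) with Post (old) closes a cycle, and the checker reports an anomaly.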
21. More Complex Cases. A history with operations w(0), r(1), w(1), w(2), w(3), r(2), r(3), r(3), r(2), and r(1) contains 2 anomalies. Demo: http://tinyurl.com/sosp15-demo
22. Result Overview. Models checked: linearizability, per-object sequential, and read-after-write, plus bounds on non-local consistency models. Anomalies were found for all consistency models, so adopting them would have benefits.
23. Linearizability Results: 5 anomalies per million reads. These would be prevented by a Paxos-based implementation. Because linearizability is the strongest consistency model we checked, this is an upper bound on TAO anomalies: TAO is highly consistent.
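For scale, the reported linearizability rate converts as follows (arithmetic on the reported rate only, assuming each anomaly is counted against one read):

```latex
\frac{5}{10^{6}} = 5 \times 10^{-6} = 0.0005\%
\qquad\Longrightarrow\qquad
1 - 5 \times 10^{-6} = 99.9995\% \text{ of reads show no linearizability anomaly}
```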
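24. Linearizability Results: Real-Time Constraint Violations. 4 per million reads. Diagram: replica A, master M, and replica B. Post (new) starts and finishes, but while replication is still in progress (vulnerable!), a read served by a replica that has not yet received the update returns old.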
25. Linearizability Results: Total Order Constraint Violations. 1 per million reads. Diagram: replica A, master M, and replica B. While the writes of Comment(H) and Comment(W) are being replicated (vulnerable!), reads return W and H in conflicting orders, violating the total order constraint.
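The total order constraint requires every reader to agree on the relative order of any two writes. As a hypothetical, simplified sketch (it compares the sequences readers observed rather than running the graph-based checker), total_order_violations flags item pairs that two readers saw in opposite orders.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def total_order_violations(views: Dict[str, List[str]]) -> List[Tuple[str, str, str, str]]:
    """Find pairs of items that two readers observed in opposite orders.

    `views` maps a reader to the sequence of items (e.g. comments) that the
    reader saw. Under a total order, any two readers must agree on the
    relative order of every pair of items they both saw.
    """
    violations = []
    for (reader1, seq1), (reader2, seq2) in combinations(views.items(), 2):
        pos1 = {item: i for i, item in enumerate(seq1)}
        pos2 = {item: i for i, item in enumerate(seq2)}
        shared = [item for item in seq1 if item in pos2]
        for x, y in combinations(shared, 2):
            if (pos1[x] < pos1[y]) != (pos2[x] < pos2[y]):
                violations.append((reader1, reader2, x, y))
    return violations
```

With one reader seeing Comment(H) before Comment(W) and another seeing the reverse, the function reports that single conflicting pair.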
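26. Per-Object Sequential Results: 1 anomaly per million reads. Constraints: total order, and user session (1 per 10 million): users should see their own writes. Diagram: replica A, master M, and replica B; a user writes Post (new), but the user’s subsequent read returns old.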
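The user session constraint can be sketched as a read-your-writes check. This hypothetical version assumes each write carries a monotonically increasing per-object version number (something the logged trace does not necessarily include) and flags any read that returns an older version than one the same user had already finished writing.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class SessionOp:
    user: str
    is_write: bool
    version: int      # hypothetical per-object version, grows with each write
    start_ms: float
    finish_ms: float

def session_violations(ops: List[SessionOp]) -> List[SessionOp]:
    """Reads that return a version older than the same user's completed write."""
    bad = []
    for read in (op for op in ops if not op.is_write):
        own_writes = [
            w for w in ops
            if w.is_write and w.user == read.user and w.finish_ms < read.start_ms
        ]
        if own_writes and read.version < max(w.version for w in own_writes):
            bad.append(read)
    return bad
```

Only the writer's own later reads are constrained; another user reading the old version is not a session violation under this check.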
27. Infer Bounds on Causal. Linearizability anomalies (5 per million reads) are a superset of causal anomalies, so causal anomalies are ≤ 5 per million reads. Per-object sequential anomalies (1 per million reads) are a subset of causal anomalies, so causal anomalies are ≥ 1 per million reads.
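The two bounds follow from the subset relations on the slide, writing \mathcal{A}_M for the set of anomalies under model M and r_M for its rate per million reads:

```latex
\mathcal{A}_{\text{per-object seq.}} \subseteq \mathcal{A}_{\text{causal}} \subseteq \mathcal{A}_{\text{linearizability}}
\quad\Longrightarrow\quad
1 = r_{\text{per-object seq.}} \le r_{\text{causal}} \le r_{\text{linearizability}} = 5
```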
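28. Lower Bounds on Transactions. Strict serializability strengthens linearizability (5 per million reads), so it would prevent more than 5 anomalies per million reads. Causal with transactions strengthens causal (at least 1 per million reads, via per-object sequential), so it would prevent more than 1 anomaly per million reads. Future research should provide transactions.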
29. Real-Time Consistency Monitor. The checkers cannot run in real time. Φ-consistency measures the convergence of replicas; it serves as a real-time health monitor that raises alarms when a replica falls behind.
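A φ-consistency check can be sketched as asking every replica for the same sampled objects and measuring how often they agree. In this hypothetical sketch, φ is the fraction of sampled objects on which all replicas return the same value, each replica is modeled as a read function returning hashable values, and the 0.999 alarm threshold is an arbitrary placeholder rather than a production setting.

```python
from typing import Callable, Hashable, Iterable, List

def phi_consistency(
    objects: Iterable[int],
    replicas: List[Callable[[int], Hashable]],
) -> float:
    """Fraction of sampled objects on which every replica returns the same value."""
    objects = list(objects)
    if not objects:
        return 1.0  # nothing sampled: treat as fully converged
    agree = sum(1 for obj in objects if len({read(obj) for read in replicas}) == 1)
    return agree / len(objects)

def should_alarm(phi: float, threshold: float = 0.999) -> bool:
    """Raise an alarm when convergence drops below a (hypothetical) threshold."""
    return phi < threshold
```

Unlike the trace checkers, this needs only point-in-time reads, which is what makes it cheap enough to run continuously.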
30. Conclusion. The benefits of consistency are hard to quantify. This is the first study of a large-scale production system: we measure Facebook’s TAO system by collecting a trace and running anomaly checkers, addressing real-world challenges along the way. Results: TAO is highly consistent; benefits of adopting stronger consistency exist; research should provide transactions.