Consensus I: FLP Impossibility, Paxos

Uploaded by cheryl-pisano (@cheryl-pisano) on 2020-01-19

Presentation Transcript

Consensus I: FLP Impossibility, Paxos
COS 418: Distributed Systems, Lecture 7
Michael Freedman

Slide 2: Recall our 2PC commit problem
- C → TC: “go!”
- TC → A, B: “prepare!”
- A, B → TC: “yes” or “no”
- TC → A, B: “commit!” or “abort!”
(Figure: Client C, Transaction Coordinator TC, Bank A and B)

Slide 3: Recall our 2PC commit problem
- Who acts as TC?
- Which server(s) own the account of A? Of B?
- Who takes over if TC fails?
- What about if A or B fail?
(Figure: Client C, Transaction Coordinator TC, Bank A and B)

Slide 4: Doing failover “correctly” isn’t easy
Transaction Coordinator TC: which node takes over as backup?

Slide 5: Doing failover “correctly” isn’t easy
Okay, so specify some ordering of the candidate TCs (manually, using some identifier): 1, 2, 3.

Slide 6: Doing failover “correctly” isn’t easy
But who determines if node 1 failed?

Slide 7: Doing failover “correctly” isn’t easy
Easy, right? Just ping and timeout!

Slide 8: Doing failover “correctly” isn’t easy
Is the server or the network actually dead/slow? (Figure: the ping from node 2 to node 1 fails ✘)

Slide 9: What can go wrong?
Two nodes think they are TC: “split brain” scenario.

Slide 11: What can go wrong?
Safety invariant: only 1 node is TC at any single time.
Another problem: A and B need to know (and agree upon) who the TC is…

Slide 12: Consensus
Definition: A general agreement about something; an idea or opinion that is shared by all the people in a group.
Origin: Latin, from consentire.

Slide 13: Consensus
Given a set of processes, each with an initial value:
- Termination: All non-faulty processes eventually decide on a value.
- Agreement: All processes that decide do so on the same value.
- Validity: The value that has been decided must have been proposed by some process.
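The three properties above can be checked mechanically against the outcome of a single finished run. This is a minimal sketch, assuming a run is summarized by each process's proposed value, the set of processes that crashed, and what (if anything) each process decided; the `check_consensus` helper and its input shape are hypothetical, not part of the lecture.

```python
from typing import Dict, Optional, Set


def check_consensus(
    proposed: Dict[str, int],
    decided: Dict[str, Optional[int]],
    crashed: Set[str],
) -> Dict[str, bool]:
    """Check Termination, Agreement, and Validity for one finished run.

    proposed: initial value of every process
    decided:  value each process decided, or None if it never decided
    crashed:  processes that failed during the run
    """
    decisions = {p: v for p, v in decided.items() if v is not None}
    return {
        # Every non-faulty process decided on some value.
        "termination": all(
            decided.get(p) is not None for p in proposed if p not in crashed
        ),
        # All processes that decided chose the same value.
        "agreement": len(set(decisions.values())) <= 1,
        # Every decided value was proposed by some process.
        "validity": all(v in proposed.values() for v in decisions.values()),
    }


# Example: p3 crashed before deciding; p1 and p2 both decided 1.
result = check_consensus(
    proposed={"p1": 1, "p2": 0, "p3": 1},
    decided={"p1": 1, "p2": 1, "p3": None},
    crashed={"p3"},
)
print(result)  # {'termination': True, 'agreement': True, 'validity': True}
```

Note that a checker like this can only judge a finished run; "eventually decide" is a property of infinite executions, which is exactly where the FLP result bites.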

Slide 14: Consensus used in systems
Group of servers attempting to:
- Make sure all servers in the group receive the same updates in the same order as each other.
- Maintain their own lists (views) of who is a current member of the group, and update the lists when somebody leaves/fails.
- Elect a leader in the group, and inform everybody.
- Ensure mutually exclusive (one process at a time only) access to a critical resource like a file.

Slide 15: Step one: Define your system model
Network model:
- Synchronous (time-bounded delay) or asynchronous (arbitrary delay)
- Reliable or unreliable communication
- Unicast or multicast communication
Node failures:
- Fail-stop (correct/dead) or Byzantine (arbitrary)

Slide 17: Consensus is impossible
… abandon hope, all ye who enter here …

Slide 18: “FLP” result (Fischer, Lynch, and Paterson)
No deterministic 1-crash-robust consensus algorithm exists for the asynchronous model.
- Holds even for “weak” consensus (i.e., only some process needs to decide, not all).
- Holds even for only two states: 0 and 1.

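Slide 19: Main technical approach
Initial state of system can end in decision “0” or “1”. Consider 5 processes, each in some initial state:
[ 1,1,0,1,1 ] → 1
[ 1,1,0,1,0 ] → ?
[ 1,1,0,0,0 ] → ?
[ 1,1,1,0,0 ] → ?
[ 1,0,1,0,0 ] → 0
Each configuration differs from its neighbor in the initial value of exactly one process, and the two ends decide differently, so there must exist two adjacent configurations here that differ in decision.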

Slide 20: Main technical approach
Initial state of system can end in decision “0” or “1”. Consider 5 processes, each in some initial state:
[ 1,1,0,1,1 ] → 1
[ 1,1,0,1,0 ] → 1
[ 1,1,0,0,0 ] → 1
[ 1,1,1,0,0 ] → 0
[ 1,0,1,0,0 ] → 0
Assume the decision differs between these two configurations: [ 1,1,0,0,0 ] and [ 1,1,1,0,0 ].
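The chain argument above can be checked mechanically: walk the chain of initial configurations and report the first adjacent pair whose decisions differ. This is a minimal sketch; the decisions are the ones assumed on the slide, not computed by any real algorithm, and `find_split` and `differs_in_one` are hypothetical helper names.

```python
from typing import Dict, List, Optional, Tuple

Config = Tuple[int, ...]


def differs_in_one(a: Config, b: Config) -> bool:
    """True if two initial configurations differ in exactly one process's value."""
    return sum(x != y for x, y in zip(a, b)) == 1


def find_split(
    chain: List[Config], decision: Dict[Config, int]
) -> Optional[Tuple[Config, Config]]:
    """Return the first adjacent pair in the chain whose decisions differ."""
    for a, b in zip(chain, chain[1:]):
        if not differs_in_one(a, b):
            raise ValueError("neighbors must differ in exactly one process")
        if decision[a] != decision[b]:
            return a, b
    return None  # only possible if both ends decide the same value


# The chain and the decisions assumed on the slide.
chain = [(1, 1, 0, 1, 1), (1, 1, 0, 1, 0), (1, 1, 0, 0, 0),
         (1, 1, 1, 0, 0), (1, 0, 1, 0, 0)]
decision = {chain[0]: 1, chain[1]: 1, chain[2]: 1, chain[3]: 0, chain[4]: 0}
print(find_split(chain, decision))  # ((1, 1, 0, 0, 0), (1, 1, 1, 0, 0))
```

Whatever decisions are filled in for the middle configurations, as long as the ends are 1 and 0 the loop must return some pair, which is the pigeonhole step the proof relies on.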

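Slide 21: Main technical approach
Goal: consensus holds in the face of 1 failure.
[ 1,1,0,0,0 ] → 1 | 0
[ 1,1,1,0,0 ] → 0
One of these configs must be “bi-valent”: both futures possible.
Why: the two configurations differ only in the third process’s initial value. If that process crashes before taking any step, the remaining four cannot tell the two configurations apart and must still decide, so they reach the same value from both. That value differs from the decision of one of the two configurations, which can therefore end in either 0 or 1.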

Slide 22: Main technical approach
Goal: consensus holds in the face of 1 failure.
[ 1,1,0,0,0 ] → 1
[ 1,1,1,0,0 ] → 0 | 1
One of these configs must be “bi-valent”: both futures possible.
Key result: from any bi-valent state, the system can perform some work and still remain in a bi-valent state.

Slide 23: You won’t believe this one trick!
- System thinks process p crashed, adapts to it…
- But then p recovers and q crashes…
- Needs to wait for p to rejoin, because it can only handle 1 failure, which takes time for the system to adapt…
- … repeat ad infinitum …

Slide 24: All is not lost…
But remember: “impossible” in the formal sense, i.e., “there does not exist” a deterministic algorithm guaranteed to decide in every execution, even though the executions that defeat it are extremely unlikely…
Circumventing FLP impossibility:
- Probabilistically
- Randomization
- Partial synchrony (e.g., “failure detectors”)

Slide 25: Why should you care?
Werner Vogels, Amazon CTO, “Job openings in my group”:
What kind of things am I looking for in you?
“You know your distributed systems theory: You know about logical time, snapshots, stability, message ordering, but also acid and multi-level transactions. You have heard about the FLP impossibility argument. You know why failure detectors can solve it (but you do not have to remember which one diamond-w was). You have at least once tried to understand Paxos by reading the original paper.”

Slide 26: Paxos
Safety:
- Only a single value is chosen
- Only a proposed value can be chosen
- Only chosen values are learned by processes
Liveness***:
- Some proposed value is eventually chosen if fewer than half of processes fail
- If a value is chosen, a process eventually learns it

Slide 27: Roles of a process
Three conceptual roles:
- Proposers propose values
- Acceptors accept values; a value is chosen if a majority accept it
- Learners learn the outcome (chosen value)
In reality, a process can play any/all roles.

Slide 28: Strawman
- 3 proposers, 1 acceptor
    Acceptor accepts first value received
    No liveness on failure
- 3 proposals, 3 acceptors
    Accept first value received; acceptors choose common value known by majority
    But no such majority is guaranteed
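The second strawman's failure mode is easy to reproduce. A minimal sketch, assuming three acceptors that each keep only the first proposal to arrive and a network that happens to deliver the three proposals in a different order to each acceptor; the helper names and arrival orders are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def first_value_wins(arrival_orders: Dict[str, List[str]]) -> Dict[str, str]:
    """Each acceptor accepts only the first proposal that reaches it."""
    return {acceptor: order[0] for acceptor, order in arrival_orders.items()}


def majority_value(accepted: Dict[str, str]) -> Optional[str]:
    """Return the value accepted by a strict majority of acceptors, if any."""
    if not accepted:
        return None
    value, count = Counter(accepted.values()).most_common(1)[0]
    return value if count > len(accepted) // 2 else None


# Three proposals race; each acceptor sees them in a different order.
orders = {
    "A1": ["x", "y", "z"],
    "A2": ["y", "z", "x"],
    "A3": ["z", "x", "y"],
}
accepted = first_value_wins(orders)
print(accepted)                  # {'A1': 'x', 'A2': 'y', 'A3': 'z'}
print(majority_value(accepted))  # None: no value reached a majority
```

Because each acceptor never changes its mind, this split vote can never be repaired, which motivates letting acceptors accept multiple (ordered) proposals.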

Slide 29: Paxos
- Each acceptor accepts multiple proposals
    Hopefully one of multiple accepted proposals will have a majority vote (and we determine that)
    If not, rinse and repeat (more on this)
- How do we select among multiple proposals?
    Ordering: proposal is tuple (proposal #, value) = (n, v)
    Proposal # strictly increasing, globally unique
    Globally unique? Trick: set low-order bits to proposer’s ID
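The low-order-bits trick can be sketched in a few lines. `ID_BITS = 8` (room for 256 proposers) is an assumption, as are the helper names; the slide only says to put the proposer's ID in the low-order bits.

```python
ID_BITS = 8  # assumption: at most 256 proposers, so every ID fits in 8 bits


def make_proposal_number(round_no: int, proposer_id: int) -> int:
    """Round number in the high bits, proposer ID in the low bits."""
    if not 0 <= proposer_id < (1 << ID_BITS):
        raise ValueError("proposer_id does not fit in ID_BITS")
    return (round_no << ID_BITS) | proposer_id


def next_proposal_number(highest_seen: int, proposer_id: int) -> int:
    """A number owned by this proposer that is strictly greater than highest_seen."""
    round_no = (highest_seen >> ID_BITS) + 1
    return make_proposal_number(round_no, proposer_id)


n1 = make_proposal_number(1, 2)   # proposer 2, round 1
n2 = make_proposal_number(1, 3)   # proposer 3, same round: still distinct
n3 = next_proposal_number(n2, 2)  # proposer 2 retries after seeing n2
print(n1, n2, n3)  # 258 259 514
```

Two proposers can never pick the same number because their low bits differ, and comparing the integers orders first by round, then by proposer ID.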

Slide 30: Paxos protocol overview
Proposers:
- Choose a proposal number n
- Ask acceptors if any accepted proposals with n_a < n
- If existing proposal v_a returned, propose same value (n, v_a)
- Otherwise, propose own value (n, v)
Note altruism: goal is to reach consensus, not “win”.
Acceptors try to accept value with highest proposal n.
Learners are passive and wait for the outcome.
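The proposer's altruistic choice of value can be sketched as a small function. This assumes each promise is summarized as either None (that acceptor has accepted nothing) or an (n_a, v_a) pair, and that the caller has already collected promises from a majority; `choose_value` is a hypothetical name.

```python
from typing import List, Optional, Tuple

# A promise carries the acceptor's previously accepted (n_a, v_a), or None.
Promise = Optional[Tuple[int, str]]


def choose_value(promises: List[Promise], own_value: str) -> str:
    """Pick the value to propose after collecting promises.

    If any acceptor reported an accepted proposal, adopt the value with the
    highest n_a (help finish an earlier proposal); otherwise use our own.
    """
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return own_value
    _, v_a = max(accepted, key=lambda p: p[0])
    return v_a


print(choose_value([None, None, None], "mine"))               # mine
print(choose_value([None, (3, "blue"), (5, "red")], "mine"))  # red
```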

Slide 31: Paxos Phase 1
Proposer:
- Choose proposal number n, send <prepare, n> to acceptors
Acceptors:
- If n > n_h:
    n_h = n  ← promise not to accept any new proposals n’ < n
    If no prior proposal accepted: reply <promise, n, Ø>
    Else: reply <promise, n, (n_a, v_a)>
- Else: reply <prepare-failed>
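A minimal sketch of the acceptor side of Phase 1, assuming integer proposal numbers that are never negative (so n_h can start at -1) and using None in place of Ø. The `Acceptor` class and `on_prepare` method are hypothetical names; a real acceptor must also record n_h and its accepted proposal on stable storage before replying, so that a restart cannot break a promise, and that step is omitted here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    n_h: int = -1                               # highest proposal number promised
    accepted: Optional[Tuple[int, str]] = None  # (n_a, v_a), or None if nothing accepted

    def on_prepare(self, n: int) -> tuple:
        """Handle <prepare, n> following the Phase 1 rules."""
        if n > self.n_h:
            self.n_h = n  # promise not to accept any new proposal numbered below n
            return ("promise", n, self.accepted)  # None plays the role of Ø
        return ("prepare-failed",)


a = Acceptor()
print(a.on_prepare(5))  # ('promise', 5, None)
print(a.on_prepare(3))  # ('prepare-failed',)
a.accepted = (5, "v")   # pretend Phase 2 for proposal 5 succeeded here
print(a.on_prepare(7))  # ('promise', 7, (5, 'v'))
```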

Slide 32: Paxos Phase 2
Proposer:
- If it receives promises from a majority of acceptors:
    Determine v_a returned with highest n_a, if it exists
    Send <accept, (n, v_a || v)> to acceptors (v_a if one was returned, otherwise its own value v)
Acceptors:
- Upon receiving (n, v), if n ≥ n_h:
    Accept proposal and notify learner(s)
    n_a = n_h = n
    v_a = v
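Phase 2 on the acceptor side, plus the proposer's majority test, might look like the following. The class and function names are hypothetical, n_h is assumed to have been set by an earlier Phase 1, and notifying learners is left as a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    n_h: int = -1             # highest proposal number promised in Phase 1
    n_a: Optional[int] = None  # number of the accepted proposal, if any
    v_a: Optional[str] = None  # value of the accepted proposal, if any

    def on_accept(self, n: int, v: str) -> bool:
        """Handle <accept, (n, v)>: accept unless a higher prepare was promised."""
        if n >= self.n_h:
            self.n_a = self.n_h = n
            self.v_a = v
            return True  # here the acceptor would notify the learner(s)
        return False


def has_majority(replies: int, num_acceptors: int) -> bool:
    """True once a strict majority of acceptors has replied."""
    return replies > num_acceptors // 2


acceptors = [Acceptor(n_h=4), Acceptor(n_h=4), Acceptor(n_h=9)]
oks = [a.on_accept(4, "v") for a in acceptors]
print(oks)                                     # [True, True, False]
print(has_majority(sum(oks), len(acceptors)))  # True
```

The third acceptor rejects because it already promised proposal 9, but two of three acceptances are still a majority.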

Slide 33: Paxos Phase 3
Learners need to know which value was chosen.
- Approach #1: each acceptor notifies all learners
    More expensive
- Approach #2: elect a “distinguished learner”
    Acceptors notify the elected learner, which informs others
    Failure-prone
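Approach #1 can be sketched as a learner that tallies <accepted, (n, v)> notifications and declares a value chosen once a majority of acceptors report the same proposal. The `Learner` class is a hypothetical name; under Approach #2 the distinguished learner would run the same logic and then forward the result.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class Learner:
    """Learns the chosen value from <accepted, (n, v)> notifications."""

    def __init__(self, num_acceptors: int) -> None:
        self.num_acceptors = num_acceptors
        # (n, v) -> set of acceptors that reported accepting that proposal
        self.votes: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
        self.chosen: Optional[str] = None

    def on_accepted(self, acceptor: str, n: int, v: str) -> Optional[str]:
        self.votes[(n, v)].add(acceptor)  # a set ignores duplicate notifications
        if len(self.votes[(n, v)]) > self.num_acceptors // 2:
            self.chosen = v  # a majority accepted the same proposal
        return self.chosen


learner = Learner(num_acceptors=3)
print(learner.on_accepted("A1", 1, "v1"))  # None
print(learner.on_accepted("A1", 1, "v1"))  # None (duplicate ignored)
print(learner.on_accepted("A2", 1, "v1"))  # v1
```

Votes are counted per proposal number, not per value alone, because "chosen" means a majority accepted the same proposal (n, v).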

Slide 34: Paxos: well-behaved run
(Figure: the proposer sends <prepare, 1> to acceptors 1 … n; each replies <promise, 1>; the proposer sends <accept, (1, v1)>; each replies <accepted, (1, v1)>; decide v1.)

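Slide 35: Paxos is safe
Intuition: if a proposal with value v is decided, then every higher-numbered proposal issued by any proposer has value v.
(Figure: a majority of acceptors accept (n, v), so v is decided; then comes the next prepare request, with proposal n+1.)
The proposer of n+1 needs promises from a majority, and any two majorities share at least one acceptor, so some promise reports (n, v) and the proposer must propose v.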
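The safety argument rests on quorum intersection. This brute-force sketch checks, for small cluster sizes, that every two majorities of acceptors overlap; the helper names are hypothetical and the check is exhaustive rather than clever.

```python
from itertools import combinations
from typing import FrozenSet, List


def majorities(n: int) -> List[FrozenSet[int]]:
    """All subsets of acceptors {0..n-1} with strictly more than n/2 members."""
    size = n // 2 + 1
    return [frozenset(c) for k in range(size, n + 1)
            for c in combinations(range(n), k)]


def all_majorities_intersect(n: int) -> bool:
    """True if every two majorities of n acceptors share at least one acceptor."""
    quorums = majorities(n)
    return all(a & b for a in quorums for b in quorums)


print([all_majorities_intersect(n) for n in range(1, 8)])
# [True, True, True, True, True, True, True]
```

"Half" is not enough: with 4 acceptors, {0, 1} and {2, 3} are disjoint, which is why quorums must be strict majorities.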

Slide 36: Race condition leads to liveness problem
- Process 0: completes phase 1 with proposal n0
- Process 1: starts and completes phase 1 with proposal n1 > n0
- Process 0: performs phase 2; acceptors reject
- Process 0: restarts and completes phase 1 with proposal n2 > n1
- Process 1: performs phase 2; acceptors reject
… can go on indefinitely …
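The interleaving above can be replayed in a tiny simulation. It makes a simplifying assumption: every acceptor sees every message, so the whole acceptor group is summarized by the highest proposal number promised so far. The odd/even numbering is also an illustrative choice, and `duel` is a hypothetical name.

```python
from typing import Dict, List


def duel(rounds: int) -> List[str]:
    """Replay the two-proposer interleaving for a number of rounds."""
    promised = 0
    next_n: Dict[int, int] = {0: 1, 1: 2}  # P0 uses odd numbers, P1 even: unique
    prepared: Dict[int, int] = {}          # proposer -> number it completed phase 1 with
    log: List[str] = []

    def phase1(p: int) -> None:
        nonlocal promised
        n = next_n[p]
        next_n[p] += 2
        promised = max(promised, n)  # acceptors now refuse anything below n
        prepared[p] = n
        log.append(f"P{p} completes phase 1 with n={n}")

    def phase2(p: int) -> None:
        ok = prepared[p] >= promised  # acceptors accept only if n >= n_h
        log.append(f"P{p} phase 2 with n={prepared[p]}: {'accepted' if ok else 'rejected'}")

    phase1(0)
    for _ in range(rounds):
        phase1(1)  # P1 overtakes P0 with a higher number...
        phase2(0)  # ...so P0's accept request is rejected
        phase1(0)  # P0 retries with a higher number still...
        phase2(1)  # ...so P1's accept request is rejected
    return log


for line in duel(2):
    print(line)
```

Every phase 2 in the log is rejected no matter how many rounds run. Electing a single proposer, as in leader-based Paxos, removes this interleaving.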

Slide 37: Paxos with leader election
- Simplify model with each process playing all three roles
- If the elected proposer can communicate with a majority, the protocol guarantees liveness
- Paxos can tolerate f failures as long as f < N / 2
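The bound f < N / 2 is simple arithmetic: after f crashes, a majority must still be alive. A minimal sketch with hypothetical helper names:

```python
def max_failures(n: int) -> int:
    """Largest f with f < n / 2: crashes that still leave a majority alive."""
    return (n - 1) // 2


def acceptors_needed(f: int) -> int:
    """Smallest cluster that tolerates f crash failures: N = 2f + 1."""
    return 2 * f + 1


print([max_failures(n) for n in range(1, 8)])  # [0, 0, 1, 1, 2, 2, 3]
```

The output shows why clusters are usually sized with odd N: going from 3 to 4 acceptors enlarges the majority without tolerating an extra failure.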

Slide 38: Using Paxos in system
Leader election to decide transaction coordinator. (Figure: nodes 1, 2, 3 elect leader L.)

Slide 39: Using Paxos in system
New leader election protocol. (Figure: old leader L; nodes 2 and 3 elect L_new.)
Still have split-brain scenario!

Slide 40: Lamport’s Paxos paper tells the mythical story of the Greek island of Paxos, with “legislators” and a “current law” passed through a parliamentary voting protocol.
- Misunderstood paper: submitted 1990, published 1998
- Lamport won the Turing Award in 2013

Slide 41: The Paxos story…
“As Paxos prospered, legislators became very busy. Parliament could no longer handle all details of government, so a bureaucracy was established. Instead of passing a decree to declare whether each lot of cheese was fit for sale, Parliament passed a decree appointing a cheese inspector to make those decisions.”
Cheese inspector ≈ leader using quorum-based voting protocol

Slide 42: The Paxos story…
“Parliament passed a decree making Δ̆ικστρα the first cheese inspector. After some months, merchants complained that Δ̆ικστρα was too strict and was rejecting perfectly good cheese. Parliament then replaced him by passing the decree
1375: Γωυδα is the new cheese inspector
But Δ̆ικστρα did not pay close attention to what Parliament did, so he did not learn of this decree right away. There was a period of confusion in the cheese market when both Δ̆ικστρα and Γωυδα were inspecting cheese and making conflicting decisions.”
Split-brain!

Slide 43: The Paxos story…
“To prevent such confusion, the Paxons had to guarantee that a position could be held by at most one bureaucrat at any time. To do this, a president included as part of each decree the time and date when it was proposed. A decree making Δ̆ικστρα the cheese inspector might read
2716: 8:30 15 Jan 72 – Δ̆ικστρα is cheese inspector for 3 months.”
Leader gets a lease!

Slide 44: The Paxos story…
“A bureaucrat needed to tell time to determine if he currently held a post. Mechanical clocks were unknown on Paxos, but Paxons could tell time accurately to within 15 minutes by the position of the sun or the stars. If Δ̆ικστρα’s term began at 8:30, he would not start inspecting cheese until his celestial observations indicated that it was 8:45.”
Handle clock skew: lease doesn’t end until expiry + max skew
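The bureaucrat's rule can be written as a check against the lease holder's own clock. A minimal sketch, assuming `max_skew` bounds how far any two clocks can disagree; the function name, reading "72" as 1972, and treating "3 months" as 90 days are illustrative assumptions.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=15)  # Paxons could tell time to within 15 minutes


def may_act(lease_start: datetime, lease_length: timedelta, local_now: datetime,
            max_skew: timedelta = MAX_SKEW) -> bool:
    """Whether the lease holder may exercise its post, judged by its own clock.

    The holder waits max_skew past the nominal start, since its predecessor's
    lease is treated as lasting until that lease's expiry + max_skew, and it
    stops at the nominal expiry of its own lease.
    """
    return lease_start + max_skew <= local_now < lease_start + lease_length


start = datetime(1972, 1, 15, 8, 30)  # assumed reading of "8:30 15 Jan 72"
term = timedelta(days=90)             # "3 months", approximated as 90 days
print(may_act(start, term, datetime(1972, 1, 15, 8, 40)))  # False: inside the skew window
print(may_act(start, term, datetime(1972, 1, 15, 8, 45)))  # True
```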

Slide 45: Solving split brain
(Figure: old leader L; nodes 2 and 3 run the new leader election protocol and elect L_new.)
Solution: if L isn’t part of the majority electing L_new, L_new waits until L’s lease expires before accepting new ops.
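From the new leader's side, this rule combines with the earlier one that a lease does not end until expiry + max skew. A minimal sketch with hypothetical names; treating the case where L did take part in the election as needing no wait is an assumption (L then already knows it was replaced), not something the slide states.

```python
from datetime import datetime, timedelta


def new_leader_start(old_lease_expiry: datetime, max_skew: timedelta) -> datetime:
    """Earliest local time at which L_new may accept ops: L's expiry plus max skew."""
    return old_lease_expiry + max_skew


def can_accept_ops(local_now: datetime, old_lease_expiry: datetime,
                   max_skew: timedelta, old_leader_in_majority: bool) -> bool:
    if old_leader_in_majority:
        # Assumption: L voted in the election, so it already knows it was replaced.
        return True
    # Otherwise L may still believe it leads until its lease (plus skew) runs out.
    return local_now >= new_leader_start(old_lease_expiry, max_skew)


expiry = datetime(2020, 1, 1, 12, 0)  # arbitrary example lease expiry
skew = timedelta(seconds=2)           # assumed bound on clock disagreement
print(can_accept_ops(datetime(2020, 1, 1, 12, 0, 1), expiry, skew, False))  # False
print(can_accept_ops(datetime(2020, 1, 1, 12, 0, 2), expiry, skew, False))  # True
```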

Slide 46: Next lecture: Monday
Other consensus protocols with group membership + leader election at core:
- Viewstamped Replication
- Raft (assignments 3 & 4)