Virtual Synchrony
Scott Phung
Nov 15, 2011
Some slides borrowed from Jared (‘09)
Motivation
Build Distributed Systems with:
Fault-Tolerance
Consistency
Concurrency
Easy programmability
Timeline
Source: A History of the Virtual Synchrony Replication Model (‘93)
Year | Event | Author
1975 | ARPANET | ARPANET
1978 | Time, Clocks, and the Ordering of Events in a Distributed System | Lamport
1978, 84, 90 | State Machine Replication | Lamport, Schneider
1981 | Database serializability, 2PC, 3PC | Bernstein, Goodman, Skeen
1982 | Byzantine Generals Problem | Lamport, Shostak, Pease
1983 | Impossibility of Distributed Consensus with One Faulty Process | Fischer, Lynch, Paterson
1983+ | Virtual Synchrony | Birman et al.
1985 | Group communication primitives, “process group” OS construct | Cheriton, Deering, Zwaenepoel
1990 | Paxos | Lamport
The Process Group Approach to Reliable Distributed Computing (‘93)
Ken Birman
Professor, Cornell University
Virtual Synchrony / Isis / Isis2
Quicksilver
Live Object
Assumptions
Asynchronous communication
Message Passing
Fail-Crash Failure Model
Timeouts are used to suspect stopped or slow processes
Suspected processes are considered to have failed
WAN of LANs
Virtual Synchrony
Distributed execution model that gives the appearance of synchronous execution
Eases program development
(more on this later)
Features
Process Groups
Ordered and Concurrent Message Delivery
Reliable Multicast
Motivation
Build Distributed Systems with:
Fault-Tolerance
Consistency
Concurrency
Easy programmability
How do we achieve Fault-Tolerance, Consistency, and Easy Programmability? Process Groups.
Outline
Problem
Process Groups (Implementation)
Solution
Close Synchrony
Virtual Synchrony
Isis
Process Groups
Communication framework that structures members of a distributed system into groups
Provides an easy development framework
Process Groups
Process Groups provide:
Fault Tolerance
State Machine Replication
Consistency
Membership changes, Message Delivery Order
Process Group Issues
Problems when building with Conventional Technologies (UDP, RPC, TCP):
No reliable multicast (Group Communication)
Membership churn (Group Membership)
Message ordering (Synchronization)
State transfers (Group Membership)
Failure atomicity (Group Membership)
No Reliable Multicast
UDP, TCP, and IP multicast are not good enough
What is the correct way to recover?
[Figure: processes p, q, r; “Ideal” vs. “Reality”]
Membership Churn
Membership changes are not instant
How do we handle failure cases?
[Figure: processes p, q, r; annotations “Receives new membership” and “Never sent”]
Message Ordering
Lamport’s Notion of Time: Causality
How do we prevent causally related messages from being delivered out of order (e.g., message 2 before message 1)?
[Figure: processes p, q, r exchanging messages 1 and 2]
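Lamport’s happens-before relation can be tracked with logical clocks. The following is a minimal Python sketch, assuming a hypothetical `LamportProcess` class that is not part of any system in these slides: q receives message 1 and then sends message 2, so message 2 ends up with the larger timestamp.

```python
from dataclasses import dataclass


@dataclass
class LamportProcess:
    """Hypothetical process holding a Lamport logical clock."""
    pid: str
    clock: int = 0

    def local_event(self) -> int:
        # Any local step advances the clock by one.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is an event; the message carries the new timestamp.
        self.clock += 1
        return self.clock

    def receive(self, msg_ts: int) -> int:
        # Jump past the sender's timestamp so that send -> receive is respected.
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock


p, q, r = LamportProcess("p"), LamportProcess("q"), LamportProcess("r")

m1 = p.send()      # p multicasts message 1
q.receive(m1)      # q sees message 1
m2 = q.send()      # q's message 2 causally follows message 1

# r should deliver 1 before 2; the timestamps respect that order.
print(m1, m2)      # 1 3
```

Lamport timestamps respect causality, but they cannot tell r that it is still missing message 1 when message 2 arrives. That is why CBCAST, later in the talk, relies on vector clocks.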
State Transfers
New nodes must get current state
Does not happen instantly
How do you handle nodes failing/joining?
[Figure: processes p, q, r]
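The state-transfer problem goes away when "copy the state" and "install the new view" behave as one step. The following is a minimal in-process Python sketch in which `Member`, `Group`, `join`, and `fail` are hypothetical names. It sidesteps the hard part, which is concurrent updates arriving over a real network during the transfer. Under virtual synchrony, a join is meant to *look* like this atomic step to the application.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    state: dict = field(default_factory=dict)
    view_id: int = 0


class Group:
    """Hypothetical in-process group; real systems must achieve this over a network."""

    def __init__(self, founder: Member):
        self.view_id = 1
        self.members = [founder]
        founder.view_id = self.view_id

    def _install_view(self):
        # Every current member moves to the same new view.
        self.view_id += 1
        for m in self.members:
            m.view_id = self.view_id

    def multicast_update(self, key, value):
        # All current members apply the update in the current view.
        for m in self.members:
            m.state[key] = value

    def join(self, newcomer: Member):
        # Snapshot + view change act as one step, so no update can fall
        # between the copied state and the newcomer's first view.
        newcomer.state = copy.deepcopy(self.members[0].state)
        self.members.append(newcomer)
        self._install_view()

    def fail(self, member: Member):
        # A crashed member is dropped and the survivors move to the next view together.
        self.members.remove(member)
        self._install_view()


p = Member("p")
group = Group(p)
group.multicast_update("x", 1)

q = Member("q")
group.join(q)                  # q starts view 2 already holding x = 1
group.multicast_update("y", 2)
print(p.state == q.state)      # True

group.fail(p)                  # view 3 contains only q
print(q.view_id)               # 3
```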
Failure Atomicity
Nodes can fail mid-transmit
Some nodes receive message, others do not
Inconsistencies arise!
[Figure: processes p, q, r; “Ideal” vs. “Reality”, where a crash (x) leaves delivery uncertain (?)]
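One classic way to get all-or-nothing delivery when the sender crashes mid-transmit is for every receiver to relay a message the first time it sees it. The following is a minimal Python simulation of that eager-relay technique, which is shown here for illustration. The function name `relay_multicast` and its parameters are hypothetical, and this is not presented as the Isis protocol.

```python
from collections import deque


def relay_multicast(sender, group, reached_before_crash):
    """Simulate an eager-relay reliable multicast whose sender crashes mid-transmit.

    Only the processes in `reached_before_crash` get the sender's own copy.
    Each receiver relays the message to every other member the first time it
    sees it, so either all surviving members deliver it or none do.
    """
    crashed = {sender}
    delivered = set()
    in_flight = deque(reached_before_crash)   # pending copies, by destination
    while in_flight:
        dest = in_flight.popleft()
        if dest in crashed or dest in delivered:
            continue
        delivered.add(dest)                                # deliver on first receipt...
        in_flight.extend(m for m in group if m != dest)    # ...then relay to everyone else
    return delivered


print(sorted(relay_multicast("p", ["p", "q", "r"], ["q"])))  # ['q', 'r']
print(sorted(relay_multicast("p", ["p", "q", "r"], [])))     # []
```

Relaying every message costs O(n²) messages per multicast, so the sketch only demonstrates the all-or-nothing property. Practical group communication systems generally avoid this cost by forwarding only messages that are not yet known to be stable, and only when a failure is suspected.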
Process Groups Issues Recap
Problems when building with Conventional Technologies (UDP, RPC, TCP):
No reliable multicast (Group Communication)
Membership churn (Group Membership)
Message ordering (Synchronization)
State transfers (Group Membership)
Failure atomicity (Group Membership)
Can we build a system that solves these?
Outline
Problem
Process Groups (Implementation)
Solution
Close Synchrony
Virtual Synchrony
Isis
Close Synchrony
Synchronous Execution Model
Multicast delivered to all group members as a single, reliable, instantaneous event
Solves all Process Group problems!
Close Synchrony
Synchronous execution
Execution moves in lock-step
[Figure: synchronous execution across processes p, q, r, s, t, u]
Ken’s Slides - 2006
Close Synchrony
Process Group problems solved:
No Reliable Multicast
Multicast is always reliable
Membership Churn
Membership is always consistent
Message Ordering
Totally ordered message delivery
State Transfers
State transfer happens instantaneously
Failure Atomicity
Multicast is a single event
Close Synchrony
Problem
We don’t have instantaneous events
It is impossible in the presence of failures
Expensive (waits for slowest member)
What can we do?
Asynchronous Execution
[Figure: asynchronous execution across processes p, q, r, s, t, u]
Ken’s Slides - 2006
Virtual Synchrony
Emulates Close Synchrony using asynchronous protocols
Group Communication
Notion of time: use Lamport’s Happens-Before relationship
Causal & Concurrent Ordered Message Delivery (CBCAST)
This causal order matches some equivalent Close Synchronous execution (total order).
Group Membership
Synchronized Membership View Changes
Replicated Group Membership Service sends final word on failures & joins to all members
Causal Message Ordering
CBCAST (Causal Atomic Broadcast Primitive)
Asynchronous, fast
Causal Order Delivery (within a group)
Vector clocks; a message is delayed until its causal predecessors have been delivered
Concurrent messages can be delivered out of order (OOO)
Batch multiple messages
Most-used primitive in Virtual Synchrony
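The vector-clock delay rule behind causal delivery can be sketched as follows. This is a minimal Python illustration of the standard causal-broadcast delivery condition, assuming a fixed group of `n` processes and reliable eventual arrival. `Msg` and `CausalProcess` are hypothetical names, and batching and failures are ignored. A message from process j is deliverable when it is j's next message and its sender had not seen anything the receiver still lacks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    sender: int
    vt: tuple      # sender's vector timestamp at multicast time
    body: str


class CausalProcess:
    """Vector-clock causal delivery for a fixed group of n processes (sketch)."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.vc = [0] * n      # vc[k] = number of k's multicasts delivered here
        self.pending = []      # received but not yet causally deliverable
        self.delivered = []

    def multicast(self, body: str) -> Msg:
        self.vc[self.pid] += 1
        self.delivered.append(body)   # the sender delivers its own message at once
        return Msg(self.pid, tuple(self.vc), body)

    def _deliverable(self, m: Msg) -> bool:
        j = m.sender
        # m must be j's next message, and everything j had delivered from
        # other processes must already be delivered here.
        return m.vt[j] == self.vc[j] + 1 and all(
            m.vt[k] <= self.vc[k] for k in range(len(self.vc)) if k != j
        )

    def receive(self, m: Msg) -> None:
        self.pending.append(m)
        progress = True
        while progress:                # one delivery may unblock others
            progress = False
            for pm in list(self.pending):
                if self._deliverable(pm):
                    self.pending.remove(pm)
                    self.vc[pm.sender] += 1
                    self.delivered.append(pm.body)
                    progress = True


p, q, r = CausalProcess(0, 3), CausalProcess(1, 3), CausalProcess(2, 3)
m1 = p.multicast("1")
q.receive(m1)
m2 = q.multicast("2")   # causally after m1
r.receive(m2)           # arrives first, so it is held back
print(r.delivered)      # []
r.receive(m1)
print(r.delivered)      # ['1', '2']
```

Messages that are concurrent (neither timestamp dominates the other) pass the test in whatever order they arrive. This matches the slide's point that concurrent messages may be delivered out of order.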
Total Message Ordering
ABCAST (Atomic Broadcast Primitive)
Synchronous, slow
Total Order Delivery (within a group)
No message can be delivered to any user until all previous ABCAST messages have been delivered
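A simple way to obtain the total-order rule is a fixed sequencer that numbers every message, with receivers holding back anything that arrives ahead of its turn. The following is a minimal Python sketch of that sequencer technique, which is swapped in for illustration and is not necessarily the protocol Isis used for ABCAST. `Sequencer` and `TotalOrderReceiver` are hypothetical names. The extra round trip to the sequencer is one reason total order costs more than causal order.

```python
import itertools


class Sequencer:
    """Hypothetical sequencer process that numbers every ABCAST message."""

    def __init__(self):
        self._counter = itertools.count(1)

    def order(self, body: str):
        return (next(self._counter), body)


class TotalOrderReceiver:
    def __init__(self):
        self.next_seq = 1
        self.holdback = {}      # seq -> body, waiting for earlier numbers
        self.delivered = []

    def receive(self, stamped) -> None:
        seq, body = stamped
        self.holdback[seq] = body
        # Deliver strictly in sequence: no ABCAST overtakes an earlier one.
        while self.next_seq in self.holdback:
            self.delivered.append(self.holdback.pop(self.next_seq))
            self.next_seq += 1


seq = Sequencer()
a, b = seq.order("A"), seq.order("B")

r1, r2 = TotalOrderReceiver(), TotalOrderReceiver()
r1.receive(a)
r1.receive(b)      # arrives in order
r2.receive(b)      # arrives early, so B is held back
r2.receive(a)
print(r1.delivered, r2.delivered)   # ['A', 'B'] ['A', 'B']
```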
Distributed Algorithms
How can Process Groups solve Consensus?
[Figure from Ken’s Slides - 2006]
Distributed Algorithms
How can Process Groups perform Distributed Snapshots?
[Figure from Ken’s Slides - 2006]
Isis
Framework that offers Group communication with Virtual Synchrony
Takes care of group communication, membership changes, and failures through a single, event-oriented execution model (Virtual Synchrony).
You just concentrate on the member code!
Isis
Used In:
NYSE, Swiss Stock Exchange
French Air Traffic Control System
US Navy AEGIS
Isis - Weakness
Large Groups: multicast reply explosion
Isis2: Group Aggregation, Dr. Multicast
No reduction ability within Groups
Isis2: Group Aggregation
Messages sent are not durable
Isis2: SafeSend (Paxos Mode)
Isis2
Group Aggregation
Used if group is really big
Requests, updates: still via multicast
Response is aggregated within a tree
Birman: DARPA MRC Kickoff, Washington, Nov 3-4 2011
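[Figure: level-0 aggregation tree; a query reaches nodes a, b, c, d, their values v_a, v_b, v_c, v_d are combined as Agg(v_a, v_b) and Agg(v_c, v_d), and the result is sent back as the reply]
Example: nodes {a, b, c, d} collaborate to perform a query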
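The tree aggregation in the example can be sketched as follows. Each parent combines its children's replies, so the requester receives one value instead of one reply per member. The following is a minimal Python illustration in which `tree_aggregate` and `fanout` are hypothetical names rather than the Isis2 API, and the reply values are made up. It assumes the aggregation function is associative, as `max` and `+` are.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_aggregate(values: List[T], agg: Callable[[T, T], T], fanout: int = 2) -> T:
    """Combine replies level by level up a tree instead of at the requester.

    `agg` is assumed associative (e.g. sum, max) so the tree shape does not
    change the result.
    """
    if not values:
        raise ValueError("no replies to aggregate")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(values)
    while len(level) > 1:
        # Each parent combines the replies of up to `fanout` children.
        level = [reduce(agg, level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Hypothetical replies from nodes a, b, c, d.
print(tree_aggregate([3, 7, 1, 5], max))   # 7

# A string-building "Agg" makes the tree shape from the figure visible.
print(tree_aggregate(["va", "vb", "vc", "vd"], lambda x, y: f"Agg({x}, {y})"))
# Agg(Agg(va, vb), Agg(vc, vd))
```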
Takeaways
Virtual Synchrony Benefits
Group Communication, Membership Changes, State Transfers and Failures in a single event execution model (Close Synchrony)
Key Contributions
Dynamic Group Membership
Integration of Failure detection into communication subsystems
Causally and Totally Ordered Message Delivery
Understanding the Limitations of Causally and Totally Ordered Communication (‘93)
David Cheriton
Professor, Stanford
PhD – Waterloo
V Operating System
Dale Skeen
PhD – UC Berkeley, former Cornell Assistant Prof.
Distributed pub/sub communication, 3PC
Co-founded TIBCO, Vitria
CATOCS Problems
Causal And Totally Ordered Communication Support
Message delivery is atomic, but not durable
Incidental ordering
CATOCS operates at the communication level, but consistency requirements are on application state
Violates the end-to-end argument.
Limitations of CATOCS in communication layer
Unrecognized Causality: Can’t say “for sure”
No Semantic Ordering: Can’t say the “whole story”
Lack of Serialization Ability: Can’t say “together”
Lack of Efficiency Gain over State-level Techniques: Can’t say “efficiently”
Unrecognized Causality
Can’t say “for sure”
Causal relationships at the semantic level are not recognizable
External or ‘hidden’ communication channels
Can’t say “together”
Serializable ordering: cannot order a group of messages together
The paper seems to provide only shared-memory-with-lock examples; do other message-passing systems offer serializable ordering?
Can’t say “whole story”
Semantic ordering is not ensured
Can’t say “efficiently”
No efficiency gain over state-level techniques
False Causality
Not scalable
Overhead of message reordering
Buffering requirements grow quadratically
False Causality
What if m2 happened to follow m1, but was not causally related?
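False causality can be made concrete with vector timestamps. In this minimal Python sketch, the stock-price scenario, the `happened_before` helper, and all variable names are hypothetical. The process q delivers m1 only because it arrived first, then multicasts an unrelated m2. The communication layer still records m1 -> m2, so any receiver that lacks m1 must delay m2.

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes b (a <= b componentwise, a != b)."""
    return all(x <= y for x, y in zip(a, b)) and a != b


n = 3
P, Q = 0, 1
vc_p = [0] * n
vc_q = [0] * n

# p multicasts m1 (hypothetically, a price update for stock X).
vc_p[P] += 1
m1_vt = tuple(vc_p)

# q delivers m1 merely because it arrived; q's vector clock absorbs it.
vc_q = [max(a, b) for a, b in zip(vc_q, m1_vt)]

# q then multicasts m2 about an unrelated stock Y.
vc_q[Q] += 1
m2_vt = tuple(vc_q)

# The layer sees m1 -> m2 even though the application never used m1 to produce m2.
print(m1_vt, m2_vt)                      # (1, 0, 0) (1, 1, 0)
print(happened_before(m1_vt, m2_vt))     # True
```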
Birman’s Response (‘93)
Ordering is important to guarantee consistency: when combined with an execution model (Virtual Synchrony), it produces a system with powerful reliability guarantees.
This point was completely neglected.
Causal ordering is cheap and prevents some failures.
Flow control and congestion handling are more important.
Hidden Channels
Rare, mostly in shared memory, which you protect with a lock.
No system can say “for sure” for the constructed example.
Birman’s Response (‘93)
Semantic vs. Causal Ordering
Causal order provides some ordering guarantees. Tag messages with timestamps, or create a causal dependency from the theoretical price to the actual price.
Can Say “efficiently”
Buffering requirements do not grow quadratically; they are usually constant.
VS is efficient; otherwise group membership, communication, and synchronization are left to the application developer ==> a less efficient system.
Theoretical Proofs carry little weight in this domain
FLP impossibility, yet practical systems that solve consensus are still built.
3PC is non-blocking, yet most DB systems use 2PC.
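Birman’s suggestion of tying the theoretical price to the actual price can also be expressed as an explicit application-level dependency tag. The following Python sketch is a hypothetical illustration, not code from either paper. Each theoretical price carries the version of the actual price it was derived from. The receiver (`Trader`, a made-up name) shows it only if that version is still current, so a stale derived price is rejected whatever order the messages arrive in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActualPrice:
    version: int
    value: float


@dataclass(frozen=True)
class TheoreticalPrice:
    based_on: int      # version of the ActualPrice this value was computed from
    value: float


class Trader:
    """Hypothetical receiver that shows only theoretical prices matching the latest actual price."""

    def __init__(self):
        self.actual: Optional[ActualPrice] = None
        self.shown: Optional[TheoreticalPrice] = None

    def on_actual(self, p: ActualPrice) -> None:
        if self.actual is None or p.version > self.actual.version:
            self.actual = p
            self.shown = None          # any displayed theoretical price is now stale

    def on_theoretical(self, t: TheoreticalPrice) -> None:
        # Accept only values derived from the actual price we currently hold.
        if self.actual is not None and t.based_on == self.actual.version:
            self.shown = t


trader = Trader()
trader.on_actual(ActualPrice(1, 100.0))
trader.on_theoretical(TheoreticalPrice(1, 101.5))
print(trader.shown)                    # TheoreticalPrice(based_on=1, value=101.5)
trader.on_actual(ActualPrice(2, 99.0))
trader.on_theoretical(TheoreticalPrice(1, 101.5))   # late and stale: ignored
print(trader.shown)                    # None
```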