Blazes: coordination analysis for distributed programs
Peter Alvaro, Neil Conway, Joseph M. Hellerstein (UC Berkeley)
David Maier (Portland State)

Distributed systems are hard
Asynchrony
Partial Failure

Asynchrony isn’t that hard
Amelioration:
Logical timestamps
Deterministic interleaving

Partial failure isn’t that hard
Amelioration:
Replication
Replay

Asynchrony * partial failure is hard^2
Logical timestamps
Deterministic interleaving
Replication
Replay

Asynchrony * partial failure is hard^2
Replication
Replay
Today:
Consistency criteria for fault-tolerant distributed systems
Blazes: analysis and enforcement

This talk is all setup
Frame of mind:
Dataflow: a model of distributed computation
Anomalies: what can go wrong?
Remediation strategies: component properties, delivery mechanisms
Framework:
Blazes – coordination analysis and synthesis

Little boxes: the dataflow model
Generalization of distributed services
Components interact via asynchronous calls (streams)

Components
Input interfaces
Output interfaces

Streams
Nondeterministic order

Example: a join operator
Input streams R and S; output stream T

Example: a key/value store
Inputs: put, get; output: response

Example: a pub/sub service
Inputs: publish, subscribe; output: deliver

Logical dataflow
“Software architecture”
[Diagram: a dataflow connecting a data source, Service X (filter, cache), and a client via streams a, b, c]

Dataflow is compositional
Components are recursively defined
[Diagram: data source, Service X (filter, aggregator), client]

Dataflow exhibits self-similarity

Dataflow exhibits self-similarity
[Diagram: a larger dataflow with the same shape: user requests enter App1 and App2 (Buy, Content) over HTTP and static content, backed by a DB, HDFS, Hadoop, an Index, and a Combine stage; App1 and App2 answers flow out]

Physical dataflow

Physical dataflow
[Diagram: the logical dataflow (data source, Service X with filter and aggregator, client; streams a, b, c) mapped onto physical nodes]

Physical dataflow
“System architecture”
[Diagram: data source, Service X (filter, aggregator), and client placed on machines]

What could go wrong?

Cross-run nondeterminism
Nondeterministic replays
[Diagram: Run 1 of the dataflow (data source, Service X, client; streams a, b, c)]

Cross-run nondeterminism
Nondeterministic replays
[Diagram: Run 2 of the same dataflow, with a different interleaving]

Cross-instance nondeterminism
Transient replica disagreement
[Diagram: data source, replicated Service X, client]

Divergence
Permanent replica disagreement
[Diagram: data source, replicated Service X, client]

Hazards
[Diagram: data source, Service X (filter, aggregator), client; streams a, b, c]
Order?
Contents?

Preventing the anomalies
Understand component semantics
(And disallow certain compositions)

Component properties
Convergence: component replicas receiving the same messages reach the same state
Rules out divergence

Convergence
Convergent data structure (e.g., Set CRDT) with insert and read operations
Commutativity -> tolerant to reordering
Associativity -> tolerant to batching
Idempotence -> tolerant to retry/duplication

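The convergence property above can be made concrete with a minimal grow-only-set CRDT. This is an illustrative sketch (the `GSet` class and names are ours, not from the talk): the only update is insert, and merge is set union, which is commutative, associative, and idempotent, so replicas tolerate exactly the reordering, batching, and retry/duplication listed above.

```python
# Sketch of a convergent grow-only set (G-Set CRDT). Union is
# commutative, associative, and idempotent, so reordering, batching,
# and duplicate delivery cannot make replicas disagree.
class GSet:
    def __init__(self):
        self.elems = set()

    def insert(self, x):        # the only write: monotone growth
        self.elems.add(x)

    def merge(self, other):     # union: commutative/associative/idempotent
        self.elems |= other.elems

    def read(self):
        return frozenset(self.elems)

# Two replicas receive the same inserts in different orders, one with a retry:
r1, r2 = GSet(), GSet()
for x in ["a", "b", "c"]:
    r1.insert(x)
for x in ["c", "a", "b", "a"]:  # reordered, with a duplicate
    r2.insert(x)
assert r1.read() == r2.read()   # convergent: same messages, same state
```

Note the contrast with, say, a last-writer-wins register, whose state does depend on delivery order.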
Convergence isn’t compositional
Convergent: identical input contents -> identical state
[Diagram: two convergent components composed between a data source and a client]

Component properties
Convergence: component replicas receiving the same messages reach the same state
Rules out divergence
Confluence: output streams have deterministic contents
Rules out all stream anomalies
Confluent -> convergent

Confluence
output set = f(input set)
[Diagram: two different delivery orders over the same input set produce equal output sets]

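A sketch of what "output set = f(input set)" means in code (our own illustration, with a hypothetical `run_join` helper): a symmetric join that emits matches as tuples arrive, whose eventual output set is the same under every interleaving of its two input streams.

```python
import itertools

# Sketch of a confluent operator: a symmetric join that emits matches
# as tuples arrive. The *set* of outputs it eventually produces depends
# only on the sets of inputs, never on the interleaving.
def run_join(interleaving):
    r, s, out = set(), set(), set()
    for side, tup in interleaving:
        if side == "R":
            r.add(tup)                  # tup = (a, b)
            out |= {(tup[0], tup[1], c) for (b, c) in s if b == tup[1]}
        else:
            s.add(tup)                  # tup = (b, c)
            out |= {(a, tup[0], tup[1]) for (a, b) in r if b == tup[0]}
    return out

msgs = [("R", (1, "x")), ("R", (2, "y")), ("S", ("x", 10)), ("S", ("y", 20))]
finals = {frozenset(run_join(p)) for p in itertools.permutations(msgs)}
assert len(finals) == 1  # every delivery order yields the same output set
```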
Confluence is compositional
output set = (f ∘ g)(input set)

Preventing the anomalies
Understand component semantics
(And disallow certain compositions)
Constrain message delivery orders
Ordering

Ordering – global coordination
[Diagram: an order-sensitive component whose inputs are globally ordered produces deterministic outputs]

Ordering – global coordination
“The first principle of successful scalability is to batter the consistency mechanisms down to a minimum.” – James Hamilton

Preventing the anomalies
Understand component semantics
(And disallow certain compositions)
Constrain message delivery orders
Ordering
Barriers and sealing

Barriers – local coordination
[Diagram: an order-sensitive component behind a barrier between the data source and client produces deterministic outputs]

Barriers – local coordination
[Diagram: the data source and client coordinate only at the barrier]

Sealing – continuous barriers
Do partitions of (infinite) input streams “end”?
Can components produce deterministic results given “complete” input partitions?
Sealing: partition barriers for infinite streams

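Sealing can be sketched as a barrier per partition rather than over the whole stream. In this illustration (the `SealedCount` class and message names are ours, not Blazes API), a `seal(key)` message promises no more tuples will arrive for that key, so the component can emit a deterministic per-partition result while other partitions keep growing.

```python
# Sketch of sealing: inputs are partitioned by a key (e.g., a session
# or cart id). A seal(key) message promises that the partition is
# complete; only then is its result emitted.
class SealedCount:
    def __init__(self):
        self.buf = {}   # key -> count of buffered tuples
        self.out = {}   # key -> final count, emitted at seal time

    def recv(self, key, val):
        assert key not in self.out, "tuple arrived after seal"
        self.buf[key] = self.buf.get(key, 0) + 1

    def seal(self, key):          # partition 'key' ends: safe to emit
        self.out[key] = self.buf.pop(key, 0)

c = SealedCount()
c.recv("cart1", "apple")
c.recv("cart2", "pear")
c.recv("cart1", "bread")
c.seal("cart1")                   # cart1 is complete; cart2 may keep growing
assert c.out == {"cart1": 2}
```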
Sealing – continuous barriers
Finite partitions of infinite inputs are common
…in distributed systems: sessions, transactions, epochs/views
…and in applications: auctions, chats, shopping carts

Blazes: consistency analysis + coordination selection

Blazes: Mode 1: Grey boxes

Grey boxes
Example: pub/sub
x = publish, y = subscribe, z = deliver
Deterministic but unordered

Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR^gate             X
4         OW^gate

Labels: x->z: CW, y->z: CW

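Why both pub/sub paths earn the confluent CW label can be seen in a small simulation (our own sketch, not the Blazes implementation; the `deliveries` helper is hypothetical): the set of deliveries is a function of the sets of publishes and subscribes, whatever the arrival interleaving.

```python
import itertools

# Sketch: pub/sub deliveries are confluent (CW) -- stateful, but the
# delivered set depends only on the sets of publishes and subscribes.
def deliveries(msgs):
    log, subs, out = set(), set(), set()
    for kind, a, b in msgs:
        if kind == "pub":
            log.add((a, b))       # (key, val)
        else:
            subs.add((a, b))      # (ident, key)
        # deliver every (subscriber, key, val) match seen so far
        out |= {(i, k, v) for (k, v) in log for (i, k2) in subs if k == k2}
    return out

msgs = [("pub", "k", 1), ("sub", "alice", "k"), ("pub", "k", 2)]
finals = {frozenset(deliveries(p)) for p in itertools.permutations(msgs)}
assert len(finals) == 1           # contents are order-insensitive
```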
Grey boxes
Example: key/value store
x = put, y = get, z = response
Deterministic but unordered

Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR^gate             X
4         OW^gate

Labels: x->z: OW^key, y->z: OR

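The OW label on the put path can be seen in a few lines (our own sketch; `apply_puts` is a hypothetical helper): a put overwrites the previous value for its key, so the same set of puts delivered in different orders leaves different states, and hence different get responses. The "key" subscript records that only per-key order matters.

```python
# Sketch of why put->response is OW (order-sensitive, stateful):
# later puts overwrite earlier ones, so delivery order changes state.
def apply_puts(puts):
    store = {}
    for key, val in puts:
        store[key] = val          # last writer wins
    return store

puts = [("k", 1), ("k", 2)]
s1 = apply_puts(puts)
s2 = apply_puts(reversed(puts))
assert s1 == {"k": 2} and s2 == {"k": 1}   # order changes the outcome
```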
Label propagation – confluent composition
CW -> CR -> CR -> CR -> CR
Composed label: CW
Deterministic outputs

Label propagation – unsafe composition
OW -> CR -> CR -> CR -> CR
Tainted outputs
Interposition point

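Label propagation along a path can be sketched as a severity analysis. This is a deliberate simplification of what Blazes actually does (the `analyze` function and the flat 1-4 scale are ours): a path is tainted if any non-confluent label (severity 3 or higher) appears on it, and the earliest such edge is the interposition point where ordering or sealing should be applied.

```python
# Simplified sketch of label propagation: flag the first non-confluent
# label on a path as the interposition point for coordination.
SEVERITY = {"CR": 1, "CW": 2, "OR": 3, "OW": 4}

def analyze(path):
    """path: list of labels from source to sink."""
    for i, label in enumerate(path):
        if SEVERITY[label] >= 3:          # OR/OW: order-sensitive
            return {"tainted": True, "interpose_at": i}
    return {"tainted": False, "interpose_at": None}

assert analyze(["CW", "CR", "CR"]) == {"tainted": False, "interpose_at": None}
assert analyze(["OW", "CR", "CR"]) == {"tainted": True, "interpose_at": 0}
```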
Label propagation – sealing
OW^key -> CR -> CR -> CR -> CR
Seal(key=x) on the input streams
Composed label: OW^key
Deterministic outputs

Blazes: Mode 2: White boxes

White boxes
Example: key/value store (Bloom)

    module KVS
      state do
        interface input, :put, [:key, :val]
        interface input, :get, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
      end

      bloom do
        log <+ put
        log <- (put * log).rights(:key => :key)
        response <= (log * get).pairs(:key => :key) do |s,l|
          [l.ident, s.key, s.val]
        end
      end
    end

Negation (order-sensitive), partitioned by :key
put -> response: OW^key
get -> response: OR^key

White boxes
Example: pub/sub (Bloom)

    module PubSub
      state do
        interface input, :publish, [:key, :val]
        interface input, :subscribe, [:ident, :key]
        interface output, :response, [:response_id, :key, :val]
        table :log, [:key, :val]
        table :sub_log, [:ident, :key]
      end

      bloom do
        log <= publish
        sub_log <= subscribe
        response <= (log * sub_log).pairs(:key => :key) do |s,l|
          [l.ident, s.key, s.val]
        end
      end
    end

publish -> response: CW
subscribe -> response: CR

The Blazes frame of mind:
Asynchronous dataflow model
Focus on consistency of data in motion
Component semantics
Delivery mechanisms and costs
Automatic, minimal coordination

Queries?