Disorderly Distributed Programming with Bloom
Emily Andrews, Peter Alvaro, Peter Bailis, Neil Conway, Joseph M. Hellerstein, William R. Marczak (UC Berkeley); David Maier (Portland State University)
Disorderly Distributed Programming with Bloom
Emily Andrews, Peter Alvaro, Peter Bailis, Neil Conway, Joseph M. Hellerstein, William R. Marczak (UC Berkeley)
David Maier (Portland State University)
Conventional Programming
Distributed Programming
Problem: different nodes might perceive different event orders.
Taking Order for Granted
- Data (ordered): array of bytes
- Compute (ordered): sequence of instructions
Writing order-sensitive programs is too easy!
Alternative #1: Enforce a consistent event order at all nodes
Extensive literature:
- Replicated state machines
- Consensus, coordination
- Group communication
- "Strong consistency"
Alternative #1: Enforce a consistent event order at all nodes
Problems:
- Latency
- Availability
Alternative #2: Analyze all event orders, ensure correct behavior
Alternative #2: Analyze all event orders to ensure correct behavior
Problem: that is really, really hard.
Alternative #3: Write order-independent ("disorderly") programs
Alternative #3: Write order-independent ("disorderly") programs
Questions:
- How do we write such programs?
- What can we express in a disorderly way?
Disorderly Programming
- Program analysis: CALM. Where is order needed? And why?
- Language design: Bloom. Order-independent by default; ordering is possible but explicit.
- Mixing order and disorder: Blazes. Order synthesis and optimization.
- Algebraic programming
CALM: Consistency As Logical Monotonicity
History
- Roots in UCB database research (~2005)
- High-level, declarative languages for network protocols and distributed systems
- "Small programs for large clusters" (BOOM)
- Distributed programming with logic:
  - State: sets (relations)
  - Computation: deductive rules over sets (SQL, Datalog, etc.)
Observation: much of Datalog is order-independent.
Monotonic Logic
- As the input set grows, the output set does not shrink ("mistake-free")
- Order-independent
- e.g., map, filter, join, union, intersection
Non-Monotonic Logic
- New inputs might invalidate previous outputs
- Order-sensitive
- e.g., aggregation, negation
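To make the contrast concrete, here is a minimal sketch in plain Ruby (not Bloom): a monotone operation (set union) reaches the same final result under any delivery order, while an observation made mid-stream is order-sensitive. The event values and delivery orders are illustrative assumptions.

```ruby
require 'set'

# Two different delivery orders of the same input events.
events = [3, 1, 2]
orders = [events, events.reverse]

# Monotone: set union. Any delivery order yields the same final set.
union_results = orders.map { |o| o.reduce(Set.new) { |s, e| s | [e] } }

# Order-sensitive: "has the maximum arrived yet?" observed after only
# two events. The answer depends on which two arrived first.
partial = orders.map { |o| o.take(2).max == events.max }

puts union_results[0] == union_results[1]  # true: union is confluent
puts partial[0] == partial[1]              # false here: order-sensitive
```

The union result is safe to act on at any time; the mid-stream observation is not, which is exactly why non-monotone steps need coordination.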
Agents learn strictly more knowledge over time.
Different learning order, same final outcome.
Deterministic outcome despite network non-determinism ("confluent").
Confluent Programming
Consistency As Logical Monotonicity: CALM Analysis (CIDR '11)
- Monotone programs are deterministic
- Simple syntactic test for monotonicity
Result: whole-program static analysis for eventual consistency.
CALM: Beyond Sets
- Monotonic logic: growing sets
- Partial order: set containment
- Expressive but sometimes awkward: timestamps, version numbers, threshold tests, directories, sequences, …
Challenge: extend monotone logic to support other flavors of "growth over time."
⟨S, ⊔, ⊥⟩ is a bounded join semilattice iff:
- S is a set
- ⊔ is a binary operator ("least upper bound")
  - Induces a partial order on S: x ≤_S y iff x ⊔ y = y
  - Associative, commutative, and idempotent ("ACID 2.0")
  - Informally, the LUB is the "merge function" for S
- ⊥ is the "least" element in S: ∀x ∈ S, ⊥ ⊔ x = x
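A minimal plain-Ruby sketch of one such lattice, a set under union; the class name and methods here are illustrative assumptions, not the actual Bud library API.

```ruby
require 'set'

# A bounded join semilattice over sets: bottom is the empty set,
# the LUB ("merge function") is set union.
class SetLattice
  attr_reader :value

  def initialize(v = Set.new)  # default: bottom element
    @value = v
  end

  def lub(other)
    SetLattice.new(@value | other.value)
  end

  def leq(other)  # induced partial order: x <= y iff x lub y == y
    lub(other).value == other.value
  end
end

a = SetLattice.new(Set[1, 2])
b = SetLattice.new(Set[2, 3])

puts a.lub(b).value == b.lub(a).value        # commutative
puts a.lub(a).value == a.value               # idempotent
puts SetLattice.new.lub(a).value == a.value  # bottom is the identity
```

Because merge is associative, commutative, and idempotent, replicas can exchange states in any order, any number of times, and still converge.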
[Figure: three lattices growing over time: a Set (⊔ = union), an increasing Int (⊔ = max), and a Boolean (⊔ = or).]
f : S → T is a monotone function iff ∀a, b ∈ S : a ≤_S b ⇒ f(a) ≤_T f(b)
[Figure: the same three lattices connected by monotone functions: size() maps the growing Set (⊔ = union) to an increasing Int (⊔ = max), and the threshold test >= 3 maps the increasing Int to a Boolean (⊔ = or).]
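The chain in that figure can be sketched in plain Ruby (illustrative, not Bud code): as the set grows, its size never decreases, and once the threshold test fires it can never "unfire."

```ruby
require 'set'

# Snapshots of a growing set over time, as in the figure.
sets = [Set[1], Set[1, 2], Set[1, 2, 3], Set[1, 2, 3, 4]]

# Monotone function: Set -> increasing Int.
sizes = sets.map(&:size)            # 1, 2, 3, 4: never decreases

# Monotone function: increasing Int -> Boolean (threshold test).
quorum = sizes.map { |n| n >= 3 }   # false, false, true, true

puts sizes == sizes.sort                          # non-decreasing
puts quorum.each_cons(2).all? { |a, b| !a || b }  # once true, stays true
```

Composing monotone functions yields a monotone function, which is what lets Bloom chain lattices together while preserving confluence.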
The Bloom Programming Language
Bloom Basics
- Communication: message passing between agents
- State: lattices
- Computation: functions over lattices
- "Disorderly" computation: monotone functions
Bloom Operational Model
Quorum Vote

An annotated Ruby class: program state (channel and lattice declarations), then program logic (monotone rules).

QUORUM_SIZE = 5
RESULT_ADDR = "example.org"

class QuorumVote
  include Bud

  state do
    # Communication interfaces: non-deterministic delivery order!
    channel :vote_chn,   [:@addr, :voter_id]
    channel :result_chn, [:@addr]
    # Lattice state declarations
    lset  :votes
    lmax  :vote_cnt
    lbool :got_quorum
  end

  bloom do
    # Accumulate votes into a set: merge new votes with stored votes (set LUB)
    votes      <= vote_chn { |v| v.voter_id }
    # Monotone function: set -> max, merged via the lmax LUB
    vote_cnt   <= votes.size
    # Monotone function: max -> bool (threshold test; monotone)
    got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE)
    # Merge at a non-deterministic future time (async send)
    result_chn <~ got_quorum.when_true { [RESULT_ADDR] }
  end
end
Some Features of Bloom
- Library of built-in lattice types: Booleans, increasing/decreasing integers, sets, multisets, maps, sequences, …
- API for defining custom lattice types
- Supports both relational-style rules and functions over lattices
- Model-theoretic semantics ("Dedalus"): logic + state update + async messaging
CRDTs vs. Bloom
Similarities:
- Focus on commutative operations
- Formalized via join semilattices
- Monotone functions resemble composition of CRDTs
- Similar design patterns (e.g., need for GC)
Differences:
- Approach: language design vs. ADTs
- Correctness: confluence vs. convergence (confluence is strictly stronger; a CRDT "query" is not necessarily monotone)
- CRDTs more expressive?
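To ground the comparison, here is a plain-Ruby sketch of a classic state-based CRDT, a grow-only counter (G-Counter); the class and method names are illustrative assumptions. State is a per-node count map, merge is an entry-wise max (a join-semilattice LUB), and the query sums the entries.

```ruby
# Grow-only counter (G-Counter) CRDT sketch.
class GCounter
  attr_reader :counts

  def initialize(counts = {})
    @counts = counts  # node id => count contributed by that node
  end

  def increment(node)
    GCounter.new(@counts.merge(node => (@counts[node] || 0) + 1))
  end

  # LUB: entry-wise max over the two count maps.
  def merge(other)
    GCounter.new(@counts.merge(other.counts) { |_, x, y| [x, y].max })
  end

  def value
    @counts.values.sum
  end
end

a = GCounter.new.increment(:a).increment(:a)
b = GCounter.new.increment(:b)

# Merge is commutative and idempotent, so replicas converge:
puts a.merge(b).value                        # 3
puts a.merge(b).counts == b.merge(a).counts  # true
```

Note the slide's point about queries: the merge is a LUB, but whether a given query over the merged state is monotone must be checked separately.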
Ongoing Work
- Runtime: current implementation is a Ruby DSL; next generation: JavaScript, code generation; also targeting the JVM, CLR, and MapReduce
- Tools: BloomUnit (distributed testing/debugging); verification of lattice type implementations
- Software stack:
  - Built: Paxos, HDFS, 2PC, a lock manager, causal delivery, distributed ML, shopping carts, routing, task scheduling, etc.
  - Working on: concurrent editing, version control, geo-replicated consistency control
Blazes: Intelligent Coordination Synthesis
Mixing Order and Disorder
Can these ideas scale to large systems? Ordering can rarely be avoided entirely, so make order part of the design process:
- Annotate modules with ordering semantics
- If needed, coordinate at module boundaries
Philosophy: start with what we're given (disorder); create only what we need (order).
Tool Support
- Path analysis: how does disorder flow through a program? Persistent vs. transient divergence
- Coordination synthesis: add "missing" coordination logic automatically
Coordination Synthesis
Coordination is costly: help programmers use it wisely!
Automatic synthesis of coordination logic, customized to match:
- Application semantics (logical)
- Network topology (physical)
Application Semantics
Common pattern: "sessions": (mostly) independent, of finite duration.
- During a session: only coordinate among participants
- After a session: session contents are sealed (immutable), so coordination is unnecessary!
Sealing
Treating non-monotonicity as permitting arbitrary change is very conservative. A common pattern in practice: mutable for a short period, immutable forever after.
- Example: bank accounts at end-of-day
- Example: distributed GC: once the (global) refcount reaches 0, it remains 0
Affinity
Network structure affects coordination cost.
Example: m clients, n storage servers; one client request fans out into many storage messages.
Possible strategies:
- Coordinate among the (slow?) clients
- Coordinate among the (fast?) servers
Related: geo-replication, intra- vs. inter-DC coordination, "sticky" sessions.
Algebraic Programming
Adversarial Programming
A confluent program must behave correctly for any network schedule: the network acts as an "adversary."
What if we could control the network? Then the schedule would influence only performance, not correctness. Sounds like an optimizer!
Algebra vs. Ordering
The developer writes two programs:
- The Algebra defines program behavior and is guaranteed to be order-independent. Language: high-level, declarative.
- The Ordering Spec controls input order: ordering, batching, timing. Language: arbitrary (e.g., imperative); it need not be deterministic.
Benefits
- Separates correctness and performance; the two might even be developed independently!
- Wide freedom for optimization with no risk of harming correctness: randomness, batching, parallelism, CPU affinity, data locality, …
- Auto-tuning/synthesis of the ordering spec?
Examples
Quicksort
- Algebra: input is the values to sort plus pivot elements; output is the sorted list
- Ordering Spec: the ordering of pivots
Matrix Multiplication
- Algebra: input is the sub-matrices; output is the result of the matrix multiply
- Ordering Spec: the tiling, i.e., the division of the input matrices into pieces
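The quicksort example can be sketched in plain Ruby: the algebra (partition around a pivot, recurse) fixes the output, while the ordering spec (which pivot gets chosen) affects only performance. The pivot strategies below are illustrative assumptions.

```ruby
# Quicksort parameterized by a pivot-selection strategy ("ordering spec").
# Any strategy yields the same sorted output: the algebra is confluent.
def quicksort(values, pick_pivot)
  return values if values.size <= 1
  pivot = pick_pivot.call(values)
  rest = values.dup
  rest.delete_at(rest.index(pivot))          # remove one pivot instance
  lo, hi = rest.partition { |v| v <= pivot } # the algebra: partition step
  quicksort(lo, pick_pivot) + [pivot] + quicksort(hi, pick_pivot)
end

input = [5, 3, 8, 1, 9, 2]
first  = quicksort(input, ->(vs) { vs.first })  # two different
random = quicksort(input, ->(vs) { vs.sample }) # "ordering specs"

puts first == random  # true: same output regardless of pivot order
puts first.inspect    # [1, 2, 3, 5, 8, 9]
```

A bad pivot order degrades to O(n²) time but never changes the answer, which is exactly the correctness/performance split the slide describes.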
In a distributed system, order is precious!
Let's stop taking it for granted.
Recap
- The network is disorderly: embrace it!
- How can we write disorderly programs? State: join semilattices; computation: monotone functions
- When order is necessary, use it wisely
- A program's ordering requirements should be a first-class concern!
Thank You!
Questions Welcome