Slide 1: IronFleet: Proving Practical Distributed Systems Correct
Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, Brian Zill
Microsoft Research
Slide 2: IronFleet (Jay Lorch, Microsoft Research)
Slide 3
[Mickens 2013]
[Zave 2015]: "not one of the properties claimed invariant in [PODC] is actually true of it."
Slide 4
We show how to build complex, efficient distributed systems whose implementations are provably safe and live.
- Implementations are correct, not just abstract protocols
- Proofs are machine-checked
- The system will make progress given sufficient time; e.g., it cannot deadlock or livelock
- First mechanical proof of liveness of a non-trivial distributed protocol, let alone an implementation
- Proof is subject to assumptions, not absolute
(diagram: Spec refined by the IronFleet implementation)
Slide 5: What we built
- IronRSL: replicated state library (Paxos, replicas A, B, C)
- IronKV: sharded key-value store
Complex, with many features: state transfer, log truncation, dynamic view-change timeouts, batching, reply cache
You can do it too!
Slide 6: IronFleet extends state-of-the-art verification to distributed systems
IronFleet = Methodology + Libraries + Toolset modifications
Key contributions of our work (some discussed in this talk, some in the paper):
- Two-level refinement
- Concurrency control via reduction
- Always-enabled actions
- Invariant quantifier hiding
- Automated proofs of temporal logic
Slide 7: IronFleet builds on existing research
Recent advances:
- Single-machine system verification
- SMT solvers
- IDEs for automated software verification
Demo: Provably correct marshalling

```dafny
method MarshallMessage(msg:Message) returns (data:array<byte>)
  requires !msg.Invalid?;
  ensures ParseMessage(data) == msg;
```
Slide 8: Not writing unit tests and having it work the first time: priceless
Slide 9: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 10: Running example: IronRSL replicated state library
(Paxos, replicas A, B, C)
- Safety property: equivalence to a single machine
- Liveness property: clients eventually get replies
Slide 11: Specification approach: rule out all bugs by construction
- Race conditions
- Invariant violations
- Integer overflow
- Deadlock
- Livelock
- Parsing errors
- Marshalling errors
- Buffer overflow
- ...
Slide 12: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 13: Background: refinement
(diagram: spec states S0..S7 above implementation states I0..I4, produced by `method Main()`, with each implementation step mapped to spec steps)
[Milner 1971, Park 1981, Lamport 1983, Lynch 1983, ...]
Slide 14: Specification is a simple, logically centralized state machine

```dafny
type SpecState = int

function SpecInit(s:SpecState) : bool
{
  s == 0
}

function SpecNext(s:SpecState, s':SpecState) : bool
{
  s' == s + 1
}

function SpecRelation(realPackets:set<Packet>, s:SpecState) : bool
{
  forall p, i :: p in realPackets && p.msg == MarshallCounterVal(i) ==> i <= s
}
```
Slide 15: Implementation
Host implementation is a single-threaded event-handler loop:

```dafny
method Main()
{
  var s:ImplState;
  s := ImplInit();
  while (true) {
    s := EventHandler(s);
  }
}
```
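Putting the last two slides together: an implementation refines the spec if some abstraction function maps every implementation trace to a legal spec trace. A minimal Python sketch of that check for the counter example (the implementation state and its bookkeeping fields are hypothetical; only the spec functions mirror the slides):

```python
# Spec from the earlier slide: SpecInit(s) <=> s == 0, SpecNext(s,s') <=> s' == s + 1.
def spec_init(s): return s == 0
def spec_next(s, s2): return s2 == s + 1

# A toy "implementation" state: the counter plus bookkeeping the spec ignores.
def impl_init(): return {"count": 0, "packets_sent": []}

def event_handler(s):
    # One event-handler iteration: increment the counter, record a packet.
    return {"count": s["count"] + 1,
            "packets_sent": s["packets_sent"] + [s["count"] + 1]}

def abstraction(s):
    # Maps an implementation state to the spec state it represents.
    return s["count"]

# Refinement check on a finite trace: the abstracted trace is a legal spec trace.
s = impl_init()
trace = [s]
for _ in range(5):
    s = event_handler(s)
    trace.append(s)

assert spec_init(abstraction(trace[0]))
for a, b in zip(trace, trace[1:]):
    assert spec_next(abstraction(a), abstraction(b))
```

IronFleet proves this relationship for all traces with a theorem prover; the sketch only replays one.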
Slide 16: Proving correctness is hard
Subtleties of distributed protocols:
- Maintaining global invariants
- Dealing with hosts acting concurrently
- Ensuring progress
Complexities of implementation:
- Using efficient data structures
- Memory management
- Avoiding integer overflow
Slide 17: Two-level refinement
(diagram: spec states S0..S7, refined by protocol states P0..P4, refined by implementation states I0..I4)
Slide 18: Protocol layer
The protocol layer sits between the spec and the implementation, and uses mathematical abstractions in place of concrete representations:

```dafny
// Protocol layer: pure functions over abstract types
function ProtocolNext(s:HostState, s':HostState) : bool
// uses seq<int>
// type Message = MessageRequest() | MessageReply() | ...

// Implementation: imperative methods over concrete types
method EventHandler(s:HostState) returns (s':HostState)
// uses array<uint64>
// type Packet = array<byte>
```
Slide 19: Protocol layer
(diagram: the protocol layer, between spec and implementation, is a distributed system model)
Slide 20: Protocol layer
(diagram: spec states S0..S7, protocol states P0..P4, implementation states I0..I4)

```dafny
method EventHandler(s:HostState) returns (s':HostState)
  ensures ProtocolNext(s, s');
```
Slide 21: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 22: Cross-host concurrency is hard
Hosts are single-threaded, but we need to reason about the concurrency of different hosts.
(diagram: interleaving of Host A Steps 1-3 with Host B Steps 1-3)
Slide 23: Cross-host concurrency is hard
Requires reasoning about all possible interleavings of the substeps.
Slide 24: Concurrency containment strategy
- Enforce that receives precede sends in the event handler
- Assume in the proof that all host steps are atomic
Slide 25: Why concurrency containment works
Reduction argument: for every real trace...
Slide 26: Why concurrency containment works
Reduction argument: for every real trace, there is a corresponding legal trace with atomic host steps.
Constraining the implementation lets us think of the entire distributed system as hosts taking one step at a time.
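A minimal sketch of the reduction argument in Python (the trace format and hosts are illustrative, not IronFleet code): because every host performs all its receives before its sends, a receive by one host commutes past a send by another host, so an interleaved trace can be reordered into one where each host's substeps run back-to-back, with no observable difference.

```python
# Substeps are ("recv"|"send", host, message). Replaying a trace yields the
# final network contents and what each host observed.
def run(trace, initial_network):
    network = list(initial_network)
    logs = {}
    for kind, host, msg in trace:
        if kind == "send":
            network.append(msg)
        else:  # recv: the host observes a message already in the network
            assert msg in network
            logs.setdefault(host, []).append(msg)
    return network, logs

# Real (interleaved) trace: A receives, B receives, A sends, B sends.
real = [("recv", "A", "m1"), ("recv", "B", "m2"),
        ("send", "A", "a1"), ("send", "B", "b1")]
# Reduced trace: all of A's substeps, then all of B's (atomic host steps).
reduced = [("recv", "A", "m1"), ("send", "A", "a1"),
           ("recv", "B", "m2"), ("send", "B", "b1")]

# Both traces produce identical network contents and host observations.
assert run(real, ["m1", "m2"]) == run(reduced, ["m1", "m2"])
```

The receives-before-sends discipline is what makes the swap safe: moving B's receive later never deprives it of a message, since the messages it needs were already in the network.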
Slide 27: Most of the work of protocol refinement is proving invariants
To prove an invariant P, we show two obligations:
- SystemInit(states[0]) ==> P(0)
- P(j) && SystemNext(states[j], states[j+1]) ==> P(j+1)
This is complicated! But with automated theorem proving, nearly all cases are proved automatically, without proof annotations.
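The two obligations above are just induction over the trace. A tiny Python sketch for the counter system (predicate names are hypothetical), checking both obligations exhaustively over a small finite domain rather than by theorem proving:

```python
# Counter system: starts at 0, each step increments.
def system_init(s): return s == 0
def system_next(s, s2): return s2 == s + 1

# The invariant P we want to maintain: the counter is never negative.
def invariant(s): return s >= 0

# Base case: every initial state satisfies P.
for s in range(-10, 11):
    if system_init(s):
        assert invariant(s)

# Inductive step: any transition from a P-state lands in a P-state.
for s in range(-10, 11):
    for s2 in range(-10, 12):
        if invariant(s) and system_next(s, s2):
            assert invariant(s2)
```

Dafny discharges exactly these two obligations, but for the unbounded state space, which is why automation matters so much.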
Slide 28: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 29: One constructs a liveness proof by finding a chain of conditions
C0 → C1 → C2 → C3 → C4 → ... → Cn
C0 is the assumed starting condition; Cn is the ultimate goal.
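The chain works because "leads-to" is transitive: if C0 leads to C1 and C1 leads to C2, then C0 leads to C2. A small Python sketch checking this on a finite trace (the trace and predicates are hypothetical stand-ins for the Paxos conditions on the next slide):

```python
# p ~> q on a trace: whenever p holds, q holds at that point or later.
def leads_to(trace, p, q):
    return all(any(q(s) for s in trace[i:])
               for i, s in enumerate(trace) if p(s))

trace = ["sent", "sent", "received", "suspected", "election"]
c0 = lambda s: s == "sent"       # client sends request
c1 = lambda s: s == "received"   # replica receives request
c2 = lambda s: s == "election"   # leader election starts

assert leads_to(trace, c0, c1)   # link proved from network assumptions
assert leads_to(trace, c1, c2)   # link proved from host actions
assert leads_to(trace, c0, c2)   # follows by transitivity
```

IronFleet proves each link as a temporal-logic lemma over all infinite behaviors, then chains them; the sketch only checks one finite trace.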
Slide 30: Simplified example
(Paxos, replicas A, B, C)
Client sends request → Replica receives request → Replica suspects leader → Leader election starts
Slide 31: Some links can be proven from assumptions about the network
Client sends request → Replica receives request: the network eventually delivers packets in bounded time.
Slide 32: Most links involve reasoning about host actions
Replica has request → Replica suspects leader: one action the event handler can perform is "become suspicious".
Slide 33: Lamport provides a rule for proving links
Ci --Action--> Ci+1
Tricky things to prove:
- Action is enabled (can be done) whenever Ci holds
- If Action is always enabled, it is eventually performed
Enablement poses difficulty for automated theorem proving.
Slide 34: Always-enabled actions
"Handle a client request" becomes "If you have a request to handle, handle it; otherwise, do nothing."
Slide 35: Always-enabled actions allow a simpler form of Lamport's rule
Ci --Action--> Ci+1
The tricky obligations reduce to one: Action is performed infinitely often. There is no longer any need to prove that Action is enabled whenever Ci holds.
Slide 36: To execute each action infinitely often, the event handler is a simple scheduler
ProtocolNext rotates through Action 1, Action 2, ..., Action 9.
It is straightforward to prove that if the event handler runs infinitely often, each action does too.
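A minimal Python sketch of this round-robin scheduler over always-enabled actions (the two actions and the state fields are illustrative, not IronRSL's actual actions): each action does its work if possible and is a no-op otherwise, so the scheduler can always run it.

```python
def handle_request(state):
    # Always enabled: a no-op when there is no request to handle.
    if state["requests"]:
        state["replies"].append(state["requests"].pop(0))

def become_suspicious(state):
    # Unconditionally enabled.
    state["suspicions"] += 1

ACTIONS = [handle_request, become_suspicious]

def event_handler(state):
    # One scheduler step: run the next action in round-robin order.
    ACTIONS[state["next_action"]](state)
    state["next_action"] = (state["next_action"] + 1) % len(ACTIONS)

state = {"requests": ["r1"], "replies": [], "suspicions": 0, "next_action": 0}
for _ in range(4):   # the event-handler loop
    event_handler(state)

assert state["replies"] == ["r1"]   # request handled on the first rotation
assert state["suspicions"] == 2     # the other action also ran every rotation
```

Because every action is runnable in every state, "the event handler runs infinitely often" immediately gives "each action runs infinitely often", which is exactly the obligation the simplified rule needs.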
Slide 37: Always-enabled actions pose a danger we're immune to
Always-enabled actions can create an unimplementable protocol, e.g., "handle a client request" even when none has arrived. We'll discover this when we try to implement it!
Slide 38: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 39: We built general-purpose verified libraries
- Temporal logic
- Parsing and marshalling
- Induction
- Liveness
(diagram: implementation and proof rest on assumptions about hardware and network)
Slide 40: Much more in the paper!
- Invariant quantifier hiding
- Embedding temporal logic in Dafny
- Reasoning about time
- Strategies for writing imperative code
- Tool improvements
Slide 41: Outline
- Introduction
- Methodology
  - Methodology overview
  - Two-level refinement
  - Concurrency containment
  - Liveness
  - ...
- Libraries
- Evaluation
- Related work
- Conclusions
Slide 42: Line counts
(table: line counts for common libraries, IronRSL, and IronKV)
- Few lines of spec in the TCB
- Safety proof-to-code ratio is 5:1, comparable to Ironclad (4:1) and seL4 (26:1)
- Including liveness, the proof-to-code ratio is 8:1
Slide 43: IronRSL performance
- Reasoning about the heap is currently difficult
- Each optimization requires a proof of equivalence
- We trade some performance for strong correctness, but there is no fundamental reason verified code should be slower
- Separating the implementation from the protocol allows performance improvement without disturbing the proof: adding batching (~2-3 person-months) improved performance significantly
Slide 44: IronKV performance
Get/Set throughput is 25%-75% of Redis.
Slide 45: Related work
Single-system verification:
- seL4 microkernel [Klein et al., SOSP 2009]
- CompCert C compiler [Leroy, CACM 2009]
- Ironclad end-to-end secure applications [Hawblitzel et al., OSDI 2014]
Distributed-system verification (but no liveness properties):
- EventML replicated database [Rahli et al., DSN 2014]: a framework with components useful for building verified distributed systems; the proof is only about the consensus algorithm, not state machine replication; uses only one layer of refinement, so all optimizations are done by the compiler
- Verdi framework and Raft implementation [Wilcox et al., PLDI 2015]: uses interactive theorem proving to build verified distributed systems; missing features like batching, state transfer, etc.; uses system transformers, a cleaner approach to composition
Slide 46: Conclusions
It's now possible to build provably correct distributed systems...
- ...despite the implementation complexity necessary for features and performance
- ...including both safety and liveness properties
To build on our code: https://github.com/Microsoft/Ironclad
To try Dafny: http://rise4fun.com/dafny
Thanks for your attention!