IronFleet: Proving Practical Distributed Systems Correct

Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, Brian Zill

Presented by Jay Lorch, Microsoft Research



Slide1

IronFleet: Proving Practical Distributed Systems Correct

Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, Brian Zill

Slide2


Slide3


[Mickens 2013]

[Zave 2015]: “not one of the properties claimed invariant in [PODC] is actually true of it.”

Slide4


We show how to build complex, efficient distributed systems whose implementations are provably safe and live.

Implementations are correct, not just abstract protocols

Proofs are machine-checked

System will make progress given sufficient time, e.g., it cannot deadlock or livelock

First mechanical proof of liveness of a non-trivial distributed protocol, let alone an implementation

Proof is subject to assumptions, not absolute


Slide5

What we built

IronRSL: a Paxos-based replicated state library (replicas A, B, C)
IronKV: a sharded key-value store

You can do it too!

Complex with many features:
  state transfer
  log truncation
  dynamic view-change timeouts
  batching
  reply cache

Slide6


IronFleet extends state-of-the-art verification to distributed systems

IronFleet

Libraries

Toolset modifications

Methodology

Key contributions of our work:

Two-level refinement

Concurrency control via reduction

Always-enabled actions

Invariant quantifier hiding

Automated proofs of temporal logic

Discussed in talk

Discussed in paper

Slide7

method MarshallMessage(msg:Message) returns (data:array<byte>)
  requires !msg.Invalid?;
  ensures ParseMessage(data) == msg;


IronFleet builds on existing research

Recent advances

Single-machine system verification

SMT solvers

IDEs for automated software verification

Demo: Provably correct marshalling
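The demo's contract can be illustrated with a toy version in Dafny. This sketch uses a single-integer message and a seq<int> wire format as stand-ins for the real Message and array<byte>; everything except the MarshallMessage/ParseMessage names is hypothetical:

```dafny
datatype Message = MessageRequest(v:int) | MessageInvalid

// Toy wire format: a one-element sequence holding the request value.
function ParseMessage(data:seq<int>) : Message
{
  if |data| == 1 then MessageRequest(data[0]) else MessageInvalid
}

// Marshalling is provably the inverse of parsing for valid messages.
method MarshallMessage(msg:Message) returns (data:seq<int>)
  requires msg.MessageRequest?
  ensures ParseMessage(data) == msg
{
  data := [msg.v];
}
```

Dafny discharges the ensures clause here automatically; the real library proves the same round-trip property over byte arrays.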

Slide8

Not writing unit tests and having it work the first time: priceless


Slide9

Outline

Introduction
Methodology
  Methodology overview
  Two-level refinement
  Concurrency containment
  Liveness
  ...
  Libraries
Evaluation
Related work
Conclusions

Slide10

Running example: IronRSL replicated state library



Safety property: equivalence to a single machine
Liveness property: clients eventually get replies

Slide11


Specification approach: rule out all bugs by construction

Race conditions

Invariant violations

Integer overflow

Deadlock

Livelock

Parsing errors

Marshalling errors

Buffer overflow

...

Slide12

Outline

Slide13

Background: Refinement

Spec:                            S0 → S1 → S2 → S3 → S4 → S5 → S6 → S7
Implementation (method Main()):  I0 → I1 → I2 → I3 → I4

[Milner 1971, Park 1981, Lamport 1983, Lynch 1983, ...]
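Refinement can be stated in Dafny: under an abstraction function, every implementation step either stutters or corresponds to a spec step. A toy counter sketch with hypothetical names (ImplNext, Abstraction); the real systems are of course far larger:

```dafny
type SpecState = int
type ImplState = int

predicate SpecNext(s:SpecState, s':SpecState) { s' == s + 1 }

// Toy implementation step: may stutter or advance by one.
predicate ImplNext(i:ImplState, i':ImplState) { i' == i || i' == i + 1 }

// Hypothetical abstraction function mapping impl states to spec states.
function Abstraction(i:ImplState) : SpecState { i }

// Every impl step maps to either a stutter or a legal spec step.
lemma RefinementStep(i:ImplState, i':ImplState)
  requires ImplNext(i, i')
  ensures Abstraction(i) == Abstraction(i')
       || SpecNext(Abstraction(i), Abstraction(i'))
{ }
```

Dafny proves the lemma with an empty body in this toy case.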

Slide14

function SpecRelation(realPackets:set<Packet>, s:SpecState) : bool
{
  forall p, i :: p in realPackets && p.msg == MarshallCounterVal(i) ==> i <= s
}

Specification is a simple, logically centralized state machine


type SpecState = int

function SpecInit(s:SpecState) : bool
{
  s == 0
}

function SpecNext(s:SpecState, s':SpecState) : bool
{
  s' == s + 1
}

Slide15

Implementation

method Main()
{
  var s:ImplState;
  s := ImplInit();
  while (true) {
    s := EventHandler(s);
  }
}

Host implementation is a single-threaded event-handler loop

Slide16

Proving correctness is hard

Subtleties of distributed protocols:
  Maintaining global invariants
  Dealing with hosts acting concurrently
  Ensuring progress

Complexities of implementation:
  Using efficient data structures
  Memory management
  Avoiding integer overflow

Slide17

Two-level refinement

Spec:            S0 → S1 → S2 → S3 → S4 → S5 → S6 → S7
Protocol:        P0 → P1 → P2 → P3 → P4
Implementation:  I0 → I1 → I2 → I3 → I4

Slide18

Protocol layer

The protocol layer is simple and abstract, while the implementation layer is concrete and efficient:

Protocol:
  function ProtocolNext(s:HostState, s':HostState) : bool
  seq<int>
  type Message = MessageRequest() | MessageReply() | ...

Implementation:
  method EventHandler(s:HostState) returns (s':HostState)
  array<uint64>
  type Packet = array<byte>

Slide19

Protocol layer

The protocol hosts are embedded in a distributed system model, which sits between the spec and the implementation.

Slide20

Protocol layer


method EventHandler(s:HostState) returns (s':HostState)
  ensures ProtocolNext(s, s');
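A minimal sketch of how this obligation is discharged, with a toy counter standing in for the real HostState and ProtocolNext:

```dafny
type HostState = int

predicate ProtocolNext(s:HostState, s':HostState) { s' == s + 1 }

// The postcondition forces the imperative code to take only
// steps that the protocol layer allows.
method EventHandler(s:HostState) returns (s':HostState)
  ensures ProtocolNext(s, s')
{
  s' := s + 1;
}
```

If the method body took any step ProtocolNext forbids, verification would fail at the ensures clause.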

Slide21

Outline

Slide22

Cross-host concurrency is hard

Host A Step 1

Host A Step 2

Host B Step 1

Host B Step 2

Host B Step 3

Host A Step 3

Hosts are single-threaded, but we need to reason about the concurrency of different hosts

Slide23

Cross-host concurrency is hard

Requires reasoning about all possible interleavings of the substeps

Slide24

Concurrency containment strategy

Assume in proof that all host steps are atomic
Enforce that receives precede sends in the event handler

Slide25

Why concurrency containment works

Reduction argument: for every real trace...

Slide26

Why concurrency containment works

Reduction argument: for every real trace, there's a corresponding legal trace with atomic host steps.

Constraining the implementation lets us think of the entire distributed system as hosts taking one step at a time.

Slide27


Most of the work of protocol refinement is proving invariants

An invariant P is proved by induction over system steps:

  SystemInit(states[0]) ==> P(0)
  P(j) && SystemNext(states[j], states[j+1]) ==> P(j+1)

Complicated! But with automated theorem proving, nearly all cases are proved automatically, without proof annotations.
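The two proof obligations have this shape in Dafny; a trivial counter system stands in for the real protocol, with SystemInit, SystemNext, and P as in the slide:

```dafny
type SystemState = int

predicate SystemInit(s:SystemState) { s == 0 }
predicate SystemNext(s:SystemState, s':SystemState) { s' == s + 1 }
predicate P(s:SystemState) { s >= 0 }  // the invariant

// Base case: the invariant holds in any initial state.
lemma InitImpliesP(s:SystemState)
  requires SystemInit(s)
  ensures P(s)
{ }

// Inductive step: every transition preserves the invariant.
lemma NextPreservesP(s:SystemState, s':SystemState)
  requires P(s) && SystemNext(s, s')
  ensures P(s')
{ }
```

Both lemmas verify with empty bodies here, which is the point: the SMT solver closes most such cases without annotations.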

Slide28

Outline

Slide29

One constructs a liveness proof by finding a chain of conditions


C0 ↝ C1 ↝ C2 ↝ C3 ↝ C4 ↝ ... ↝ Cn

C0 is the assumed starting condition; Cn is the ultimate goal.

Slide30

Simplified example

Client sends request ↝ Replica receives request ↝ Replica suspects leader ↝ Leader election starts

Slide31

Some links can be proven from assumptions about the network

Client sends request ↝ Replica receives request: holds because the network eventually delivers packets in bounded time.

Slide32

Most links involve reasoning about host actions

Replica has request ↝ Replica suspects leader: one action the event handler can perform is “become suspicious”.

Slide33

Lamport provides a rule for proving links

Ci ↝ Ci+1 via an Action

Tricky things to prove:
  Action is enabled (can be done) whenever Ci holds
  If Action is always enabled, it's eventually performed

Enablement poses difficulty for automated theorem proving

Slide34

Always-enabled actions

Instead of “handle a client request” (enabled only when a request is waiting): “if you have a request to handle, handle it; otherwise, do nothing” (always enabled).
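A sketch of the always-enabled form in Dafny; HostState and its fields are hypothetical:

```dafny
datatype HostState = HostState(requests:seq<int>, handled:seq<int>)

// Always-enabled: this action can fire in any state.
method HandleRequestAction(s:HostState) returns (s':HostState)
{
  if |s.requests| > 0 {
    // A request is waiting: handle it.
    s' := HostState(s.requests[1..], s.handled + [s.requests[0]]);
  } else {
    // No request: do nothing, but the action still "fires".
    s' := s;
  }
}
```

Because the method has no precondition, there is no enablement condition to reason about.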

Slide35

Always-enabled actions allow a simpler form of Lamport's rule

Ci ↝ Ci+1 via an Action

Now the only tricky thing to prove: Action is performed infinitely often.

Slide36

To execute each action infinitely often, the event handler is a simple scheduler

ProtocolNext: Action 1, Action 2, ..., Action 9, executed in rotation

Straightforward to prove that if event handler runs infinitely often, each action does too
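A sketch of such a scheduler; three placeholder actions stand in for IronRSL's nine, and Dafny's decreases * marks the deliberately non-terminating loop:

```dafny
type HostState = int

method Action1(s:HostState) returns (s':HostState) { s' := s; }
method Action2(s:HostState) returns (s':HostState) { s' := s; }
method Action3(s:HostState) returns (s':HostState) { s' := s; }

// Round-robin scheduler: if the loop runs infinitely often,
// so does each individual action.
method SchedulerLoop()
  decreases *
{
  var s:HostState := 0;
  var next := 0;
  while true
    decreases *
  {
    if next == 0      { s := Action1(s); }
    else if next == 1 { s := Action2(s); }
    else              { s := Action3(s); }
    next := (next + 1) % 3;
  }
}
```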

Slide37

Always-enabled actions pose a danger we're immune to

Always-enabled actions can create an unimplementable protocol.

We'll discover this when we try to implement it!

Slide38

Outline

Slide39

We built general-purpose verified libraries

Temporal logic
Parsing and marshalling
Induction
Liveness

The implementation and proof rest on assumptions about hardware and the network.

Slide40

Much more in the paper!
  Invariant quantifier hiding
  Embedding temporal logic in Dafny
  Reasoning about time
  Strategies for writing imperative code
  Tool improvements

Slide41

Outline

Slide42

Line counts

(Chart: line counts for the common libraries, IronRSL, and IronKV.)

Few lines of spec in TCB
Safety proof-to-code ratio is 5:1, comparable to Ironclad (4:1) and seL4 (26:1)
Including liveness, proof-to-code ratio is 8:1

Slide43

IronRSL performance

Reasoning about the heap is currently difficult
Each optimization requires a proof of equivalence
We trade some performance for strong correctness, but there is no fundamental reason verified code should be slower
Separating implementation from protocol allows performance improvement without disturbing the proof
Adding batching (~2-3 person-months) improved performance significantly

Slide44

IronKV performance

Throughput on Get and Set is 25%-75% of Redis

Slide45

Related work

Single-system verification:
  seL4 microkernel [Klein et al., SOSP 2009]
  CompCert C compiler [Leroy, CACM 2009]
  Ironclad end-to-end secure applications [Hawblitzel et al., OSDI 2014]

Distributed-system verification (but no liveness properties):
  EventML replicated database [Rahli et al., DSN 2014]
    Framework with components useful for building verified distributed systems
    Proof only about consensus algorithm, not state machine replication
    Uses only one layer of refinement, so all optimizations done by compiler
  Verdi framework and Raft implementation [Wilcox et al., PLDI 2015]
    Uses interactive theorem proving to build verified distributed systems
    Missing features like batching, state transfer, etc.
    Uses system transformers, a cleaner approach to composition

Slide46

Conclusions

It's now possible to build provably correct distributed systems...
...despite implementation complexity necessary for features and performance
...including both safety and liveness properties

To build on our code: https://github.com/Microsoft/Ironclad
To try Dafny: http://rise4fun.com/dafny

Thanks for your attention!