SAMC: Semantic-Aware Model Checking
Uploaded On 2018-03-22

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems, by Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi, Jeffrey F. Lukman, and Haryadi S. Gunawi

Presentation Transcript

Slide1

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems

Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi*, Jeffrey F. Lukman†, and Haryadi S. Gunawi

Slide2

Internet Services

Slide3

Internet Services

Cloud Systems

Slide4

Reliability

Complex failures

“Deep bugs”

Slide5

Deep Bug Example

ZooKeeper (synchronization service), Issue #335

1. Nodes A, B, C start (w/ latest txid: 10)
2. B becomes leader
3. B crashes
4. C becomes leader
5. C commits new txid-value pair (11, X)
6. A crashes, before committing the new txid 11
7. C loses quorum and C crashes
8. A and B are back online after C crashes
9. A becomes leader
10. A commits new txid-value pair (11, Y)
11. C is back online after A's new tx commit
12. C announces to B (11, X)
13. B replies diff starting with tx 12
14. Inconsistency: A has (11, Y), C has (11, X)

[Figure: timeline of nodes A, B, and C switching between leader (L) and follower (F) roles, with each replica holding value x or y]

PERMANENT INCONSISTENT REPLICA

Slide6

Deep Bug Characteristics

ZooKeeper (synchronization service), Issue #335

1. Nodes A, B, C start (w/ latest txid: 10)
2. B becomes leader
3. B crashes
4. C becomes leader
5. C commits new txid-value pair (11, X)
6. A crashes, before committing the new txid 11
7. C loses quorum and C crashes
8. A and B are back online after C crashes
9. A becomes leader
10. A commits new txid-value pair (11, Y)
11. C is back online after A's new tx commit
12. C announces to B (11, X)
13. B replies diff starting with tx 12
14. Inconsistency: A has (11, Y), C has (11, X)

Specific Order

1. Out-of-order messages
2. Multiple crashes
3. Multiple reboots

HAPPEN IN ANY ORDER

Slide7

Study of Deep Bugs

[Figure: plots of deep bugs; x-axis is bug number, y-axis is number of crashes and reboots; legend: # crashes / # reboots]

SEVERE IMPLICATIONS: INCONSISTENT REPLICAS, DATA LOSS, DOWNTIMES, ETC.

Slide8

How do we catch deep bugs in distributed systems?

Slide9

How to Catch Deep Bugs

Distributed system model checker
Re-ordering all non-deterministic events
Find which specific orderings lead to bugs

[Figure: re-orderings of the 14 events]

1 2 3 4 5 6 7 8 9 10 11 12 13 14
2 7 1 4 5 6 3 8 11 10 9 12 14 13
6 9 3 4 5 1 7 8 2 10 11 13 12 14
3 4 1 5 2 6 7 8 9 12 11 10 14 13
2 1 3 4 5 7 6 8 9 10 11 14 12 13
. . .

Slide10

What’s Wrong with Existing Model Checkers?

Last 7 years: MaceMC [NSDI’07], Modist [NSDI’09], dBug [SSV’10], Demeter [SOSP’13], etc.

BUT

Too many events
Multiple crashes and reboots
Create more messages
No model checker incorporates multiple crashes and reboots
Cannot find deep bugs!

100 events

Slide11

How do we catch deep bugs REALLY FAST?

Slide12

Black-Box Approach

Existing model checkers are so slow
They treat target systems as black boxes
A large number of event orderings are generated
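As a rough illustration of why the black-box approach is slow, the sketch below enumerates every ordering of four concurrent events, which is what a checker with no knowledge of the target system has to explore. The class and method names (`BlackBoxOrderings`, `allOrderings`) are hypothetical and are not taken from SAMC or any existing model checker.

```java
import java.util.ArrayList;
import java.util.List;

class BlackBoxOrderings {
    // Returns every permutation of the given events (n! orderings for n events).
    static List<String> allOrderings(String events) {
        List<String> out = new ArrayList<>();
        permute("", events, out);
        return out;
    }

    // Recursively picks each remaining event as the next one to execute.
    private static void permute(String prefix, String rest, List<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            permute(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1), out);
        }
    }

    public static void main(String[] args) {
        List<String> orderings = allOrderings("ABCD");
        // Four events already give 4! = 24 orderings.
        System.out.println(orderings.size() + " orderings, starting with " + orderings.get(0));
    }
}
```

The count grows factorially, so the 14 events of the ZooKeeper example would already produce more than 87 billion orderings under this naive enumeration.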

[Figure: a black-box model checker receives events A, B, C, D from the black-box system and generates all orderings]

ABCD
ABDC
ACBD
ACDB
ADBC
. . .
(24 total)

Slide13

Semantic Knowledge

How can we make model checker fast?
Exploit semantic knowledge
Semantic-aware model checker (SAMC)

[Figure: events A, B, C, D flow from the black-box system to SAMC]

Slide14

Black Box vs. SAMC

Black Box Model Checker: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, BDAC, . . .

SAMC with message processing semantic: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, . . .

Unnecessary re-orderings lead to the same state

[Figure: events A, B, C, D pass through the message processing semantic, which marks the unnecessary re-orderings]

Slide15

SAMC with Crashes

Black Box Model Checker: ABCDX, ABCXD, ABXCD, AXBCD, XABCD, ABDCX, ABDXC, . . .

SAMC with crash recovery semantic: ABCDX, ABCXD, ABXCD, AXBCD, XABCD, ABDCX, ABDXC, . . .

[Figure: nodes N1, N2, N3, N4 with messages A,B and C,D and a crash X; the crash recovery semantic marks the unnecessary re-orderings]

Slide16

Generic Reduction Algorithms

Principle of Semantic Awareness

Local-Message Independence (LMI)
Crash-Message Independence (CMI)
Crash Recovery Symmetry (CRS)
Reboot Synchronization Symmetry (RSS)

Slide17

SAMC Implementation and Integration

SAMC implementation: 10,000 LOC from scratch

Apply SAMC to 3 cloud systems, 7 protocols, 10 versions

| Cloud system | Protocols |
|---|---|
| Cassandra | Gossiper, Hinted handoff, Read/write |
| Hadoop 2.0 | Cluster management, Speculative execution |
| ZooKeeper | Atomic broadcast, Leader election |

Slide18

Result

Reproduced 12 old bugs
Compare to state-of-the-art techniques: Dynamic Partial Order Reduction (DPOR), Random-DPOR
Find bugs 2x to 340x faster, 49x on average
Found 2 new bugs
Submit them to developers

Slide19

Outline

Intuition
SAMC
Local-Message Independence
Crash-Message Independence
Crash Recovery Symmetry
Reboot Synchronization Symmetry
Evaluation

Slide20

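Dependency vs. Independency

Node state = S, with concurrent messages A and B

A, B = Dependent: from S, the ordering A, B leads to S’ and the ordering B, A leads to S’’
A, B = Independent: from S, the orderings A, B and B, A both lead to S’, so the second ordering is unnecessary

INDEPENDENT = NO REORDERING

2X SPEED-UP

Slide21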

Black Box vs. SAMC

Black Box Model Checker (all events treated as dependent): ABCD, ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC, BCAD, BCDA, . . . (4! = 24 orderings)

SAMC (semantic info: A and B are dependent, C and D are dependent): ABCD, ABDC, BACD, BADC
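The reduction from 24 orderings to 4 can be checked with a small sketch. It assumes, as in the slide, that only the pairs A-B and C-D are dependent, and it treats two orderings as equivalent when every dependent pair appears in the same relative order. The names `IndependenceReduction`, `canonicalKey`, and `count` are hypothetical; this is not how SAMC or DPOR is implemented, only a way to count the equivalence classes.

```java
import java.util.HashSet;
import java.util.Set;

class IndependenceReduction {
    // Assumed semantic info from the slide: A-B are dependent and C-D are dependent.
    static final String[] DEPENDENT_PAIRS = {"AB", "CD"};

    // Orderings that agree on the relative order of every dependent pair
    // lead to the same state, so they share one canonical key.
    static String canonicalKey(String ordering) {
        StringBuilder key = new StringBuilder();
        for (String pair : DEPENDENT_PAIRS) {
            char x = pair.charAt(0);
            char y = pair.charAt(1);
            boolean xFirst = ordering.indexOf(x) < ordering.indexOf(y);
            key.append(xFirst ? x : y).append(xFirst ? y : x);
        }
        return key.toString();
    }

    // Enumerates all orderings, collecting each ordering and each distinct key.
    static void permute(String prefix, String rest, Set<String> all, Set<String> keys) {
        if (rest.isEmpty()) {
            all.add(prefix);
            keys.add(canonicalKey(prefix));
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            permute(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1), all, keys);
        }
    }

    // Returns {orderings a black box explores, orderings needed with semantic info}.
    static int[] count(String events) {
        Set<String> all = new HashSet<>();
        Set<String> keys = new HashSet<>();
        permute("", events, all, keys);
        return new int[] {all.size(), keys.size()};
    }

    public static void main(String[] args) {
        int[] c = count("ABCD");
        System.out.println(c[0] + " orderings vs. " + c[1] + " (" + (c[0] / c[1]) + "x fewer)");
    }
}
```

The four distinct keys correspond to the four orderings SAMC keeps on the slide: ABCD, ABDC, BACD, BADC.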

6X SPEED-UP

Slide22

Reduction Speed-up

[Figure: two state-space trees rooted at state S, each branching into orderings ABCD, ABDC, ACBD, . . .]

Slide23

How to Declare Message Independency?

Q: Which concurrent messages are independent?

A: Use message processing semantic

Slide24

Message Processing Semantic in Simplified Leader Election

    if (vote <= belief) {
        // do nothing
    } else {
        belief = vote;
    }

[Figure: a node with Belief = n3 (B = 3). Vote=1 (V = 1) and Vote=2 (V = 2) leave B = 3; Vote=4 (V = 4) updates the belief to B = 4]

Slide25

Removing Re-ordering via Message Processing Semantic

    if (vote <= belief) {
        // do nothing
    } else {
        belief = vote;
    }

[Figure: a node with Belief=4 receives Vote=1, Vote=2, Vote=3. In the orderings 1, 2, 3; 1, 3, 2; 2, 1, 3; . . . the belief stays B = 4 after every vote, so the re-orderings are unnecessary]

Slide26

Formalizing the Intuition

DISCARD PATTERN

    if (isDiscard(msg, ls)) {
        // do nothing;
    }

DISCARD PREDICATE

    boolean discardPredicate(msg, ls) {
        if (msg.vote <= ls.belief)
            return true;
        else
            return false;
    }
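A minimal runnable sketch of the discard pattern follows, assuming a vote message and a local state that carry only the `vote` and `belief` fields used on the slide. The `Msg` and `LocalState` types and the `process` and `finalBelief` helpers are hypothetical stand-ins introduced for illustration; they are not SAMC's or ZooKeeper's actual classes.

```java
import java.util.Arrays;
import java.util.List;

class DiscardPattern {
    // Hypothetical stand-ins for a vote message and a node's local state.
    static final class Msg {
        final int vote;
        Msg(int vote) { this.vote = vote; }
    }

    static final class LocalState {
        int belief;
        LocalState(int belief) { this.belief = belief; }
    }

    // Discard predicate from the slide: true when the message will be ignored.
    static boolean discardPredicate(Msg msg, LocalState ls) {
        return msg.vote <= ls.belief;
    }

    // Message processing semantic of the simplified leader election.
    static void process(Msg msg, LocalState ls) {
        if (discardPredicate(msg, ls)) {
            return; // do nothing
        }
        ls.belief = msg.vote;
    }

    // Applies the votes in the given order and returns the final belief.
    static int finalBelief(int initialBelief, List<Integer> votes) {
        LocalState ls = new LocalState(initialBelief);
        for (int v : votes) {
            process(new Msg(v), ls);
        }
        return ls.belief;
    }

    public static void main(String[] args) {
        // With belief 4, votes 1, 2, 3 are all discarded whatever their order.
        System.out.println(finalBelief(4, Arrays.asList(1, 2, 3)));
        System.out.println(finalBelief(4, Arrays.asList(3, 1, 2)));
        // With belief 3, vote 4 is not discarded and becomes the new belief.
        System.out.println(finalBelief(3, Arrays.asList(1, 2, 4)));
    }
}
```

Because every discarded message leaves the local state untouched, re-ordering such messages cannot produce a new state, which is the property the predicate exposes to the checker.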

MESSAGE PROCESSING SEMANTIC

    if (vote <= belief) {
        // do nothing
    } else {
        belief = vote;
    }

Slide27

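Formalizing the Intuition

| mx | my | discard(mx) | discard(my) | Independent |
|---|---|---|---|---|
| 1 | 2 | true | true | ✔ |
| 1 | 3 | true | true | ✔ |
| 2 | 3 | true | true | ✔ |

| vote | belief | discardPredicate |
|---|---|---|
| 1 | 4 | true |
| 2 | 4 | true |
| 3 | 4 | true |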
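The independence table can be regenerated with a short sketch. It assumes the condition visible in the table, namely that two messages are declared independent when the discard predicate holds for both of them in the current local state; the actual SAMC rule may be more permissive. `LocalMessageIndependence` and `independent` are hypothetical names.

```java
class LocalMessageIndependence {
    // Discard predicate of the simplified leader election.
    static boolean discardPredicate(int vote, int belief) {
        return vote <= belief;
    }

    // Assumed rule (taken from the table): both messages must be discarded.
    static boolean independent(int voteX, int voteY, int belief) {
        return discardPredicate(voteX, belief) && discardPredicate(voteY, belief);
    }

    public static void main(String[] args) {
        int belief = 4;
        int[] votes = {1, 2, 3};
        // Print one row per unordered pair of concurrent votes.
        for (int i = 0; i < votes.length; i++) {
            for (int j = i + 1; j < votes.length; j++) {
                System.out.println("mx=" + votes[i] + " my=" + votes[j]
                        + " independent=" + independent(votes[i], votes[j], belief));
            }
        }
    }
}
```

With belief 4 and votes 1, 2, 3, every pair comes out independent, so none of the 3! orderings of these votes needs to be re-ordered.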

[Figure: a node with Belief=4 receiving Vote=1, Vote=2, Vote=3]

Slide28

Outline

Intuition
SAMC
Local-Message Independence
Crash-Message Independence
Crash Recovery Symmetry
Reboot Synchronization Symmetry
Evaluation

Slide29

SAMC Architecture

[Figure: two nodes with local states ls1 and ls2, each behind an interceptor; the interceptors hold messages a, b and c, d, and SAMC decides which one to release, e.g. release(c)]

SAMC
Basic Reduction Techniques: Dynamic Partial Order Reduction (DPOR), Symmetry
Generic Reduction Policies: LMI, CMI, CRS, RSS
Protocol Specific Rules: Leader Election, Atomic Broadcast, …

Slide30

Local-Message Independence

SAMC Generic Reduction Policies

Local-Message Independence (LMI)
Crash-Message Independence (CMI)
Crash Recovery Symmetry (CRS)
Reboot Synchronization Symmetry (RSS)

Slide31

Local-Message Independence

Discard pattern
Increment pattern
Constant pattern

    if (msg.type == ack) {
        node.ackCount++;
    }

[Figure: two ack messages move the counter from C=0 to C=1 to C=2]

    boolean incrementPredicate(msg, ls) {
        if (msg.type == ack)
            return true;
        else
            return false;
    }
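The increment pattern can be sketched the same way. The example assumes a node that only counts acknowledgements; the `Msg` and `Node` types, the `sender` field, and the `ackCountAfter` helper are hypothetical and exist only to show that two ack messages commute.

```java
class IncrementPattern {
    // Hypothetical message: a type plus the sender, to tell two acks apart.
    static final class Msg {
        final String type;
        final String sender;
        Msg(String type, String sender) {
            this.type = type;
            this.sender = sender;
        }
    }

    // Hypothetical local state holding only the ack counter.
    static final class Node {
        int ackCount = 0;
    }

    // Increment predicate from the slide: true when processing only bumps a counter.
    static boolean incrementPredicate(Msg msg, Node ls) {
        return "ack".equals(msg.type);
    }

    // Message processing semantic: an ack increments the counter.
    static void process(Msg msg, Node node) {
        if (incrementPredicate(msg, node)) {
            node.ackCount++;
        }
    }

    // Processes the messages in the given order on a fresh node.
    static int ackCountAfter(Msg... msgs) {
        Node node = new Node();
        for (Msg m : msgs) {
            process(m, node);
        }
        return node.ackCount;
    }

    public static void main(String[] args) {
        Msg a = new Msg("ack", "n1");
        Msg b = new Msg("ack", "n2");
        // Both orders end with the same counter, so one ordering is enough.
        System.out.println(ackCountAfter(a, b) + " " + ackCountAfter(b, a));
    }
}
```

Increments commute, so the checker can skip re-ordering messages for which the predicate holds.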

[Figure: counter values C=0, C=1, C=2 with two ack messages]

Slide32

Crash-Message Independence

SAMC Generic Reduction Policies

Crash-Message Independence (CMI)
Local-Message Independence (LMI)
Crash Recovery Symmetry (CRS)
Reboot Synchronization Symmetry (RSS)

Slide33

Crash-Message Independence

Black Box: ABCDX, ABCXD, ABXCD, AXBCD, XABCD, ABDCX, …

    void handleCrash() {
        if (X == follower && isQuorum())
            followerCount--;
        // No new messages
    }

    boolean localImpact(X, ls) {
        if (X == follower && isQuorum())
            return true;
        else
            return false;
    }

[Figure: a leader (L) and followers (F) with messages A,B and C,D, shown before and after a follower crash]

Local Impact

Slide34

Crash-Message Independence

    boolean globalImpact(X, ls) {
        if (X == leader || !isQuorum())
            return true;
        else
            return false;
    }
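The two crash predicates can be put side by side in a small sketch. It assumes that `isQuorum()` means "the nodes still alive after the crash form a majority of the cluster", which the slides do not spell out, and that a crash with only local impact is the case where re-ordering the crash against outstanding messages is unnecessary. `Role`, `LocalState`, and the field names are hypothetical.

```java
class CrashMessageIndependence {
    enum Role { LEADER, FOLLOWER }

    // Hypothetical local state: cluster size and nodes alive after the crash.
    static final class LocalState {
        final int clusterSize;
        final int aliveAfterCrash;

        LocalState(int clusterSize, int aliveAfterCrash) {
            this.clusterSize = clusterSize;
            this.aliveAfterCrash = aliveAfterCrash;
        }

        // Assumption: a quorum is a strict majority of the cluster.
        boolean isQuorum() {
            return aliveAfterCrash > clusterSize / 2;
        }
    }

    // Local impact: a follower crashes and the quorum survives (no new messages).
    static boolean localImpact(Role crashed, LocalState ls) {
        return crashed == Role.FOLLOWER && ls.isQuorum();
    }

    // Global impact: the leader crashes or the quorum is lost (new messages created).
    static boolean globalImpact(Role crashed, LocalState ls) {
        return crashed == Role.LEADER || !ls.isQuorum();
    }

    public static void main(String[] args) {
        LocalState twoOfThree = new LocalState(3, 2);
        LocalState oneOfThree = new LocalState(3, 1);
        System.out.println(localImpact(Role.FOLLOWER, twoOfThree));  // follower crash, quorum kept
        System.out.println(globalImpact(Role.LEADER, twoOfThree));   // leader crash
        System.out.println(globalImpact(Role.FOLLOWER, oneOfThree)); // quorum lost
    }
}
```

As written on the slides, the two predicates are exact negations of each other, so every crash falls into exactly one of the two cases.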

[Figure: a leader (L) and followers (F) with messages A,B and C,D; after the crash the nodes are shown in state S]

Global Impact

    void handleCrash() {
        if (X == leader || !isQuorum())
            electLeader();
        // New messages created
    }

Slide35

Crash Recovery Symmetry
Reboot Synchronization Symmetry

SAMC Generic Reduction Policies

Crash Recovery Symmetry (CRS)
Local-Message Independence (LMI)
Crash-Message Independence (CMI)
Reboot Synchronization Symmetry (RSS)

Slide36

Outline

Intuition
SAMC
Local-Message Independence
Crash-Message Independence
Crash Recovery Symmetry
Reboot Synchronization Symmetry
Evaluation

Slide37

Evaluation

SAMC implementation: 10,000 LOC from scratch

Apply SAMC to 3 cloud systems, 7 protocols, 10 versions

| Cloud system | Protocols |
|---|---|
| Cassandra | Gossiper, Hinted handoff, Read/write |
| Hadoop 2.0 | Cluster management, Speculative execution |
| ZooKeeper | Atomic broadcast, Leader election |

Slide38

Protocol-Specific Rules (e.g. ZooKeeper Leader Election)

Guide SAMC to remove re-orderings
35 LOC on average per protocol

Slide39

Catching Old Bugs

Table: number of executions (#exe) to reach the bugs; the speedup columns are still empty on this slide.

| Issue# | SAMC #exe | Black-Box DPOR #exe | Random #exe | Random DPOR #exe |
|---|---|---|---|---|
| ZooKeeper-335 | 117 | 5000+ | 1057 | 5000+ |
| ZooKeeper-790 | 7 | 14 | 225 | 82 |
| ZooKeeper-975 | 53 | 967 | 71 | 163 |
| ZooKeeper-1075 | 16 | 1081 | 86 | 250 |
| ZooKeeper-1419 | 100 | 924 | 2514 | 987 |
| ZooKeeper-1492 | 576 | 5000+ | 5000+ | 5000+ |
| ZooKeeper-1653 | 11 | 945 | 3756 | 3462 |
| MapReduce-4748 | 4 | 22 | 6 | 6 |
| MapReduce-5489 | 53 | 5000+ | 5000+ | 5000+ |
| MapReduce-5505 | 40 | 1212 | 5000+ | 1210 |
| Cassandra-3395 | 104 | 2552 | 191 | 550 |
| Cassandra-3626 | 96 | 5000+ | 5000+ | 5000+ |

Slide40

Catching Old Bugs

Table: number of executions (#exe) to reach the bugs, and speedup of SAMC over each technique.

| Issue# | SAMC #exe | Black-Box DPOR #exe | speedup | Random #exe | speedup | Random DPOR #exe | speedup |
|---|---|---|---|---|---|---|---|
| ZooKeeper-335 | 117 | 5000+ | 43+ | 1057 | 9 | 5000+ | 43+ |
| ZooKeeper-790 | 7 | 14 | 2 | 225 | 32 | 82 | 12 |
| ZooKeeper-975 | 53 | 967 | 18 | 71 | 1 | 163 | 3 |
| ZooKeeper-1075 | 16 | 1081 | 68 | 86 | 5 | 250 | 16 |
| ZooKeeper-1419 | 100 | 924 | 9 | 2514 | 25 | 987 | 10 |
| ZooKeeper-1492 | 576 | 5000+ | 9+ | 5000+ | 9+ | 5000+ | 9+ |
| ZooKeeper-1653 | 11 | 945 | 86 | 3756 | 341 | 3462 | 315 |
| MapReduce-4748 | 4 | 22 | 6 | 6 | 2 | 6 | 2 |
| MapReduce-5489 | 53 | 5000+ | 94+ | 5000+ | 94+ | 5000+ | 94+ |
| MapReduce-5505 | 40 | 1212 | 30 | 5000+ | 125+ | 1210 | 30 |
| Cassandra-3395 | 104 | 2552 | 25 | 191 | 2 | 550 | 5 |
| Cassandra-3626 | 96 | 5000+ | 52+ | 5000+ | 52+ | 5000+ | 52 |

Slide41

Reduction Ratio

ZooKeeper leader election protocol
Run black-box DPOR
After each execution, check whether this execution is also executed by SAMC
Count DPOR’s executions that are executed by SAMC too
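The measurement procedure can be sketched as follows, assuming the reduction ratio is the number of executions explored by black-box DPOR divided by the number of those executions that SAMC would also explore. The slides do not give the formula, so that definition, the `ReductionRatio` class, and the toy execution sets in `main` are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReductionRatio {
    // Assumed definition: DPOR executions / DPOR executions that SAMC also executes.
    static double reductionRatio(List<String> dporExecutions, Set<String> samcExecutions) {
        long alsoBySamc = dporExecutions.stream()
                .filter(samcExecutions::contains)
                .count();
        if (alsoBySamc == 0) {
            throw new IllegalArgumentException("no DPOR execution is also executed by SAMC");
        }
        return (double) dporExecutions.size() / alsoBySamc;
    }

    public static void main(String[] args) {
        // Toy data only: each string stands for one explored event ordering.
        List<String> dpor = Arrays.asList(
                "ABCD", "ABDC", "ACBD", "ACDB", "ADBC", "ADCB", "BACD", "BADC");
        Set<String> samc = new HashSet<>(Arrays.asList("ABCD", "ABDC", "BACD", "BADC"));
        System.out.println(reductionRatio(dpor, samc) + "X");
    }
}
```

In this toy input, 4 of the 8 DPOR executions are also explored by SAMC, which gives a ratio of 2.0X.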

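Reduction Ratio

| #Crash | #Reboot | ALL | LMI | CMI | CRS | RSS |
|---|---|---|---|---|---|---|
| 1 | 1 | 37X | 18X | 5X | 4X | 3X |
| 2 | 2 | 63X | 35X | 6X | 5X | 5X |
| 3 | 3 | 103X | 37X | 9X | 9X | 14X |

Slide42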

Conclusion

Deep bugs live in the cloud
Model checker needs to incorporate complex failures to reach deep bugs
State space explosion
Semantic-aware model checking: LMI, CMI, CRS, RSS
Bring future research questions
What other semantic knowledge is useful?
How to extract them from the code automatically?

Slide43

Thank You

Questions?

http://ucare.cs.uchicago.edu