Slide1

Automatically Repairing Network Control Planes Using an Abstract Representation

Ratul Mahajan

Aaron Gember-Jacobson

Aditya Akella

Hongqiang Harry Liu

Slide2

Control plane == heart of the network

Slide3

Configuring the control plane

Allow Alice & Bob's computers to communicate

Protocol instances

Link/path weights

Traffic filters

…

Complex!

Slide4

Configuration errors are common

Took hours to repair the network

Slide5

Existing tools

Synthesis (e.g., SyNET, Propane)

Policies → Synthesizer → Correct configurations

Repeat when policies change: entirely new configurations ☹️

Verification (e.g., ARC, Minesweeper)

Configurations + Policies → Verifier → Violations → Operator → Repaired configurations

Manual intervention ☹️

Slide6

Our contribution: CPR

Broken configurations + Policies → CPR → Repaired configurations

Slide7

Challenges

Broken configurations + Policies → Repaired configurations

Compute repairs that…

1) Satisfy all policies

2) Are feasible to implement

3) Require minimal changes

Inspiration: network verification & program repair

Slide8

Our approach

Broken configurations → Graph-based model

Graph-based model + Policies → Constraint problem (A = ?, B = ?)

Constraint problem → Repaired model → Repaired configurations

Slide9

Example network

Policies

R ⇨ T prefers A→B→C

S ⇨ T traverses firewall

S ⇨ T reachable under single link failure

[Figure: example network with routers A, B, and C (each running OSPF), endpoints R, S, and T, and a firewall (FW)]

Slide10

Background: ARC [SIGCOMM'16]

Policies

R ⇨ T prefers A→B→C

S ⇨ T traverses firewall

S ⇨ T reachable under single link failure

[Figure: the example network and its ARC graphs for R ⇨ T and S ⇨ T, with vertices A_O, B_O, C_O, A_I, B_I, C_I, FW labels, and edge weights of 0 or 1. Annotations: shortest path; max-flow = 1; 1 edge-disjoint path]
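The graph checks named on this slide (max-flow, edge-disjoint paths) can be sketched in a few lines. This is only an illustration, not ARC's implementation: the toy graph, its vertex names, and the helper functions are assumptions made for the example. The waypoint policy is checked by deleting firewall edges and testing reachability; the single-link-failure policy is checked by counting edge-disjoint paths with a unit-capacity max-flow.

```python
from collections import deque

def reachable(edges, src, dst):
    """Breadth-first search over directed edges given as (u, v) pairs."""
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        for x, y in edges:
            if x == u and y not in seen:
                seen.add(y)
                queue.append(y)
    return dst in seen

def always_traverses_waypoint(edges, fw_edges, src, dst):
    """True if every src->dst path uses at least one firewall edge."""
    without_fw = [e for e in edges if e not in fw_edges]
    return not reachable(without_fw, src, dst)

def edge_disjoint_paths(edges, src, dst):
    """Unit-capacity max-flow, found by repeated BFS augmentation."""
    cap = {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)  # residual (reverse) edge
    flow = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for (a, b), c in cap.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if dst not in parent:
            return flow
        v = dst
        while parent[v] is not None:  # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1

# Hypothetical graph: the A->B edge passes through the firewall.
edges = [("S", "A"), ("A", "B"), ("B", "T"), ("A", "C"), ("C", "T")]
fw_edges = {("A", "B")}

print(always_traverses_waypoint(edges, fw_edges, "S", "T"))  # False: S->A->C->T avoids the FW
print(edge_disjoint_paths(edges, "S", "T"))                  # 1: every path shares S->A
```

In this toy graph both S ⇨ T policies are violated: a firewall-free path exists, and a max-flow of 1 means a single link failure can disconnect S from T.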

Slide11

Repairing ARC

Policies

R ⇨ T prefers A→B→C

S ⇨ T traverses firewall

S ⇨ T reachable under single link failure

Add and remove edges in the ARC

[Figure: ARC graphs for R ⇨ T and S ⇨ T, with vertices A_O, B_O, C_O, A_I, B_I, C_I, FW labels, and edge weights. Annotation: max-flow = 1]

Challenge: satisfying multiple policies

NP-hard ☹️

Slide12

Repair as constraint solving

Boolean variables: for each possible edge in each graph

Constraints: on paths

e.g., S ⇨ T traverses firewall → no path without a FW from S to T

Base case: edge without a FW ⇒ path without a FW

Inductive case: path without a FW & edge without a FW ⇒ path without a FW
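The base and inductive cases above can be illustrated with a small brute-force search. This is a sketch under stated assumptions, not CPR's encoding: a real system would hand these constraints to a solver, whereas here every Boolean assignment is simply enumerated. The candidate edges, the firewall edge, and the function names are hypothetical.

```python
from itertools import product

def fw_free_paths(enabled_edges, fw_edges):
    """Least fixpoint of the base and inductive rules, as (src, dst) pairs."""
    # Base case: an enabled edge without a FW is a path without a FW.
    paths = {e for e in enabled_edges if e not in fw_edges}
    changed = True
    while changed:
        changed = False
        # Inductive case: extend a FW-free path by one FW-free edge.
        for u, v in list(paths):
            for x, y in enabled_edges:
                if x == v and (x, y) not in fw_edges and (u, y) not in paths:
                    paths.add((u, y))
                    changed = True
    return paths

def satisfying_assignments(candidate_edges, fw_edges, src, dst):
    """Yield each set of enabled edges with no FW-free src->dst path."""
    # One Boolean variable per candidate edge: is the edge present?
    for bits in product([False, True], repeat=len(candidate_edges)):
        enabled = [e for e, on in zip(candidate_edges, bits) if on]
        if (src, dst) not in fw_free_paths(enabled, fw_edges):
            yield set(enabled)

# Hypothetical candidates: only the B->T edge passes through a firewall.
candidate_edges = [("S", "A"), ("A", "T"), ("S", "B"), ("B", "T")]
fw_edges = {("B", "T")}

solutions = list(satisfying_assignments(candidate_edges, fw_edges, "S", "T"))
print(len(solutions))  # 12 of the 16 assignments avoid a FW-free S->T path
```

The empty edge set satisfies this constraint trivially, which is why a waypoint constraint on its own is not enough: it has to be combined with the other policies (for example, reachability).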

Slide13

Satisfying solution

Policies

R ⇨ T prefers A→B→C

S ⇨ T traverses firewall

S ⇨ T reachable under single link failure

[Figure: ARC graphs for R ⇨ T and S ⇨ T after applying a satisfying solution, with vertices A_O, B_O, C_O, A_I, B_I, C_I, FW labels, and edge weights. Annotation: max-flow = 1]

Is it a feasible solution?

Slide14

Translating repairs

[Figure: the example network (routers A, B, and C running OSPF; endpoints R, S, and T; FW) alongside the repaired ARC graphs for R ⇨ T and S ⇨ T]

Challenge: conflicts between traffic classes

Cannot enable routing for a single Src-Dst pair

Slide15

Hierarchical ARC

Disallow conflicts in model

Add constraints to enforce hierarchy

[Figure: hierarchical ARC (HARC) graphs over vertices A_O, B_O, C_O, A_I, B_I, C_I and endpoints R, S, and T, with FW labels and edge weights]

Is it a minimal solution?

Slide16

Repair minimality

Each edge addition/removal requires a configuration change

Maximum satisfiability problem

Hard constraints: discussed earlier

Soft constraints: edge in original HARC ⇒ edge in repaired HARC (and vice versa)

Objective: satisfy as many soft constraints as possible
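The hard/soft split can be sketched as a brute-force maximum satisfiability search. This is an illustration only, not CPR's formulation or solver: the candidate edges, the original graph, and the `hard_ok` predicate are hypothetical stand-ins for the policy constraints discussed earlier.

```python
from itertools import product

def reaches(edges, src, dst):
    """Depth-first reachability over directed (u, v) edges."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for x, y in edges:
            if x == u and y not in seen:
                seen.add(y)
                stack.append(y)
    return dst in seen

def minimal_repair(candidate_edges, original, hard_ok):
    """Among edge sets passing hard_ok, keep one that changes the fewest edges."""
    best, best_score = None, -1
    for bits in product([False, True], repeat=len(candidate_edges)):
        repaired = {e for e, on in zip(candidate_edges, bits) if on}
        if not hard_ok(repaired):  # hard constraints must all hold
            continue
        # One soft constraint per edge: present in original <=> present in repair.
        score = sum((e in original) == (e in repaired) for e in candidate_edges)
        if score > best_score:
            best, best_score = repaired, score
    if best is None:
        return None, None
    return best, len(candidate_edges) - best_score

# Hypothetical setup: the original graph reaches T but bypasses the firewall on B->T.
candidate_edges = [("S", "A"), ("A", "T"), ("S", "B"), ("B", "T")]
original = {("S", "A"), ("A", "T")}
fw_edges = {("B", "T")}

def hard_ok(edges):
    # S must reach T, and no S->T path may avoid the firewall.
    return reaches(edges, "S", "T") and not reaches(edges - fw_edges, "S", "T")

repair, changes = minimal_repair(candidate_edges, original, hard_ok)
print(changes)  # 3: add S->B and B->T, and remove one of the two original edges
```

Each violated soft constraint corresponds to one edge addition or removal, so maximizing satisfied soft constraints minimizes the number of edge changes. Ties are possible (here two repairs need three changes), and the sketch returns the first one it finds.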


Slide17

Evaluation: dataset

96 real data center networks

Routers: median = 8, max = 24

Policies: median = 1K, max = 82K

Two snapshots: repair older configurations to satisfy newer policies

Synthetic fat tree configurations

Slide18

Evaluation: performance

86% repaired in < 1 minute

99% repaired in < 1 hour

72% repaired in < 8 hours

2 orders of magnitude reduction

[Figure: repair-time chart with marks at 1 minute, 1 hour, and 8 hours]

Slide19

Evaluation: CPR vs hand-written repairs

CPR changes the same number of lines or fewer in 79% of networks

Overlapping filtering rules are condensed in hand-written repairs

Slide20

Summary

Automatically repair configurations for distributed control planes

Formulate a maximum satisfiability problem based on hierarchical ARC

Compute repairs for small data centers in < 1 hour

Try it! http://aaron.gember-jacobson.com/go/cpr

Slide21

Dataset: real data center policies

S ⇨ T blocked

S ⇨ T reachable under single link failure

Slide22

Evaluation: scalability (fat trees)

