Slide1
Validating Datacenters at Scale
Karthick Jayaraman, Nikolaj Bjørner, Jitu Padhye, Amar Agrawal, Ashish Bhargava, Paul-Andre C Bissonnette, Shane Foster, Andrew Helwer, Mark Kasten, Ivan Lee, Anup Namdhari, Haseeb Niaz, Aniruddha Parkhi, Hanukumar Pinnamraju, Adrian Power, Neha Milind Raje, Parag Sharma
Microsoft Azure Networking
Slide2
Hyperscale Azure Datacenter Network
54 regions worldwide
140 countries
[Figure: counters for network devices, servers, maintenance changes/day, and policies; the numeric values were not recovered]
Slide3
Reliability at Hyperscale
Is the network operating as expected?
Will my change affect the network?
Slide4
Reality Checker for Datacenters (RCDC)
What is the Reality?
What is the Intent?
How to scale verification?
What do we do with the results?
Slide5
Forwarding Information Base (FIB)
Prefix            NextHops
100.25.0.0/24     {i1, i2}
0.0.0.0/0         i4
[Figure: a router with interfaces i1, i2, i3, and i4, and two example packets with dstIp = 100.25.0.1 and dstIp = 100.26.0.1]
Determines forwarding behavior of each device
Longest prefix matching
Collectively determine forwarding behavior of the network
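The longest-prefix-matching rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not RCDC code: the `FIB` dictionary and the `lookup` function are hypothetical names, and the table holds only the two entries shown on the slide.

```python
import ipaddress

# FIB from the slide: prefix -> set of next-hop interfaces.
FIB = {
    "100.25.0.0/24": {"i1", "i2"},
    "0.0.0.0/0": {"i4"},
}

def lookup(fib, dst_ip):
    """Return (prefix, next_hops) for the longest prefix that contains dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    best = None
    for prefix, hops in fib.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching entry with the largest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hops)
    if best is None:
        return None  # no route at all, the packet would be dropped
    return str(best[0]), best[1]

specific = lookup(FIB, "100.25.0.1")   # matched by the /24 entry
fallback = lookup(FIB, "100.26.0.1")   # only the default route matches
```

The first packet is forwarded on i1 or i2 because the /24 entry is more specific than the default route; the second falls through to 0.0.0.0/0 and leaves on i4.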
Slide6
Reality Checker for Datacenters (RCDC)
What is the Reality?
What is the Intent?
How to scale verification?
What do we do with the results?
Slide7
What is the intent?
All Pairs ToR Reachability
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4, grouped into Cluster 1 and Cluster 2; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide8
What is the intent?
All Pairs ToR Reachability
Traffic must follow shortest path
Intra-cluster path length = 2
Intra-datacenter path length = 4
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4, grouped into Cluster 1 and Cluster 2; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide9
What is the intent?
All Pairs ToR Reachability
Traffic must follow shortest path
All Equal Cost Multi Paths (ECMP) must be available
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4, grouped into Cluster 1 and Cluster 2; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide10
Where does the intent come from?
Network Graph Service: Topology
Automatic Intent Extraction:
All pairs ToR reachability
Traffic must follow shortest path
ECMP redundancy
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
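One way to picture automatic intent extraction is a breadth-first search over the topology that records, for every device and prefix, all neighbors lying on a shortest path to the prefix owner. The sketch below is an illustration under stated assumptions, not the production extractor: the wiring (each ToR linked to every leaf in its cluster, leaf Ai and Bi linked to Di), the assignment of 12.0.0.0/16 and 13.0.0.0/16 to ToR3 and ToR4, and all function names are hypothetical. Default routes and the R routers are left out.

```python
from collections import deque

def build_topology():
    """Assumed wiring: ToR1/ToR2 link to A1-A4, ToR3/ToR4 to B1-B4, Ai and Bi to Di."""
    graph = {}
    def link(u, v):
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    for tor, leaf in (("ToR1", "A"), ("ToR2", "A"), ("ToR3", "B"), ("ToR4", "B")):
        for i in range(1, 5):
            link(tor, f"{leaf}{i}")
    for i in range(1, 5):
        link(f"A{i}", f"D{i}")
        link(f"B{i}", f"D{i}")
    return graph

def distances_to(graph, dst):
    """Hop count from every reachable device to dst (breadth-first search)."""
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def extract_contracts(graph, prefix_owner):
    """contracts[device][prefix] = every neighbor on a shortest path (the ECMP set)."""
    contracts = {}
    for prefix, owner in prefix_owner.items():
        dist = distances_to(graph, owner)
        for device in graph:
            if device == owner or device not in dist:
                continue
            hops = {n for n in graph[device] if dist.get(n) == dist[device] - 1}
            contracts.setdefault(device, {})[prefix] = hops
    return contracts

# Assumed prefix ownership; the ToR3/ToR4 assignment is a guess.
PREFIX_OWNER = {
    "10.0.0.0/16": "ToR1", "11.0.0.0/16": "ToR2",
    "12.0.0.0/16": "ToR3", "13.0.0.0/16": "ToR4",
}
graph = build_topology()
contracts = extract_contracts(graph, PREFIX_OWNER)
```

Under these assumptions the shortest-path lengths come out as 2 within a cluster and 4 across clusters, and each ToR is expected to use all four leaf routers for every remote prefix.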
Slide11
Reality Checker for Datacenters (RCDC)
What is the Reality?
What is the Intent?
How to scale verification?
What do we do with the results?
Slide12
Challenges
All pairs ToR reachability analysis is O(N^3)
Anteater [Mai 2011]
HSA [Kazemian 2012]
Veriflow [Khurshid 2013]
NetKat [Anderson 2014]
NoD [Lopes 2015]
Symmetries [Plotkin 2016, Beckett 2018]
Composite FIB snapshot is a hard engineering problem
Libra [Zeng 2014]
Slide13
Local Validation
Exploit Azure network’s regular structure
Each router has a fixed role for a set of addresses
Enough to verify role is enforced on each router
Decompose into local contracts
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
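The decomposition into local contracts means each device can be checked by itself against its own expected table, with no network-wide snapshot or path search. The sketch below illustrates that idea only; `check_device`, `check_network`, and the sample data are hypothetical, and entries are compared by exact prefix for simplicity.

```python
def check_device(contract, fib):
    """Return (prefix, expected, actual) for every contract entry the FIB does not meet."""
    violations = []
    for prefix, expected in contract.items():
        actual = fib.get(prefix, set())
        if actual != expected:
            violations.append((prefix, expected, actual))
    return violations

def check_network(contracts, fibs):
    """Check each device independently; the loop body needs no other device's state."""
    report = {}
    for device, contract in contracts.items():
        violations = check_device(contract, fibs.get(device, {}))
        if violations:
            report[device] = violations
    return report

LEAVES = {"A1", "A2", "A3", "A4"}
contracts = {
    "ToR1": {"0.0.0.0/0": LEAVES, "11.0.0.0/16": LEAVES},
    "A1": {"10.0.0.0/16": {"ToR1"}, "11.0.0.0/16": {"ToR2"}},
}
fibs = {
    # Illustrative fault: ToR1 has lost two of its four ECMP next hops for 11.0.0.0/16.
    "ToR1": {"0.0.0.0/0": LEAVES, "11.0.0.0/16": {"A1", "A2"}},
    "A1": {"10.0.0.0/16": {"ToR1"}, "11.0.0.0/16": {"ToR2"}},
}
report = check_network(contracts, fibs)
```

Because every iteration touches one device only, the work grows linearly with the number of devices and the checks can run in parallel as FIBs arrive.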
Slide14
What are the contracts?
ToR1 Contracts
Prefix          NextHops
0/0             {A1, A2, A3, A4}    (default contract)
11.0.0.0/16     {A1, A2, A3, A4}    (specific contract)
12.0.0.0/16     {A1, A2, A3, A4}    (specific contract)
13.0.0.0/16     {A1, A2, A3, A4}    (specific contract)
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide15
What are the contracts?
A1 Contracts
Prefix          NextHops
0/0             {D1}
10.0.0.0/16     {ToR1}
11.0.0.0/16     {ToR2}
12.0.0.0/16     {D1}
13.0.0.0/16     {D1}
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide16
What are the contracts?
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide17
Live Monitoring of Forwarding Behavior
Reachability invariants
Network Graph Service
Error Reports
Validation time for one datacenter < 3 minutes
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide18
Reality Checker for Datacenters (RCDC)
What is the Reality?
What is the Intent?
How to scale verification?
What do we do with the results?
Slide19
Latent Error
Device Name   Prefix         Expected NextHops    Actual Prefix   Actual NextHops
ToR1          11.0.0.0/16    {A1, A2, A3, A4}     0/0             {A1, A2}
ToR2          10.0.0.0/16    {A1, A2, A3, A4}     0/0             {A1, A2}
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
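The rows in this table can be reproduced with a small check that asks which FIB entry actually covers a contract prefix. In the sketch below the specific route is missing, so traffic falls back to the default route with fewer next hops: packets still arrive, which is why the error stays latent. The function names and data layout are hypothetical, the default route is written as 0.0.0.0/0 because Python's `ipaddress` module does not parse the `0/0` shorthand, and FIB entries more specific than the contract prefix are ignored for brevity.

```python
import ipaddress

def effective_route(fib, prefix):
    """Longest FIB entry that covers the whole contract prefix, or None."""
    target = ipaddress.ip_network(prefix)
    best = None
    for entry, hops in fib.items():
        net = ipaddress.ip_network(entry)
        if target.subnet_of(net) and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hops)
    return best

def error_rows(device, contract, fib):
    """Build (device, prefix, expected, actual_prefix, actual_hops) rows for mismatches."""
    rows = []
    for prefix, expected in contract.items():
        route = effective_route(fib, prefix)
        if route is None:
            actual_prefix, actual_hops = None, set()
        else:
            actual_prefix, actual_hops = str(route[0]), route[1]
        if actual_hops != expected:
            rows.append((device, prefix, expected, actual_prefix, actual_hops))
    return rows

LEAVES = {"A1", "A2", "A3", "A4"}
tor1_contract = {"11.0.0.0/16": LEAVES}
tor1_fib = {"0.0.0.0/0": {"A1", "A2"}}  # specific route absent, default route degraded
rows = error_rows("ToR1", tor1_contract, tor1_fib)
```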
Slide20
Latent Errors
Device Name   Prefix   Expected NextHops   Actual Prefix   Actual NextHops
A1            0/0      {D1}
A2            0/0      {D2}
A3            0/0      {D3}
[Only three values per row were recovered; the Actual Prefix and Actual NextHops cells are empty in the extraction]
[Figure: datacenter topology with routers R1-R4, D1-D4, A1-A4, B1-B4, and ToR1-ToR4; tier labels: Leaf routers, Spine router, Backbone; ToR prefixes 10.0.0.0/16, 11.0.0.0/16, 12.0.0.0/16, 13.0.0.0/16]
Slide21
What did we do about the errors?
O(100)
Risk Categorization
Role of device
Number of additional faults required to cause an impact
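The two risk criteria named here can be combined into a simple sort key. The sketch below is only an illustration of the idea: the slide does not say how the criteria are weighted, so the `ROLE_RANK` ordering, the field names, and the rule "count the expected next hops that are still present" are all assumptions.

```python
# Hypothetical role ordering: a lower rank is reviewed first when margins tie.
ROLE_RANK = {"spine": 0, "leaf": 1, "tor": 2}

def fault_margin(expected, actual):
    """Expected next hops still present; losing all of them would drop traffic."""
    return len(expected & actual)

def prioritize(errors):
    """Sort errors so that those closest to causing an impact come first."""
    return sorted(
        errors,
        key=lambda e: (fault_margin(e["expected"], e["actual"]), ROLE_RANK[e["role"]]),
    )

LEAVES = {"A1", "A2", "A3", "A4"}
errors = [
    {"device": "ToR2", "role": "tor", "expected": LEAVES, "actual": {"A1", "A2", "A3"}},
    {"device": "ToR1", "role": "tor", "expected": LEAVES, "actual": {"A1", "A2"}},
    {"device": "A1", "role": "leaf", "expected": {"D1"}, "actual": set()},
]
ordered = [e["device"] for e in prioritize(errors)]
```

With this data A1 sorts first because it has no working next hop left, followed by ToR1 (two more faults away from an impact) and ToR2 (three).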
Slide22
Experience: Types of Errors
Software bugs: software bug that caused RIB-FIB inconsistency
Hardware failures: operationally down links
Operational drift: BGP sessions that are shut
Migrations: port channels not configured on T1s; two T1 sets configured with the same ASN
Slide23
Reliability at Hyperscale
Is the network operating as expected?
Will my change affect the network?
Slide24
Verifying Device Access-Control Lists (ACL)
srcIp   dstIp            protocol   action
*       100.64.0.0/16    UDP        deny
*       *                *          permit
*       *                *          deny
[Figure: SecGuru pipeline with the labels Policy, Contracts, Parsers, bit-vector logic formulas, and Z3: Check]
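First-match ACL semantics for rules like the ones on this slide can be sketched as follows. This is not how SecGuru works internally: the slide says policies and contracts are translated to bit-vector logic formulas and checked with Z3, whereas this illustration evaluates one concrete packet at a time. The rule order, the `rule` and `evaluate` helpers, and the treatment of the final deny as the implicit default are assumptions.

```python
import ipaddress

ANY_NET = ipaddress.ip_network("0.0.0.0/0")

def rule(src, dst, proto, action):
    """Build a rule tuple; '*' stands for any address or any protocol."""
    def to_net(text):
        return ANY_NET if text == "*" else ipaddress.ip_network(text)
    return (to_net(src), to_net(dst), proto, action)

# Assumed order: the specific deny comes before the permit-all rule.
POLICY = [
    rule("*", "100.64.0.0/16", "UDP", "deny"),
    rule("*", "*", "*", "permit"),
]

def evaluate(policy, src_ip, dst_ip, proto):
    """Return the action of the first rule that matches the packet."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for src_net, dst_net, rule_proto, action in policy:
        if src in src_net and dst in dst_net and rule_proto in ("*", proto):
            return action
    return "deny"  # implicit default when no rule matches

udp_blocked = evaluate(POLICY, "1.2.3.4", "100.64.1.1", "UDP")
tcp_allowed = evaluate(POLICY, "1.2.3.4", "100.64.1.1", "TCP")
```

A symbolic checker answers the same question for all packets at once, which is what makes it suitable for contract checking rather than spot tests.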
Slide25
Refactoring a Large Legacy ACL
Edge ACL (before refactoring)
Several thousand lines
Intent was poorly understood
Difficult to make changes
Refactor
Edge ACL (after refactoring)
Few hundred lines
Move out service specific protections
Slide26
Refactoring a Large Legacy ACL
[Figure: workflow in which regression contracts are checked by SecGuru, leading either to "Fix errors in policy" or to "Deploy refactored ACL"]
Contract expects: SrcIp = *, DstIp = 10.0.0.0/16, Allow
Policy only allows: SrcIp = *, DstIp = 10.0.0.0/26, Allow
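The mismatch in this example can be reproduced with plain prefix arithmetic: the contract demands that every destination in 10.0.0.0/16 be allowed, while the policy allows only 10.0.0.0/26. The sketch below handles a single allow prefix and returns one destination address that violates the contract. It stands in for, and is much weaker than, the Z3-based check in SecGuru; `find_violation` is a hypothetical name.

```python
import ipaddress

def find_violation(contract_dst, allowed_dst):
    """Return an address the contract requires but the policy does not allow, or None."""
    contract = ipaddress.ip_network(contract_dst)
    allowed = ipaddress.ip_network(allowed_dst)
    if contract.subnet_of(allowed):
        return None  # the policy covers everything the contract expects
    if allowed.subnet_of(contract):
        # Split the contract prefix into the pieces that the policy leaves out.
        uncovered = contract.address_exclude(allowed)
        first = min(uncovered, key=lambda net: int(net.network_address))
        return first.network_address
    # Two prefixes either nest or are disjoint, so nothing is covered here.
    return contract.network_address

witness = find_violation("10.0.0.0/16", "10.0.0.0/26")
```

The witness is the kind of counterexample a regression contract failure would surface before the refactored ACL is deployed.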
Slide27
Refactoring a Large Legacy ACL
Slide28
Summary
Captured and checked intent in Azure Datacenters
Incorporated verification to monitor drift and check impact of changes
Optimized for hyperscale
Slide29
More Challenges
Wide area networks
Better abstractions for intent
Model-based testing of device firmware
Verifying virtual network policies
Contact
dmaltz@microsoft.com
karjay@microsoft.com
padhye@microsoft.com