Slide1
Scalable Label Assignment in Data Center Networks
Meg Walraed-Sullivan
University of California, San Diego
With: Radhika Niranjan Mysore, Malveeka Tewari, Ying Zhang (Ericsson Research), Keith Marzullo, Amin Vahdat
Slide2
Labeling in Distributed Networks
Group of entities that want to communicate
Need a way to refer to one another
Historically, a common problem
E.g. laptop has two labels (MAC address, IP address)
Labeling in data center networks is unique
Examples: phone system, snail mail, Internet, wireless networks
Slide3
Data Center Network Size
Interconnect of switches connecting hosts
Massive in scale: 10k switches, 100k hosts, millions of VMs
Slide4
Data Center Network Structure
Designed with regular, symmetric structure
Often multi-rooted trees (e.g. fat tree)
Reality doesn’t always match the blueprint
  Components and partitions are added/removed
  Links/switches/hosts fail and recover
  Cables are connected incorrectly
Slide5
Labels in Data Center Networks
What gets labeled in a data center network?
  Switch ports
  Host NICs
  Virtual machines at hosts
  Etc.
Slide6
Data Center Labeling Techniques
Flat Addressing
  E.g. MAC Addresses (Layer 2)
  Unique
  Automatic
  Scalability:
    Switches have limited forwarding entries (say, 10k)
    # Labels in forwarding tables = # Nodes
Slide7
Data Center Labeling Techniques
Hierarchical Addressing
  E.g. IP Addresses (Layer 3) with DHCP
  Scalable forwarding state
    # Labels in forwarding tables < # Nodes
  Relies on manual configuration:
    Unrealistic at scale
Slide8
Combining L2 and L3 Benefits
PortLand’s LDP: Location Discovery Protocol
DAC: Data center Address Configuration
  Manual configuration via blueprints
Rely on centralized control
  Cannot directly connect controller to all nodes
  Requires separate out-of-band control network or flooding techniques
References:
  PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric. Niranjan Mysore et al. SIGCOMM 2009
  Generic and Automatic Address Configuration for Data Center Networks. Chen et al. SIGCOMM 2010
Slide9
Scalability vs. Management
[Figure: label assignment management overhead vs. network size, marking Ethernet, IP, and the target location. Annotations: "Hardware Limit: Need Labels < Nodes", "Flat Labels", "Structured Labels", "Automation".]
Slide10
Cost of Automation
Less management means more automation
Structured labels encode topology
Labels change with topology dynamics
[Figure: management overhead vs. network size, marking Ethernet, IP, and the target.]
Slide11
ALIAS Overview
ALIAS: topology discovery and label assignment in hierarchical networks
Approach: automatic, decentralized assignment of hierarchical labels
Benefits:
  Scalability (structured labels, shared label prefixes)
  Low management overhead (automation)
  No out-of-band control network (decentralized)
Slide12
ALIAS Evolution
ALIAS: topology discovery and label assignment in hierarchical networks
Systems (Implementation/Evaluation)
  ALIAS: Scalable, Decentralized Label Assignment for Data Centers. M. Walraed-Sullivan, R. Niranjan Mysore, M. Tewari, Y. Zhang, K. Marzullo, A. Vahdat. SOCC 2011
Theory (Proof/Protocol Derivation)
  Brief Announcement: A Randomized Algorithm for Label Assignment in Dynamic Networks. M. Walraed-Sullivan, R. Niranjan Mysore, K. Marzullo, A. Vahdat. DISC 2011
Slide13
Data Center Network Topologies
Multi-rooted trees
  Multi-stage switch fabric connecting hosts
  Indirect hierarchy
  May allow peer links
Labels ultimately used for communication
  Multiple paths between nodes
Slide14
ALIAS Labels
Switches and hosts have labels
Labels encode (shortest physical) paths from the root of the hierarchy to a switch/host
Each switch/host may have multiple labels
Labels encode location and expose path multiplicity
h’s Labels: a d g h, b e g h, b f g h, c f g h
g’s Labels: a d g, b e g, b f g, c f g
[Figure: example topology with nodes a, b, c, d, e, f, g, h]
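The idea that a label encodes a path from a root can be sketched by enumerating root-to-node paths. This is only an illustration: the adjacency below is a hypothetical reconstruction inferred from the labels listed for g and h, and it uses node names where the protocol itself uses assigned coordinates.

```python
# Hypothetical downward adjacency, inferred from the labels on the slide
# (roots a, b, c; host h sits below switch g). Not taken from the figure.
CHILDREN = {
    "a": ["d"],
    "b": ["e", "f"],
    "c": ["f"],
    "d": ["g"],
    "e": ["g"],
    "f": ["g"],
    "g": ["h"],
    "h": [],
}
ROOTS = ["a", "b", "c"]


def labels_for(target, children=CHILDREN, roots=ROOTS):
    """Return every root-to-target path as a tuple of node names."""
    found = []

    def walk(node, path):
        path = path + (node,)
        if node == target:
            found.append(path)
            return
        for child in children[node]:
            walk(child, path)

    for root in roots:
        walk(root, ())
    return sorted(found)
```

On this graph, `labels_for("h")` returns the four paths listed as h’s labels, and `labels_for("g")` returns the four listed for g, which is the path multiplicity the slide refers to.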
Slide15
Communication over ALIAS Labels
Hierarchical routing leverages this info
Push packets upward, downward path is explicit
h’s Labels: a d g h, b e g h, b f g h, c f g h
g’s Labels: a d g, b e g, b f g, c f g
[Figure: example topology with nodes a, b, c, d, e, f, g, h]
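One way to read "push packets upward, downward path is explicit" is as a prefix test at each switch. The sketch below is a simplification under that assumption: the function name and the return values are hypothetical, and a real switch would consult a forwarding table rather than compare whole labels.

```python
def next_hop(switch_labels, dest_label):
    """Decide how a switch forwards a packet addressed to dest_label.

    switch_labels: labels (tuples) of the switch making the decision.
    dest_label: one label (tuple) of the destination.
    Returns ("deliver", None), ("down", element) or ("up", None).
    """
    for own in switch_labels:
        if dest_label == own:
            return ("deliver", None)
        # The switch lies on the path encoded in the label, so the next
        # element of the label names the hop below it.
        if dest_label[:len(own)] == own:
            return ("down", dest_label[len(own)])
    # Not on the encoded path: keep pushing the packet upward.
    return ("up", None)
```

With g’s labels from the slide, a packet addressed to `("b", "f", "g", "h")` matches g’s label `("b", "f", "g")` and is sent down toward h. A switch whose labels are not a prefix of the destination label returns `("up", None)`.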
Slide16
Distributed Protocol Overview
Continuously:
  Overlay appropriate hierarchy on network fabric
  Group sets of related switches into hypernodes
  Assign coordinates to switches
  Combine coordinates to form labels
Periodic state exchange between immediate neighbors
Slide17
Step 1. Overlay Hierarchy
Switches are at levels 1 through n
Hosts are at level 0
Only requires 1 host to begin
[Figure: hierarchy with Level 0 through Level 3]
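The level overlay can be approximated with a breadth-first search outward from the hosts. This is a centralized sketch under the assumption that a node's level equals its hop distance to the nearest host. The slides describe a distributed protocol based on neighbor state exchange, so the function and its inputs are illustrative only.

```python
from collections import deque


def assign_levels(links, hosts):
    """Label each node with its hop distance from the nearest host.

    links: iterable of (u, v) undirected links.
    hosts: iterable of host names; hosts are level 0.
    Nodes with no path to any host get no level.
    """
    neighbors = {}
    for u, v in links:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    levels = {h: 0 for h in hosts}
    queue = deque(levels)
    while queue:
        node = queue.popleft()
        for peer in neighbors.get(node, ()):
            if peer not in levels:
                levels[peer] = levels[node] + 1
                queue.append(peer)
    return levels
```

In this sketch a switch attached to a host becomes level 1, its upstream neighbors become level 2, and so on.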
Slide18
Distributed Protocol Overview
Continuously:
  Overlay appropriate hierarchy on network fabric
  Group sets of related switches into hypernodes
  Assign coordinates to switches
  Combine coordinates to form labels
Slide19
Step 2. Discover Hypernodes
Labels encode paths from a root to a host
Multiple paths lead to multiple labels per host
Aggregate for label compaction
Locate switches that reach same hosts
[Figure: topology with Level 1 through Level 4 (hosts omitted for space)]
Slide20
Step 2. Discover Hypernodes
Hypernode (HN): maximal set of switches that connect to same HNs below (via any member)
Base case: each Level 1 switch is in its own hypernode
Hypernode members are indistinguishable on downward path from root
[Figure: topology with Level 1 through Level 4]
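The hypernode definition can be read as a bottom-up grouping: switches at one level fall into the same hypernode when they reach the same set of hypernodes below. The sketch below assumes levels are already known and ignores peer links. The data layout and the hypernode ids are hypothetical.

```python
def discover_hypernodes(levels, down_links):
    """Group switches into hypernodes, bottom-up.

    levels: list of lists; levels[0] holds the level 1 switches,
            levels[1] the level 2 switches, and so on.
    down_links: dict mapping a switch to the switches directly below it.
    Returns a dict mapping each switch to a hypernode id.
    """
    hn_of = {}
    # Base case: each level 1 switch is in its own hypernode.
    for sw in levels[0]:
        hn_of[sw] = ("HN", 1, (sw,))

    for depth, switches in enumerate(levels[1:], start=2):
        groups = {}
        for sw in switches:
            # A switch reaches a hypernode if it links to any member of it.
            below = frozenset(hn_of[child] for child in down_links[sw])
            groups.setdefault(below, []).append(sw)
        for members in groups.values():
            hn_id = ("HN", depth, tuple(sorted(members)))
            for sw in members:
                hn_of[sw] = hn_id
    return hn_of
```

Two level 3 switches that attach to different members of the same level 2 hypernode still end up grouped together, which is what "via any member" means here.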
Slide21
Distributed Protocol Overview
Continuously:
  Overlay appropriate hierarchy on network fabric
  Group sets of related switches into hypernodes
  Assign coordinates to switches
  Combine coordinates to form labels
Slide22
Step 3. Assign Coordinates
Coordinates combine to make up labels
Labels used to route downwards
Switches in a HN share a coordinate
HNs with a parent in common need distinct coordinates
Slide23
Step 3. Assign Coordinates
Switches in a HN share a coordinate
HNs with a parent in common need distinct coordinates
Can we make this problem simpler?
[Figure labels: choosers, deciders]
Slide24
Step 3. Assign Coordinates
To assign coordinates to hypernodes:
  Define abstraction (choosers/deciders)
  Design solution for abstraction
  Apply solution throughout multi-rooted tree
[Figure labels: choosers, deciders]
Slide25
Step 3. Assign Coordinates
a. Decider/Chooser abstraction
Label Selection Problem (LSP)
Chooser processes connected to decider processes in a bipartite graph
[Figure: bipartite graph of choosers c1 through c6 (hypernodes) and deciders d1 through d4 (parent switches)]
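An LSP instance can be held as a mapping from each chooser to the deciders it connects to. The sketch below uses that representation to check the requirement, stated later in the slides, that choosers sharing a decider have distinct coordinates. The function name and data layout are assumptions for illustration.

```python
def lsp_conflicts(deciders_of, coordinate_of):
    """List pairs of choosers that violate the LSP distinctness goal.

    deciders_of: dict mapping each chooser to the set of deciders it
                 is connected to (the bipartite graph).
    coordinate_of: dict mapping each chooser to its chosen coordinate.
    """
    conflicts = []
    choosers = sorted(deciders_of)
    for i, first in enumerate(choosers):
        for second in choosers[i + 1:]:
            shared = deciders_of[first] & deciders_of[second]
            if shared and coordinate_of[first] == coordinate_of[second]:
                conflicts.append((first, second))
    return conflicts
```

An empty result means the assignment is valid for this instance. Choosers with no decider in common may reuse a coordinate.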
Slide26
Step 3. Assign Coordinates
a. Decider/Chooser abstraction
Label Selection Problem goals:
  All choosers eventually select coordinates
  Choosers sharing a decider have distinct coordinates
Multiple instances of LSP
Per-instance coordinates
[Figure: choosers c1 through c6 and deciders d1 through d4, with choosers annotated with coordinates such as x, y, z, and q]
Slide27
Step 3. Assign Coordinates
a. Decider/Chooser abstraction
Label Selection Problem (LSP)
Difficulty: connections can change over time
[Figure: choosers c1 through c6 and deciders d1 through d4, with choosers annotated with coordinates such as x, y, z, q, and r]
Slide28
Step 3. Assign Coordinates
b. Design Solution for Abstraction
Decider/Chooser Protocol (DCP)
Distributed algorithm that implements LSP
Las Vegas style randomized algorithm
  Probabilistically fast, guaranteed to be correct
Practical: low message overhead, quick convergence
Reacts quickly and locally to topology dynamics
  Transient startup conditions
  Miswirings
  Failure/recovery, connectivity changes
Slide29
Step 3. Assign Coordinates
b. Design Solution for Abstraction
Algorithm:
  Choosers select coordinates randomly and send to deciders
  Deciders reply with [yes] or [no+hints]
  One no -> reselect; all yeses -> finished
[Figure: choosers c1 and c2 send proposals "c1: x?" and "c2: y?" to deciders d1 and d2; each decider records c1: x and c2: y and replies "yes"; the final coordinates are x for c1 and y for c2]
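The exchange above can be simulated in synchronous rounds. This is a simplified sketch, not the protocol as specified: it assumes a static graph, treats two identical proposals arriving at one decider in the same round as a "no" for both, and uses the coordinates already held at a refusing decider as the hints. All names are hypothetical.

```python
import random


def run_dcp(deciders_of, domain, seed=0, max_rounds=1000):
    """Simulate rounds of a simplified decider/chooser exchange.

    deciders_of: dict chooser -> set of deciders it is connected to.
    domain: list of coordinates choosers may pick from.
    Returns (coordinates, rounds_used).
    """
    rng = random.Random(seed)
    # Each decider remembers which chooser holds which coordinate.
    held = {d: {} for ds in deciders_of.values() for d in ds}
    hints = {c: set() for c in deciders_of}
    final = {}
    pending = sorted(deciders_of)

    for round_no in range(1, max_rounds + 1):
        # Unfinished choosers pick at random, steering clear of hints.
        proposals = {}
        for c in pending:
            options = [x for x in domain if x not in hints[c]]
            if not options:
                raise ValueError(f"domain too small for chooser {c}")
            proposals[c] = rng.choice(options)

        winners = []
        for c in pending:
            all_yes = True
            for d in deciders_of[c]:
                taken = set(held[d].values())
                rivals = {proposals[o] for o in pending
                          if o != c and d in deciders_of[o]}
                if proposals[c] in taken or proposals[c] in rivals:
                    all_yes = False      # this decider answers "no"
                    hints[c] |= taken    # and sends hints along
            if all_yes:
                winners.append(c)

        # All yeses: the chooser is finished and deciders record it.
        for c in winners:
            final[c] = proposals[c]
            for d in deciders_of[c]:
                held[d][c] = proposals[c]
        pending = [c for c in pending if c not in final]
        if not pending:
            return final, round_no
    raise RuntimeError("did not converge within max_rounds")
```

The result is always a valid assignment when the function returns. Only the number of rounds depends on the random choices, which mirrors the "probabilistically fast, guaranteed to be correct" description. A larger coordinate domain makes collisions, and therefore retries, less likely.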
Slide30
Step 3. Assign Coordinates
c. Apply DCP through Hierarchy
Hypernodes are choosers for their coordinates
Switches are deciders for neighbors below
[Figure annotations: 2 choosers, 3 deciders; 2 choosers, 1 decider; 3 choosers, 3 deciders]
Slide31
Step 3. Assign Coordinates
c. Apply DCP through Hierarchy
DCP assigns level 1 coordinates
[Figure annotations: 3 choosers, 3 deciders]
Slide32
Step 3. Assign Coordinates
c. Apply DCP through Hierarchy
DCP for upper levels:
  HN switches cooperate (per-parent restrictions)
  Not directly connected
  Communicate via shared L1 switch
  “Distributed-Chooser DCP”
[Figure annotations: 2 choosers, 3 deciders]
Slide33
Distributed Protocol Overview
Continuously:
  Overlay appropriate hierarchy on network fabric
  Group related switches into hypernodes
  Assign per-hypernode coordinates
  Combine coordinates to form labels
Slide34
Step 4. Assign Labels
Concatenate coordinates from root downward
(For clarity, assume labels same across instances of LSP)
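Concatenation from the root downward can be sketched as a recursive walk over a node's upward links. The sketch assumes each node carries a single coordinate, with hypernode members sharing theirs, and that the topmost nodes in `parents` start each label. Both the layout and the example coordinates are hypothetical.

```python
def host_labels(host, parents, coordinate):
    """Build the labels of `host` by concatenating coordinates root-down.

    parents: dict node -> list of nodes directly above it (top nodes map
             to an empty list).
    coordinate: dict node -> coordinate; switches in one hypernode share one.
    """
    def upward(node):
        if not parents[node]:
            return {(coordinate[node],)}
        labels = set()
        for p in parents[node]:
            for prefix in upward(p):
                labels.add(prefix + (coordinate[node],))
        return labels

    return sorted(upward(host))
```

If a host's switch has two parents in the same hypernode, both paths produce the same coordinate sequence and collapse into one label. Parents in different hypernodes yield one label each.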
Slide35
Step 4. Assign Labels
Hypernodes create clusters of hosts that share label prefixes
Slide36
Relabeling
Topology changes may cause paths to change, which causes labels to change
Evaluation:
  Quick convergence
  Localized effects
Slide37
Using ALIAS labels
Many overlying communication protocols
Hierarchical-style forwarding makes most sense
E.g. MAC address rewriting
  At sender’s ingress switch: dest. MAC -> ALIAS label
  At recipient’s egress switch: ALIAS label -> dest. MAC
  Up*/down* forwarding (AutoNet, SOSP91)
  Proxy ARP for resolution
E.g. encapsulation, tunneling
Slide38
Evaluation Methodology
“Standard” systems approach
  Implementation, experimentation, deployment
Theoretical approach
  Proof, formalization, verification via model checking
Goal:
  Verify correctness, feasibility
  Assess scalability
Slide39
Evaluation: Correctness
Does ALIAS assign labels correctly?
Do labels enable scalable communication?
Implemented in Mace (www.macesystems.org)
Used Mace Model Checker to verify:
  Label assignment: levels, hypernodes, coordinates
  Sample overlying communication: pairs of nodes can communicate when physically connected
Ported to small testbed with existing communication protocol for realistic evaluation
Slide40
Evaluation: Correctness
Does DCP solve the Label Selection Problem?
  Proof that DCP implements LSP
  Implemented in Mace and model checked all versions of DCP
Is LSP a reasonable abstraction?
  Formal protocol derivation from basic DCP to ALIAS
Slide41
Evaluation: Feasibility
Is overhead (storage, control) acceptable?
Resource requirements of algorithm
  Memory: ~KBs for 10k host network
  Control overhead: agility/overhead tradeoff
Memory usage on testbed deployment (<150B)

Ports/Switch  Hosts  Cycle (ms)  Control Overhead (Mbps, % of 10G link)
64            65k    100         31.5 (0.3%)
                     500         6.29 (0.06%)
128           524k   1000        25.16 (0.25%)
                     2000        12.58 (0.12%)
Slide42
Evaluation: Feasibility
Is the protocol practical in convergence time?
DCP: Used Mace simulator to verify that “probabilistically fast” is quite fast in practice
Measured convergence on testbed deployment
  On startup
  After failure (speed and locality)
Used Mace model checker to verify locality of failure reactions for larger networks
Slide43
Evaluation: Scalability
Does ALIAS scale to data center sizes?
Used Mace model checker to verify labels and communication for larger networks than testbed
Wrote simulation code to analyze network behavior for enormous networks
Slide44
Result: Small Forwarding State

Topology                                        ALIAS Forwarding Table Entries
Levels  Ports  % Fully Provisioned  Servers
3       32     100                  8,192       45
               80                               262
               50                               173
               20                               86
3       64     100                  65,536      90
               80                               1028
               50                               653
               20                               291
4       32     100                  131,072     46
               80                               1278
               50                               2079
               20                               2415
5       16     100                  65,536      23
               80                               492
               50                               886
               20                               1108

[Slide annotations: "e.g. MAC"; "e.g. IP, LDP/DAC"]
Slide45
Conclusion
Scale and complexity of data center networks make labeling problem unique
ALIAS enables scalable data center communication by:
  Using a distributed approach
  Leveraging hierarchy to form topologically significant labels
  Eliminating manual configuration
Slide46
Convergence of DCP
Slide47
Convergence vs. Coord. Domain
Slide48
Convergence vs. Coord. Domain
Slide49
Convergence vs. Coord. Domain
Slide50
Convergence vs. Coord. Domain