SDN Controller Challenges
The Story Thus Far
SDN centralizes the network’s control plane
The controller is effectively the brain of the network
The controller determines what to do and tells the switches how to do it
The Story Thus Far
Something happened!
The Story Thus Far
Let’s ask the brain!
The Story Thus Far
Think about what happened
Maybe come up with a solution
The Story Thus Far
The controller runs a control function
The control function computes the switch state from the global network state:
Switch state = F(global network state)
The global network state can be a graph of the network
The switch state tells the network what to do
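As a minimal sketch of switch state = F(global network state), assume the global state is an adjacency list of switches and that F computes, for every switch, the first hop on a shortest path (BFS, unit link costs) to every other switch. The names `Graph`, `next_hops_from`, and `compute_switch_state` are illustrative; a real controller app would translate these tables into OpenFlow entries.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # global network state: switch -> neighbors


def next_hops_from(graph: Graph, src: str) -> Dict[str, str]:
    """BFS from src; return {destination: first hop on a shortest path}."""
    first_hop: Dict[str, str] = {}
    visited = {src}
    queue = deque()
    for nbr in graph[src]:
        if nbr not in visited:
            visited.add(nbr)
            first_hop[nbr] = nbr
            queue.append(nbr)
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                first_hop[nbr] = first_hop[node]  # inherit the parent's first hop
                queue.append(nbr)
    return first_hop


def compute_switch_state(graph: Graph) -> Dict[str, Dict[str, str]]:
    """F(global network state) -> one forwarding table per switch."""
    return {sw: next_hops_from(graph, sw) for sw in graph}


network = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
tables = compute_switch_state(network)
print(tables["s1"])  # {'s2': 's2', 's3': 's2'}
```

Recomputing F whenever the graph changes is what makes the controller the single "brain": switches never compute routes themselves.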
Challenges with Centralization
Single point of failure
Fault tolerance
Performance bottleneck
Scalability
Efficiency (switch-controller latency)
Single point for security violations
Motivation for Distributed Controllers
Wide-area networks
Wide distribution of switches: from the USA to Australia
High latency between one controller and all switches
Application and network growth
Higher CPU load for the controller
More memory for storing FIB entries and calculations
High availability
Class Outline
Fault Tolerance
Google’s B4 paper
Controller Scalability
Ways to scale the controller
Distributed controllers: mesh versus hierarchy
Implications of controller placement
Fault Tolerance
Google’s B4 Network
Provides connectivity between DC sites
Uses SDN to control edge switches
Goal: high utilization of links
Insight: fine-grained control over the edge and the network can lead to higher utilization
Distributed controllers
One set of controllers for each data center (site)
Fault Tolerance in B4
Each site runs a set of controllers
Paxos is run between the controllers in a site to determine the master
Quick Overview of Paxos
Given N controllers
1 acts as the leader and N-1 as workers
All N controllers maintain the same state
Switches interact with the leader
A change doesn’t happen until a majority of the group agrees
On failure of the primary, the remaining N-1 work together to elect (determine) a new leader
[Figure: network events arrive at the leader; state changes propagate to the workers]
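The sketch below is a toy, single-process version of single-decree Paxos, used to agree on one value such as which controller is master. `Acceptor`, `propose`, and the `alive` flag (which simulates a crashed controller) are illustrative names; the sketch omits networking, retries, and leader leases, and is not B4’s implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot promised so far
    accepted_ballot: int = -1             # ballot of the last accepted proposal
    accepted_value: Optional[str] = None  # value of the last accepted proposal
    alive: bool = True                    # False simulates a crashed controller

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1b: promise to ignore lower ballots; report any accepted value."""
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """One proposer round; returns the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    promises = []
    for a in acceptors:                   # Phase 1a: prepare
        reply = a.prepare(ballot)
        if reply is not None:
            promises.append(reply)
    if len(promises) < majority:
        return None
    # Safety rule: adopt the value accepted under the highest earlier ballot.
    _, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    acks = sum(a.accept(ballot, value) for a in acceptors)  # Phase 2a: accept
    return value if acks >= majority else None


controllers = [Acceptor() for _ in range(5)]
controllers[3].alive = controllers[4].alive = False   # two controllers down
print(propose(controllers, 1, "c1 is master"))  # c1 is master
print(propose(controllers, 2, "c2 is master"))  # c1 is master (already chosen)
```

With two of five controllers down, the remaining three still form a majority and can choose a master; a later proposal with a higher ballot is forced to keep the value already chosen.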
Pros-Cons of Paxos
Pros
Well understood and studied; gives good FT
Many implementations in the wild, e.g., ZooKeeper (which uses the Paxos-like Zab protocol)
Cons
Time to recover
Impacts the throughput of the entire system
Controller Scalability
What limits a controller’s scalability?
Number of control messages from switches
Depends on the application logic
E.g., MicroTE/Hedera periodically query all switches for stats
A reactive controller, as evaluated in NOX, requires each switch to send a message for each new flow
Message types: Packet-in (if apps are reactive), flow stats, flow time-outs
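A back-of-the-envelope sketch of how these messages scale. The model is an assumption: one Packet-in per new flow per switch for reactive apps, and one stats request plus one reply per switch per polling interval for MicroTE/Hedera-style polling; flow time-out messages are ignored, and the example numbers are hypothetical.

```python
def control_msgs_per_sec(num_switches: int,
                         new_flows_per_switch: float,
                         poll_interval_s: float) -> float:
    """Rough controller message load from reactive and polling apps."""
    packet_ins = num_switches * new_flows_per_switch   # reactive apps
    stats_msgs = 2 * num_switches / poll_interval_s    # request + reply per poll
    return packet_ins + stats_msgs


print(control_msgs_per_sec(100, 50, 5))  # 5040.0
```

Both terms grow linearly with the number of switches, so a single controller eventually saturates as the network grows.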
What limits a controller’s scalability?
Application processing overhead
The controller runs a bunch of applications
Similar to a server running a set of programs
CPU/memory constraints limit how the apps run
What limits a controller’s scalability?
Distance between controller and the switches
[Figure: Controller 1 running the Hedera, L3, and FW apps]
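Distance sets a hard floor on switch-controller latency. A minimal sketch, assuming signals travel through fiber at roughly 2×10^8 m/s (about two thirds of the speed of light) and ignoring queuing, processing, and indirect fiber routes:

```python
FIBER_SPEED_M_PER_S = 2e8  # assumed propagation speed in fiber


def propagation_rtt_ms(distance_km: float) -> float:
    """Lower bound on switch-controller round-trip time, in milliseconds."""
    one_way_s = (distance_km * 1000) / FIBER_SPEED_M_PER_S
    return 2 * one_way_s * 1000


print(round(propagation_rtt_ms(1000), 3))  # 10.0
```

Under these assumptions, every 1,000 km of fiber between a switch and its controller adds at least about 10 ms per round trip, before the controller does any work.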
How to Scale the Controller
Obvious: add more controllers
BUT: what about the applications?
Synchronization/concurrency problems: who controls which switch?
Who reacts to which events?
[Figure: Controllers 1 to N, each running Hedera, L3, and FW, all collecting stats from and installing OF entries in the same network; question marks show that it is undecided which controller manages which switch]
Medium Sized Networks
Assumptions:
A controller can’t store all forwarding table entries in memory
But it can process all events and run all apps
Each controller:
Gets the same network events and runs the same apps, so it produces the same output
But stores the output for only a fraction of the network and configures only that fraction
[Figure: Controllers 1 to N, each running Hedera, L3, and FW; each collects stats and installs OF entries]
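A minimal sketch of this scheme, under assumed names: every `Controller` feeds the same events to the same deterministic app (here a hypothetical `compute_rules` function that turns "switch:destination" events into simplified rules), then keeps only the entries for the switches it owns. Determinism is the key assumption; if two controllers could compute different outputs from the same events, their partial tables would disagree.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str]  # (destination, action): a simplified flow entry


def compute_rules(events: List[str]) -> Dict[str, List[Rule]]:
    """Hypothetical deterministic app: the same events always give the same output."""
    out: Dict[str, List[Rule]] = {}
    for ev in sorted(events):  # sort so arrival order cannot change the result
        switch, dst = ev.split(":")
        out.setdefault(switch, []).append((dst, "forward"))
    return out


class Controller:
    def __init__(self, name: str, my_switches: Set[str]) -> None:
        self.name = name
        self.my_switches = my_switches
        self.stored: Dict[str, List[Rule]] = {}

    def on_events(self, events: List[str]) -> None:
        full_output = compute_rules(events)  # every controller computes everything...
        # ...but stores (and would install) only the entries for its own switches
        self.stored = {sw: rules for sw, rules in full_output.items()
                       if sw in self.my_switches}


events = ["s2:h1", "s1:h2", "s1:h3"]
c1, c2 = Controller("c1", {"s1"}), Controller("c2", {"s2"})
for c in (c1, c2):
    c.on_events(events)
print(c1.stored)  # {'s1': [('h2', 'forward'), ('h3', 'forward')]}
print(c2.stored)  # {'s2': [('h1', 'forward')]}
```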
Medium Sized Networks: HyperFlow
Each controller pushes its state to every other controller
Each controller thinks it’s the only one in the network
[Figure: Controllers 1 to N, each running Hedera, L3, and FW, connected through a publish-subscribe system; each collects stats and installs OF entries]
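A minimal in-memory sketch of the publish-subscribe idea, with hypothetical names: each controller publishes events from its own switches, every controller (including the publisher) replays all published events, and so all of them build the same view. A real deployment would use a distributed pub/sub system rather than a Python object.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PubSub:
    """In-memory stand-in for a distributed publish-subscribe system."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, event: str) -> None:
        for callback in self.subscribers[topic]:
            callback(event)


class Controller:
    def __init__(self, name: str, bus: PubSub) -> None:
        self.name = name
        self.bus = bus
        self.view: List[str] = []  # every event this controller has processed
        bus.subscribe("events", self.replay)

    def local_event(self, event: str) -> None:
        """An event from one of this controller's own switches."""
        self.bus.publish("events", f"{self.name}:{event}")

    def replay(self, event: str) -> None:
        """Apps handle remote events as if they came from local switches."""
        self.view.append(event)


bus = PubSub()
c1, c2 = Controller("c1", bus), Controller("c2", bus)
c1.local_event("link s1-s2 up")
c2.local_event("new host h1")
print(c1.view == c2.view)  # True
```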
Large Sized Networks
Assumptions
No controller can store all the FIB entries
No controller can run the entire application or handle all events
Need to partition the application
But how?
Application partition 1
Approach 1: each controller runs a specific application
How do you resolve conflicts in FW entries?
Apps can conflict in the rules they install
[Figure: Controller 1 runs Hedera, Controller 2 runs L3, and Controller N runs FW]
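One way to see the problem is to check whether two apps’ rules overlap. In this sketch (simplified rules of the form (match, priority, action), with `None` or a missing field as a wildcard; all names hypothetical), two rules conflict when they have equal priority, some packet could match both, and their actions differ. Deciding which app wins still needs a policy, such as per-app priorities.

```python
from typing import Dict, Optional, Tuple

Match = Dict[str, Optional[str]]  # field -> value; None (or absent) is a wildcard
Rule = Tuple[Match, int, str]     # (match, priority, action)


def overlaps(m1: Match, m2: Match) -> bool:
    """Some packet matches both if no field has two different concrete values."""
    for field in set(m1) | set(m2):
        a, b = m1.get(field), m2.get(field)
        if a is not None and b is not None and a != b:
            return False
    return True


def conflicts(r1: Rule, r2: Rule) -> bool:
    (m1, p1, a1), (m2, p2, a2) = r1, r2
    return p1 == p2 and a1 != a2 and overlaps(m1, m2)


hedera_rule = ({"dst_ip": "10.0.0.2", "tcp_dst": "80"}, 10, "output:2")
fw_rule = ({"tcp_dst": "80"}, 10, "drop")
l3_rule = ({"dst_ip": "10.0.0.3"}, 10, "output:3")
print(conflicts(hedera_rule, fw_rule))  # True: web traffic to 10.0.0.2 matches both
print(conflicts(hedera_rule, l3_rule))  # False: the destinations differ
```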
Application partition 2
Approach 2: all controllers run the same application but for a subset of devices
Results in a Distributed Mesh control plane
[Figure: Controllers 1 to N, each running Hedera, L3, and FW for its own subset of devices, exchanging an abstract network view]
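How devices get divided among controllers is left open here. One common technique (an assumption, not something the slide prescribes) is consistent hashing: each controller owns several points on a hash ring, and a switch belongs to the first controller point at or after its own hash. Removing a controller then moves only that controller’s switches. `SwitchAssignment` is a hypothetical name.

```python
import bisect
import hashlib
from typing import List


def _hash(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)


class SwitchAssignment:
    """Consistent-hash ring mapping switch IDs to controller names."""

    def __init__(self, controllers: List[str], vnodes: int = 50) -> None:
        # several virtual points per controller smooth out the load
        self.ring = sorted((_hash(f"{c}#{i}"), c)
                           for c in controllers for i in range(vnodes))
        self.points = [p for p, _ in self.ring]

    def controller_for(self, switch: str) -> str:
        # first ring point clockwise from the switch's hash (wrapping around)
        idx = bisect.bisect(self.points, _hash(switch)) % len(self.ring)
        return self.ring[idx][1]


before = SwitchAssignment(["c1", "c2", "c3"])
after = SwitchAssignment(["c1", "c2"])  # c3 fails
switches = [f"s{i}" for i in range(100)]
moved = [s for s in switches
         if before.controller_for(s) != after.controller_for(s)]
# only switches that c3 used to control change owner
print(all(before.controller_for(s) == "c3" for s in moved))  # True
```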
Application Partition 2
Controllers exchange abstract views with each other
An abstract view reduces the network information used by each controller
[Figure: the real network versus Controller 2’s view of the network. Controller 2 (running Hedera, L3, and FW) sees its own part of the network, plus abstractions provided by Controller 1 and Controller N]
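A minimal sketch of one possible abstraction, with hypothetical names: a controller keeps every link inside its own domain, collapses each other domain into a single logical node, and drops links internal to other domains. The abstraction actually exchanged between controllers may carry more detail (e.g., border ports or costs).

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def abstract_view(edges: List[Edge], domain_of: Dict[str, str], me: str) -> Set[Edge]:
    """The view kept by the controller of domain `me`."""
    def node(sw: str) -> str:
        d = domain_of[sw]
        return sw if d == me else d  # foreign switches collapse into their domain

    view: Set[Edge] = set()
    for a, b in edges:
        u, v = node(a), node(b)
        if u != v:  # drop links internal to a foreign domain
            view.add((u, v) if u < v else (v, u))
    return view


edges = [("a1", "a2"), ("a2", "b1"), ("b1", "b2"), ("b2", "c1")]
domain_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
print(sorted(abstract_view(edges, domain_of, "B")))
# [('A', 'b1'), ('C', 'b2'), ('b1', 'b2')]
```

Controller B sees its two switches in full but only one node each for domains A and C, which is the reduction in network information the slide describes.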
How to Deal with State + Concurrency Issues?
Controllers synchronize through a DB or DHT
So each app needs synchronization code
How do you deal with concurrency?
Each switch has a table/row in a DB
The table/row reflects the switch state
The programmer interacts directly with the DB
Onix takes care of synchronization between the DB and the switch
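A toy sketch of this programming model, with hypothetical names (this is not Onix’s API): the app writes rows in a per-switch table, and a separate sync step, standing in for the platform, pushes only the changed rows to the switches. Concurrency control between controllers is not modeled.

```python
from typing import Callable, Dict, Set

Row = Dict[str, str]  # simplified switch state: destination -> action


class NIB:
    """Toy network information base: one row per switch."""

    def __init__(self, push_to_switch: Callable[[str, Row], None]) -> None:
        self.rows: Dict[str, Row] = {}
        self.dirty: Set[str] = set()
        self.push_to_switch = push_to_switch  # hypothetical southbound hook

    def write(self, switch: str, key: str, value: str) -> None:
        """What the app programmer does: edit the switch's row."""
        self.rows.setdefault(switch, {})[key] = value
        self.dirty.add(switch)

    def sync(self) -> None:
        """What the platform does: push changed rows to their switches."""
        for switch in sorted(self.dirty):
            self.push_to_switch(switch, dict(self.rows[switch]))
        self.dirty.clear()


pushed = []
nib = NIB(lambda sw, row: pushed.append((sw, row)))
nib.write("s1", "10.0.0.2", "output:2")
nib.write("s1", "10.0.0.3", "output:3")
nib.sync()
nib.sync()  # nothing changed, so nothing is pushed
print(pushed)  # [('s1', {'10.0.0.2': 'output:2', '10.0.0.3': 'output:3'})]
```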
Onix to the SDN Programmer
How to synchronize between domains?
How many domains (or controllers)?
How many switches in a domain?
Application partition 3
Approach 3: divide the application into local and global parts
Results in a hierarchical control plane
A global controller and local controllers
Applications that do not need network-wide state can run locally without communicating with other controllers
Are Hierarchical Controllers Feasible?
Examples of local applications:
Link discovery, learning switch, local policies
Examples of local portions of a global algorithm:
Data center traffic engineering
Elephant flow detection (Hedera)
Predictability detection (MicroTE)
Local apps/controllers have other benefits:
High parallelism
Can run closer to the devices
Kandoo: Hierarchical controllers
[Figure: local Controllers 1 to N, each running Hedera, L3, and FW, below a global controller running Hedera]
2 levels of controllers: global and local
Local applications are embarrassingly parallel
Local controllers shield the global controller from network events
Kandoo: Hierarchical controllers
Local controllers: run local apps
Return an abstract view to the global controller
Reduce the number of events sent to the global controller and the size of the network it sees
Kandoo: Hierarchical controllers
Global controller
Runs global apps, i.e., apps that need network-wide state
Hedera Reminder
Goal: reduce network contention
Insight: contention happens when elephants share paths.
Solution:
Detect elephant flows
Place elephant flows on different paths
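A minimal sketch of both steps, under simplifying assumptions: a flow is an elephant if its measured rate is at least 10% of link capacity (an assumed threshold), demand is taken as the measured rate (Hedera itself estimates demand), and each candidate path is reduced to a single capacity number. Placement uses a greedy first fit, in the spirit of Hedera’s Global First Fit; function names are illustrative.

```python
from typing import Dict, List, Optional

ELEPHANT_FRACTION = 0.1  # assumed threshold: 10% of link capacity


def detect_elephants(flow_rates: Dict[str, float], link_capacity: float) -> List[str]:
    """Flows whose measured rate reaches the threshold."""
    limit = ELEPHANT_FRACTION * link_capacity
    return [flow for flow, rate in flow_rates.items() if rate >= limit]


def first_fit(elephants: List[str], demand: Dict[str, float],
              path_capacity: Dict[str, float]) -> Dict[str, Optional[str]]:
    """Greedy first fit: give each elephant the first path with enough room."""
    residual = dict(path_capacity)
    placement: Dict[str, Optional[str]] = {}
    for flow in elephants:
        placement[flow] = None  # None: no path fits, keep the default route
        for path, room in residual.items():
            if room >= demand[flow]:
                residual[path] = room - demand[flow]
                placement[flow] = path
                break
    return placement


rates = {"f1": 6.0, "f2": 5.0, "mouse": 0.2}
elephants = detect_elephants(rates, link_capacity=10.0)
print(elephants)  # ['f1', 'f2']
print(first_fit(elephants, rates, {"p1": 10.0, "p2": 10.0}))  # {'f1': 'p1', 'f2': 'p2'}
```

The two elephants end up on different paths because the first path no longer has room for the second one.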
Implementing Hedera in Onix
[Figure: Controller 1 and Controller 2 each run Hedera (detection + placement); each collects stats from and installs flow table entries in its own switches, and the two exchange the traffic matrix (TM) and detection results]
Implementing Hedera in Kandoo
[Figure: local Controllers 1 to N run elephant detection on stats from their switches and inform the global controller of elephant flows; the global controller runs Hedera’s global placement and installs new flow table entries]
Local controllers: get stats from the network + elephant detection
Global controller: decide flow placement + flow installation
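A sketch of this division of labor, with hypothetical class names and a hypothetical `notify_global` callback standing in for Kandoo’s messaging: the local controller consumes every stats report, and only a newly detected elephant produces an event for the global controller.

```python
from typing import Callable, Dict, List, Set


class GlobalController:
    """Receives only elephant-flow events; placement would run here."""

    def __init__(self) -> None:
        self.elephants: List[str] = []
        self.events_received = 0

    def on_elephant(self, flow: str, rate: float) -> None:
        self.events_received += 1
        self.elephants.append(flow)


class LocalController:
    """Handles stats locally; reports each elephant to the global controller once."""

    def __init__(self, threshold: float,
                 notify_global: Callable[[str, float], None]) -> None:
        self.threshold = threshold
        self.notify_global = notify_global  # hypothetical upward channel
        self.reported: Set[str] = set()

    def on_stats(self, flow_rates: Dict[str, float]) -> None:
        for flow, rate in flow_rates.items():
            if rate >= self.threshold and flow not in self.reported:
                self.reported.add(flow)
                self.notify_global(flow, rate)  # the only event sent upward


g = GlobalController()
local = LocalController(threshold=1.0, notify_global=g.on_elephant)
local.on_stats({"f1": 6.0, "m1": 0.1, "m2": 0.3})
local.on_stats({"f1": 6.2, "m1": 0.2})
print(g.events_received, g.elephants)  # 1 ['f1']
```

Five flow records are handled locally and only one event reaches the global controller, which is how local controllers shield it from network events.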
Implementing B4 in a Kandoo-like architecture
[Figure: Site Controllers 1 to N (running elephant detection) collect stats, install OF entries, and inform the global controller of flow demands; the global controller (TE + BW allocator, backed by a TE DB) installs TE ops at the sites]
Local controllers: get stats from the network + determine demand
Global controller: calculate paths for traffic
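To make the global allocator concrete, here is a sketch that assumes a max-min fair policy on one shared link: demands below the fair share are met in full and the leftover capacity is split equally among the rest. B4’s actual allocator is more elaborate (for example, it allocates over multiple paths); the function name and numbers are illustrative.

```python
from typing import Dict


def max_min_fair(demands: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Water-filling: meet small demands fully, split what is left equally."""
    alloc = {site: 0.0 for site in demands}
    remaining = dict(demands)
    cap = capacity
    while remaining and cap > 0:
        share = cap / len(remaining)
        satisfied = {s: d for s, d in remaining.items() if d <= share}
        if not satisfied:
            # every remaining site wants more than the fair share: split evenly
            for s in remaining:
                alloc[s] += share
            break
        for s, d in satisfied.items():  # fully serve the small demands
            alloc[s] += d
            cap -= d
            del remaining[s]
    return alloc


print(max_min_fair({"A": 2, "B": 4, "C": 10}, capacity=12))
# {'A': 2.0, 'B': 4.0, 'C': 6.0}
```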
Kandoo to the SDN Programmer
Think about what is local and what is global
When writing apps, annotate them with a local flag
Kandoo automatically places local apps on local controllers
and global apps on the global controller
Kandoo restricts messages between global and local controllers
You can’t send OF-style messages; you must send Kandoo-style messages
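A sketch of the annotate-then-place idea in Python. The `register_app` decorator, the registry, and `place` are hypothetical and not Kandoo’s actual API; they only show that a local flag is enough information for a platform to replicate local apps on every local controller and run global apps once.

```python
from typing import Callable, Dict, List

# Hypothetical registry of apps, split by the local flag
REGISTRY: Dict[str, List[Callable]] = {"local": [], "global": []}


def register_app(local: bool = False) -> Callable[[Callable], Callable]:
    """Hypothetical decorator that records whether an app is local."""
    def wrap(fn: Callable) -> Callable:
        REGISTRY["local" if local else "global"].append(fn)
        return fn
    return wrap


@register_app(local=True)
def elephant_detection(stats): ...


@register_app(local=True)
def learning_switch(packet_in): ...


@register_app()  # needs network-wide state, so it is global
def elephant_placement(elephants): ...


def place(num_local_controllers: int) -> Dict[str, List[str]]:
    """Replicate local apps on every local controller; run global apps once."""
    local_names = [f.__name__ for f in REGISTRY["local"]]
    plan = {f"local-{i}": list(local_names) for i in range(num_local_controllers)}
    plan["global"] = [f.__name__ for f in REGISTRY["global"]]
    return plan


plan = place(2)
print(plan["local-0"])  # ['elephant_detection', 'learning_switch']
print(plan["global"])   # ['elephant_placement']
```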
Summary
Centralization provides simplicity at the cost of reliability and scalability
Replication can improve reliability and scalability
For Reliability, Paxos is an option
For scalability, divide and conquer:
Partition the applications
Kandoo: local apps and global apps
Partition the network
Onix: each controller controls a subset of switches (a domain)