Slide 1: Clustering in OpenDaylight

Colin Dixon
Technical Steering Committee Chair, OpenDaylight
Distinguished Engineer, Brocade

Borrowed ideas and content from Jan Medved, Moiz Raja, and Tom Pantelis
Slide 2: Multi-Protocol SDN Controller Architecture

[Architecture diagram: protocol plugins/adapters (OpenFlow, OVSDB, NETCONF client, ...) connect network devices to the controller core, the Service Adaptation Layer (SAL). Applications run on the core and are exposed to OSS/BSS and external apps through RESTCONF, REST, and a NETCONF server.]
Slide 3: Model-Driven SAL (MD-SAL)

[Architecture diagram: in the software architecture, the MD-SAL is the "kernel" sitting between protocol plugins (NETCONF client, ...) facing network devices and apps/services (RESTCONF, NETCONF server, REST) facing applications and OSS/BSS/external apps. The MD-SAL provides a common namespace with a data store, RPCs, notifications, and data change notifications.]
Slide 4: Data Store Sharding

- Select data subtrees
  - Currently, you can only pick a subtree directly under the root
  - Working on subtrees at arbitrary levels
- Map subtrees onto shards
- Map shards onto nodes (a layout sketch follows the diagram)
[Diagram: the data tree root splits into subtrees mapped to shards Shard1.1, Shard1.2, Shard1.3, Shard2.1, ..., ShardN.1; a shard layout algorithm places the shards on M nodes (Node1, Node2, ..., NodeM).]

Legend: in ShardX.Y, X identifies Service X and Y identifies Shard Y within Service X.
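The layout step is just a mapping from shards to nodes. A minimal Java sketch of one possible policy, round-robin placement (the ShardLayout class and placeShards method are hypothetical, for illustration only; OpenDaylight's actual replica placement is driven by configuration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the "place shards on M nodes" step:
// a round-robin layout that assigns each shard to a node index.
public class ShardLayout {
    public static Map<String, Integer> placeShards(List<String> shards, int nodeCount) {
        Map<String, Integer> layout = new HashMap<>();
        for (int i = 0; i < shards.size(); i++) {
            layout.put(shards.get(i), i % nodeCount); // node hosting this shard
        }
        return layout;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("Shard1.1", "Shard1.2", "Shard1.3", "Shard2.1");
        // Three nodes, as in the diagram: shards are spread evenly across them.
        System.out.println(placeShards(shards, 3));
    }
}
```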
Slide 5: Shard Replication

- Replication using Raft [1]
  - Provides strong consistency
- Transactional data access
  - Snapshot reads, snapshot writes, read/write transactions
- Tolerates f failures with 2f+1 nodes (worked out below)
  - 3 nodes => 1 failure, 5 nodes => 2 failures, etc.
- Leaders handle all writes
  - Writes are sent to followers before committing
- Leaders are distributed to spread load
[Diagram: five shards replicated across three nodes; each shard has one leader (L) and two follower (F) replicas, with the leaders spread across the nodes.]
[1] https://raftconsensus.github.io/
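To make the 2f+1 arithmetic concrete, here is a minimal sketch (the RaftQuorum class and helper names are hypothetical): a write commits once a majority quorum acknowledges it, so n nodes tolerate (n - 1) / 2 failures.

```java
// Raft commits a write once a majority of nodes acknowledge it, so a
// cluster of n = 2f + 1 nodes keeps making progress with f nodes down.
public class RaftQuorum {
    static int quorum(int nodes) {
        return nodes / 2 + 1; // smallest majority
    }

    static int toleratedFailures(int nodes) {
        return (nodes - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.printf("%d nodes: quorum=%d, tolerates %d failure(s)%n",
                    n, quorum(n), toleratedFailures(n));
        }
    }
}
```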
Slide 6: Strong Consistency

- Serializability: everyone always reads the most recent write. Loosely, "everyone is at the same point on the same timeline."
- Causal consistency: loosely, "you won't see the result of any event without seeing everything that could have caused that event." [Typically within the confines of reads/writes to a single datastore.]
- Eventual consistency: loosely, "everyone will eventually see the same events in some order, but not necessarily in the same order." [Eventual in the same way that your kid will "eventually" clean their room.]
Slide 7: Why strong consistency matters

- A flapping port generates events: port up, port down, port up, port down, port up, port down, ...
- Is the port up or down? If the events aren't ordered, you have no idea (see the sketch below).
- ...sure, but these are events from a single device, so you can keep them ordered easily.
- More complex example: switches (or ports on those switches) attached to different nodes go up and down.
  - If you don't have ordering, different nodes will come to different conclusions about reachability, paths, etc.
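A minimal sketch of the flapping-port problem (the PortFlap class is hypothetical): replaying the same events in two different orders yields opposite answers about the port's final state.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: the port's final state is whatever the last
// applied event says, so two nodes that see the same events in different
// orders disagree about whether the port is up.
public class PortFlap {
    static boolean replay(List<String> events) {
        boolean up = false;
        for (String event : events) {
            up = event.equals("port up"); // last event wins
        }
        return up;
    }

    public static void main(String[] args) {
        List<String> ordered = Arrays.asList("port up", "port down", "port up");
        List<String> reordered = Arrays.asList("port up", "port up", "port down");
        System.out.println("ordered:   up=" + replay(ordered));   // up=true
        System.out.println("reordered: up=" + replay(reordered)); // up=false
    }
}
```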
Slide 8: Why everyone doesn't use it

- Strong consistency can't work in the face of partitions. If your network splits in two, either:
  - one side stops, or
  - you lose strong consistency
- Strong consistency requires coordination
  - Effectively, you need some single entity to order everything
  - This has performance costs, i.e., a single strongly consistent data store is limited by the performance of a single node
- The question is: do we start with strong and relax it, or start weak and strengthen it?
  - OpenDaylight has started strong
Slide 9: Service/Application Model

- Logically, each service or application (code) has a primary subtree (YANG model) and shard it is associated with (data)
- One instance of the code is co-located with each replica of the data
- All instances are stateless; all state is stored in the data store (sketched below)
- The instance that is co-located with the shard leader handles writes
[Diagram: three nodes (Node1, Node2, Node3), each exposing the Data Store API and hosting instances of Service1, Service2, and Service3 co-located with shard replicas S1.1, S1.2, S2.9, S3.1, and S3.7; each shard's leader replica sits on one node with follower replicas on the others.]

Legend: SX.Y denotes a shard replica; leader and follower replicas are distinguished by shading in the original figure.
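As a rough illustration of the stateless-instance idea, a minimal sketch against the Beryllium-era MD-SAL binding API; the PortStatusWriter class is hypothetical, and the path and data types would come from YANG-generated bindings in a real application:

```java
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.WriteTransaction;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.yangtools.yang.binding.DataObject;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

public class PortStatusWriter {
    private final DataBroker broker;

    public PortStatusWriter(DataBroker broker) {
        this.broker = broker;
    }

    // The service instance holds no state of its own: everything goes
    // through the data store, so any replica of the code can take over.
    public <T extends DataObject> void write(InstanceIdentifier<T> path, T data) {
        WriteTransaction tx = broker.newWriteOnlyTransaction();
        tx.put(LogicalDatastoreType.OPERATIONAL, path, data);
        tx.submit(); // MD-SAL routes the write to the shard leader
    }
}
```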
Slide 10: Service/Application Model (cont'd)

- Entity Ownership Service allows related tasks to be co-located (sketched below)
  - e.g., the tasks related to a given OpenFlow switch should happen where it's connected
  - Also handles HA/failover: automatic election of a new entity owner
- RPCs and Notifications are directed to the entity owner
- New cluster-aware data change listeners provide integration into the data store
[Diagram: the same three-node layout as Slide 9, with shard replicas S1.1, S1.2, S2.9, S3.1, and S3.7 and co-located Service1/Service2/Service3 instances behind the Data Store API on each node.]
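A minimal sketch of entity ownership registration, assuming the Beryllium-era EntityOwnershipService API; the "openflow" entity type, the switch name, and the SwitchOwnerCandidate class are hypothetical:

```java
import org.opendaylight.controller.md.sal.common.api.clustering.CandidateAlreadyRegisteredException;
import org.opendaylight.controller.md.sal.common.api.clustering.Entity;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipChange;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipService;

public class SwitchOwnerCandidate implements EntityOwnershipListener {
    public void register(EntityOwnershipService ownershipService)
            throws CandidateAlreadyRegisteredException {
        // Each node volunteers as a candidate for the switch; the service
        // elects one owner and elects a new one if that node goes down.
        ownershipService.registerListener("openflow", this);
        ownershipService.registerCandidate(new Entity("openflow", "openflow:1"));
    }

    @Override
    public void ownershipChanged(EntityOwnershipChange change) {
        if (change.isOwner()) {
            // This node now owns the switch: handle its tasks here.
        } else if (change.wasOwner()) {
            // Ownership moved to another node: stop acting as the owner.
        }
    }
}
```

In a real application, the registration handles returned by these calls would be retained and closed on shutdown.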
Slide 11: Handling RPCs and Notifications in a Cluster

- Data Change Notifications: flexibly delivered to the shard leader, or to any subset of nodes
- YANG-modeled Notifications: delivered on the node where they were generated; typically guided to the entity owner
- Global RPCs: delivered on the node where they are called
- Routed RPCs: delivered to the node which registered to handle them (sketched below)
[Diagram: three nodes with leader (L) and follower (F) shard replicas, illustrating where notifications and RPCs are delivered.]
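A minimal sketch of routed RPC registration, assuming the Beryllium-era RpcProviderRegistry binding API; the SwitchRpcService interface and the routing-context parameter stand in for types that would be generated from YANG models:

```java
import org.opendaylight.controller.sal.binding.api.BindingAwareBroker.RoutedRpcRegistration;
import org.opendaylight.controller.sal.binding.api.RpcProviderRegistry;
import org.opendaylight.yangtools.yang.binding.BaseIdentity;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;
import org.opendaylight.yangtools.yang.binding.RpcService;

public class RoutedRpcExample {
    /** Hypothetical routed RPC interface; real ones are generated from YANG. */
    public interface SwitchRpcService extends RpcService { }

    public <C extends BaseIdentity> RoutedRpcRegistration<SwitchRpcService> register(
            RpcProviderRegistry rpcRegistry,
            SwitchRpcService implementation,
            Class<C> routingContext,
            InstanceIdentifier<?> switchPath) {
        // Register this node's implementation of the routed RPC type...
        RoutedRpcRegistration<SwitchRpcService> reg =
                rpcRegistry.addRoutedRpcImplementation(SwitchRpcService.class, implementation);
        // ...and bind it to one instance path: calls carrying this path are
        // delivered to this node, wherever in the cluster they were invoked.
        reg.registerPath(routingContext, switchPath);
        return reg;
    }
}
```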
Slide 12: Service/Shard Interactions

- Service-x "resolves" a read or write to a subtree/shard
- Reads are sent to the leader (sketched below)
  - Working on allowing local reads
- Writes are sent to the leader to be ordered
- Notifications of changed data are sent to the shard leader, and to anyone registered for remote notification
[Diagram: numbered message flows across Node1-Node3: Service-x (1) "resolves" a path to its shard, then its "read" travels to the shard leader and back (2, 3); Service-y's "write" follows the same resolve-then-leader path (1-4), after which "notification" messages (2.1, 4.1) fan out to the shard leader and remote registrants.]
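From the application's side, the resolve and leader round trip are hidden behind a transaction. A minimal sketch of the read path, again assuming the Beryllium-era binding API (the ShardReader class is hypothetical):

```java
import com.google.common.base.Optional;
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.ReadOnlyTransaction;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.controller.md.sal.common.api.data.ReadFailedException;
import org.opendaylight.yangtools.yang.binding.DataObject;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

public class ShardReader {
    private final DataBroker broker;

    public ShardReader(DataBroker broker) {
        this.broker = broker;
    }

    public <T extends DataObject> Optional<T> read(InstanceIdentifier<T> path)
            throws ReadFailedException {
        ReadOnlyTransaction tx = broker.newReadOnlyTransaction();
        try {
            // MD-SAL resolves the path to a shard and (today) sends the
            // read to that shard's leader; local reads are in the works.
            return tx.read(LogicalDatastoreType.OPERATIONAL, path).checkedGet();
        } finally {
            tx.close();
        }
    }
}
```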
Slide 13: Major additions in Beryllium

- Entity Ownership Service: EntityOwnershipService
- Clustered Data Change Listeners: ClusteredDataChangeListener and ClusteredDataTreeChangeListener (sketched below)
- Significant application/plugin adoption
  - OVSDB, OpenFlow, NETCONF, Neutron, etc.
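A minimal sketch of a clustered listener, assuming the Beryllium-era binding API; the TopologyListener class is hypothetical, and in practice the DataTreeChangeService is provided by the DataBroker:

```java
import java.util.Collection;
import org.opendaylight.controller.md.sal.binding.api.ClusteredDataTreeChangeListener;
import org.opendaylight.controller.md.sal.binding.api.DataTreeChangeService;
import org.opendaylight.controller.md.sal.binding.api.DataTreeIdentifier;
import org.opendaylight.controller.md.sal.binding.api.DataTreeModification;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.yangtools.concepts.ListenerRegistration;
import org.opendaylight.yangtools.yang.binding.DataObject;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

// A clustered listener receives change events on every node that hosts a
// replica of the shard, not only on the node with the shard leader.
public class TopologyListener<T extends DataObject>
        implements ClusteredDataTreeChangeListener<T> {

    public ListenerRegistration<?> register(DataTreeChangeService service,
                                            InstanceIdentifier<T> path) {
        return service.registerDataTreeChangeListener(
                new DataTreeIdentifier<>(LogicalDatastoreType.OPERATIONAL, path), this);
    }

    @Override
    public void onDataTreeChanged(Collection<DataTreeModification<T>> changes) {
        // React to the changes; this runs on followers as well as the leader.
    }
}
```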
Slide 14: Work in progress

- Dynamic, multi-level sharding
  - Multi-level, e.g., OpenFlow should be able to say its subtrees start at the "switch" node
  - Dynamic, e.g., an OpenFlow subtree should be moved if the connection moves
- Improve performance, scale, stability, etc., as always
- Faster, but "stale", reads from the local replica vs. always reading from the leader
- Pipelined transactions for better cluster write throughput
- Whatever else you're interested in helping with
Slide 15: Longer term things

- Helper code for building common app patterns
  - Run once in the cluster, and fail over if that node goes down
  - Run everywhere and handle things
- Different consistency models
  - Giving people the knob is easy; dealing with the ramifications is hard
- Federated/hierarchical clustering