Slide1
Enabling ECN over Generic Packet Scheduling
Wei Bai, Kai Chen, Li Chen, Changhoon Kim, Haitao Wu
ACM CoNEXT, Irvine, CA, December 2016
Slide2
Data Centers Around the World
Google’s worldwide DC map
Microsoft’s DC in Dublin, Ireland
Facebook DC interior
Global Microsoft Azure DC Footprint
Slide3
Inside the Data Center (DC)
Network requirements of applications
Desire low latency for short messages
Desire high throughput for large flows
Slide4
Inside the Data Center (DC)
Network requirements of applications
Desire low latency for short messages
Desire high throughput for large flows
Network performance improvement
Packet scheduling
ECN-based transport protocols (ECN = Explicit Congestion Notification)
Combine: packet scheduling + ECN-based transport protocols
Slide5
Packet Scheduling in Data Centers
Round Robin
Service: Weight
Real-time Services: 4
Best-effort Services: 2
Background Services: 1
Inter-Service Traffic Isolation
Bai et al. (NSDI’16)
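A minimal sketch of weighted round robin with the 4:2:1 weights above, assuming packet-granularity service (each queue may send up to its weight in packets per round). The function name and queue contents are hypothetical and only illustrate the isolation idea, not the scheduler used in the cited work.

```python
from collections import deque

def weighted_round_robin(queues, weights, rounds):
    """Serve up to weights[name] packets from each queue in every round."""
    served = []
    for _ in range(rounds):
        for name, weight in weights.items():
            queue = queues[name]
            for _ in range(weight):
                if not queue:
                    break  # work-conserving: move on when a queue is empty
                served.append((name, queue.popleft()))
    return served

# Weights taken from the slide; the packets are just integers here.
weights = {"real-time": 4, "best-effort": 2, "background": 1}
queues = {name: deque(range(10)) for name in weights}

order = weighted_round_robin(queues, weights, rounds=2)
counts = {name: sum(1 for n, _ in order if n == name) for name in weights}
print(counts)
```

While all three queues stay backlogged, the service shares follow the weights, so a burst of background traffic cannot starve the real-time service.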
Slide6
Packet Scheduling in Data Centers
Round Robin
Flow size: Priority
(0, 100KB] Flows: High
(100KB, 10MB) Flows: Medium
(10MB, ∞) Flows: Low
Flow Scheduling
Bai et al. (NSDI’15)
Strict Priority
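The size-to-priority mapping above can be sketched as a strict-priority scheduler. This is an illustration under stated assumptions: decimal units for KB and MB, a flow of exactly 10MB falling into the low class (the slide leaves that boundary open), and hypothetical class and function names.

```python
from collections import deque

KB = 1000          # assumption: decimal units
MB = 1000 * KB

def priority_for(flow_size_bytes):
    """Map a flow size to a priority level: 0 = high, 1 = medium, 2 = low."""
    if flow_size_bytes <= 100 * KB:
        return 0
    if flow_size_bytes < 10 * MB:
        return 1
    return 2  # assumption: exactly 10MB is treated as low priority

class StrictPriorityScheduler:
    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, flow_size_bytes):
        self.queues[priority_for(flow_size_bytes)].append(packet)

    def dequeue(self):
        # Always serve the highest-priority non-empty queue first.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None

sched = StrictPriorityScheduler()
sched.enqueue("bulk", 50 * MB)
sched.enqueue("medium", 1 * MB)
sched.enqueue("short", 20 * KB)
order = [sched.dequeue() for _ in range(3)]
print(order)
```

Packets of short flows leave first regardless of arrival order, which is what gives short messages low latency.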
Slide7
Packet Scheduling in Data Centers
Round Robin
Existing fixed-function switching chips
Strict Priority
Slide8
Packet Scheduling in Data Centers
Round Robin
Strict Priority
Future programmable switching chips
Slide9
Packet Scheduling in Data Centers
Round Robin
Programmable Schedulers
Push-In-First-Out (PIFO) Queue
A. Sivaraman et al. (SIGCOMM’16)
Strict Priority
Slide10
Can we enable ECN for arbitrary packet schedulers in data centers?
Slide11
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
Below K: don’t mark; above K: mark
Slide12
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
[Figure: buffer occupancy over time, with threshold K]
Slide13
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Small number of concurrent large flows in DC
M. Alizadeh et al. (SIGCOMM’10)
Slide14
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Fixed link capacity
Slide15
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Base round-trip time, relatively stable in DC
Wu et al. (CoNEXT’12)
Slide16
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Determined by congestion control algorithms
Slide17
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Standard queue length threshold
A static value in the data center environment
Slide18
ECN/RED without Packet Scheduling
Packets get marked when queue length > K
To achieve 100% throughput
Static threshold: easy to configure at the switch
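A minimal sketch of how such a static threshold could be computed, assuming it takes the form K = C × RTT × λ. That form is an assumption that matches the three annotations above (fixed link capacity C, base round-trip time RTT, and a factor λ determined by the congestion control algorithm). The numeric values are hypothetical; λ = 1 corresponds to a sender that halves its window on a mark.

```python
def standard_threshold_bytes(capacity_bps, base_rtt_s, lam):
    """Assumed form K = C * RTT * lambda, returned in bytes."""
    return capacity_bps / 8 * base_rtt_s * lam

def should_mark(queue_len_bytes, threshold_bytes):
    """Instantaneous marking: mark when the queue length exceeds K."""
    return queue_len_bytes > threshold_bytes

# Hypothetical numbers: 10 Gbps link, 100 microsecond base RTT, lambda = 1.
K = standard_threshold_bytes(capacity_bps=10e9, base_rtt_s=100e-6, lam=1.0)
print(round(K), should_mark(200_000, K), should_mark(50_000, K))
```

Because all three inputs are fixed for a given link and transport, K can be computed once and written into the switch configuration.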
Slide19
ECN/RED with Packet Scheduling
Each queue is a link with varying capacity
Ideal ECN/RED solution
Packets should get marked if the length of queue i exceeds a dynamic per-queue threshold
The per-queue threshold depends on the queue's varying capacity
Slide20
ECN/RED with Packet Scheduling
Each queue is a link with varying capacity
Ideal ECN/RED solution
Packets should get marked if the length of queue i exceeds a dynamic per-queue threshold
Not supported by current switching chips
Current practice
Configure static thresholds: high throughput but poor latency
Slide21
To Implement Ideal ECN/RED Solution
A general way to estimate the queue capacity
Queue capacity = queue departure rate while the queue stays non-empty
Leverage the solution from PIE (HPSR’13)
Start measurement when the number of bytes in the switch buffer > dq_thresh
Get the rate to drain dq_thresh bytes
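The two measurement steps above can be sketched as a small estimator. This is a simplified illustration in the spirit of PIE's departure-rate estimation, not its exact pseudocode: the class name, the `on_departure` hook, and the abstract time units are assumptions.

```python
class DepartureRateEstimator:
    """Estimate a queue's drain rate from the time needed to drain dq_thresh bytes."""

    def __init__(self, dq_thresh_bytes):
        self.dq_thresh = dq_thresh_bytes
        self.measuring = False
        self.start_time = 0
        self.dq_count = 0
        self.rate = None  # bytes per time unit; None until the first sample

    def on_departure(self, now, pkt_bytes, backlog_bytes):
        """Call when a packet has left the queue; backlog_bytes is what remains."""
        if self.measuring:
            self.dq_count += pkt_bytes
            if self.dq_count >= self.dq_thresh:
                elapsed = now - self.start_time
                if elapsed > 0:
                    self.rate = self.dq_count / elapsed
                self.measuring = False
        # Start a new measurement only while the backlog exceeds dq_thresh,
        # so the queue is guaranteed to stay busy during the window.
        if not self.measuring and backlog_bytes > self.dq_thresh:
            self.measuring = True
            self.start_time = now
            self.dq_count = 0
        return self.rate

# Hypothetical trace: one 1500-byte packet leaves every 2 time units.
est = DepartureRateEstimator(dq_thresh_bytes=3000)
for t in range(2, 12, 2):
    est.on_departure(now=t, pkt_bytes=1500, backlog_bytes=100_000)
print(est.rate)
```

The estimate equals the queue's share of the link only if the queue stays non-empty for the whole window, which is why the measurement starts only above dq_thresh.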
Slide22
Trade-off of Measurement Window
Sequence of packets
Link capacity: C
Transmitted packets from queue 1
Transmitted packets from queue 2
Queues 1 and 2 stay non-empty during the transmission
Slide23
Trade-off of Measurement Window
Sequence of packets
Link capacity: C
Transmitted packets from queue 1
Transmitted packets from queue 2
Queue capacity 1 = Queue capacity 2 = 0.5C
Slide24
Trade-off of Measurement Window
Too small a measurement window
e.g., dq_thresh = 3 MTU
Sequence of packets
Link capacity: C
Sample rate of queue 1: C, 3/7 C, 3/7 C, C
Slide25
Trade-off of Measurement Window
Too small a measurement window
Degrades measurement accuracy
Sequence of packets
Link capacity: C
Sample rate of queue 1: C, 3/7 C, 3/7 C, C
Slide26
Trade-off of Measurement Window
Too small a measurement window
Degrades measurement accuracy
Too large a measurement window
e.g., dq_thresh = 20 MTU
Sequence of packets
Link capacity: C
Slide27
Trade-off of Measurement Window
Too small a measurement window
Degrades measurement accuracy
Too large a measurement window
Cannot efficiently capture dynamic changes
Sequence of packets
Link capacity: C
Slide28
Trade-off of Measurement Window
Too small a measurement window
Degrades measurement accuracy
Too large a measurement window
Cannot efficiently capture dynamic changes
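The trade-off can be reproduced with a toy simulation. The transmit sequence below is hypothetical (bursts of four MTU-sized packets alternating between queue 1 and queue 2, so each queue's true capacity is 0.5C); it was chosen because a 3-MTU window then yields samples of C and 3/7 C. Rates are expressed as fractions of the link capacity C.

```python
from fractions import Fraction

def sample_rates(sequence, queue, window):
    """Rate samples for `queue`, as fractions of link capacity C.

    `sequence` lists which queue sent the MTU-sized packet in each slot.
    A sample closes after `window` packets from `queue` have been sent.
    """
    samples = []
    start = None
    count = 0
    for slot, sender in enumerate(sequence):
        if sender != queue:
            continue
        if start is None:
            start = slot  # the window opens when its first packet starts
        count += 1
        if count == window:
            samples.append(Fraction(window, slot + 1 - start))
            start, count = None, 0
    return samples

burst = [1] * 4 + [2] * 4          # hypothetical pattern, true share is 1/2
small = sample_rates(burst * 3, queue=1, window=3)
large = sample_rates(burst * 10, queue=1, window=20)
print(small, large)
```

With a window of 3 MTU the samples swing between C and 3/7 C although the true capacity is 0.5C. A window of 20 MTU gives stable samples of 5/9 C, but each sample spans 36 packet slots, so a change in the scheduling pattern shows up only after a long delay.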
Rate measurement is non-trivial
Slide29
Another View
Ideal ECN/RED solution
Packets should get marked if the queue length exceeds a threshold that scales with the queue's varying capacity
Slide30
Another View
Ideal ECN/RED solution
Packets should get marked if queue length divided by the varying capacity, i.e. the sojourn time, exceeds a static time threshold
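A small numeric check of why the time-based view removes the dependence on capacity. It assumes the per-queue length threshold has the form K_i = C_i × RTT × λ, so dividing by C_i leaves the time threshold T = RTT × λ. That form and all numbers below are assumptions for illustration only.

```python
def queueing_delay_s(queue_len_bytes, capacity_bytes_per_s):
    """Time a packet at the tail of the queue waits before departing."""
    return queue_len_bytes / capacity_bytes_per_s

LINK = 1.25e9        # hypothetical 10 Gbps link, in bytes per second
BASE_RTT = 100e-6    # hypothetical base round-trip time, in seconds
LAM = 1.0            # hypothetical congestion-control factor
T = BASE_RTT * LAM   # assumed time threshold

static_k = LINK * T  # static length threshold sized for the full link
ideal_delays, static_delays = [], []
for share in (1.0, 0.5, 0.25):       # fraction of the link the queue receives
    capacity = LINK * share
    ideal_k = capacity * T           # assumed dynamic per-queue threshold
    ideal_delays.append(queueing_delay_s(ideal_k, capacity))
    static_delays.append(queueing_delay_s(static_k, capacity))

print(ideal_delays, static_delays)
```

Under these assumptions the dynamic threshold always corresponds to the same delay T, while the static length threshold lets delay grow to 2T and 4T as the queue's share shrinks. That is the "high throughput but poor latency" behaviour of static thresholds, and it is why a single time threshold can stand in for per-queue rate measurement.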
Slide31
TCN (Time-based Congestion Notification)
TCN mechanism
Packets should get marked if their sojourn times > a static threshold
Slide32
TCN in Detail
Sojourn time measurement
Enqueue: attach metadata to each packet to store the enqueue time T_eq
Slide33
TCN in Detail
Sojourn time measurement
Enqueue: attach metadata to each packet to store the enqueue time T_eq
Dequeue: calculate sojourn time = now - T_eq
Slide34
TCN in Detail
Sojourn time measurement
Enqueue: attach metadata to each packet to store the enqueue time
Dequeue: calculate sojourn time
2B-long metadata is enough for DC
Slide35
TCN in Detail
Sojourn time measurement
Enqueue: attach metadata to each packet to store the enqueue time
Dequeue: calculate sojourn time
Instantaneous ECN marking
Compare the per-packet instantaneous sojourn time with a static threshold
Stateless data plane algorithm
Slide36
TCN in Detail
Sojourn time measurement
Enqueue: attach metadata to each packet to store the enqueue time
Dequeue: calculate sojourn time
Instantaneous ECN marking
Compare the per-packet instantaneous sojourn time with a static threshold
Marking does not cause any bubble on the link
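The enqueue and dequeue steps above can be sketched as follows. This is an illustration, not the authors' qdisc code: the class name, the dict-based packets, the `ce` field, and the clock unit (ticks, e.g. microseconds) are assumptions. The 16-bit timestamp mirrors the 2-byte metadata mentioned earlier and assumes sojourn times stay below 65536 ticks.

```python
from collections import deque

TS_BITS = 16                     # 2-byte enqueue timestamp
TS_MASK = (1 << TS_BITS) - 1

class TcnQueue:
    """FIFO queue that marks packets at dequeue based on sojourn time."""

    def __init__(self, threshold_ticks):
        self.threshold = threshold_ticks
        self.packets = deque()

    def enqueue(self, packet, now_ticks):
        # Keep only the low 16 bits of the clock as per-packet metadata.
        self.packets.append((packet, now_ticks & TS_MASK))

    def dequeue(self, now_ticks):
        if not self.packets:
            return None
        packet, t_eq = self.packets.popleft()
        # Modular subtraction stays correct when the 16-bit clock wraps.
        sojourn = (now_ticks - t_eq) & TS_MASK
        # Stateless decision: depends only on this packet's own sojourn time.
        packet["ce"] = sojourn > self.threshold
        return packet

q = TcnQueue(threshold_ticks=100)
q.enqueue({"id": 1}, now_ticks=65530)
q.enqueue({"id": 2}, now_ticks=65530)
first = q.dequeue(now_ticks=65540)    # waited 10 ticks
second = q.dequeue(now_ticks=65700)   # waited 170 ticks
print(first, second)
```

The packet is always transmitted and only its mark changes, so the decision at dequeue never leaves the link idle. No per-queue state beyond the threshold is kept, which is the stateless property noted above.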
Slide37
TCN vs CoDel
Advantages of TCN
Stateless: cheaper to implement in hardware
Instantaneous: faster reaction to bursty traffic
Slide38
TCN vs CoDel
Advantages of TCN
Stateless: cheaper to implement in hardware
Instantaneous: faster reaction to bursty traffic
Simplicity of TCN
Unique characteristics of data centers
Slide39
TCN vs CoDel
Advantages of TCN
Stateless: cheaper to implement in hardware
Instantaneous: faster reaction to bursty traffic
Simplicity of TCN
Small number of concurrent large flows
Relatively stable RTTs
Prior knowledge of transport at the end host
Slide40
Testbed Evaluation
TCN software prototype
Linux qdisc kernel module on a multi-NIC server
Testbed setup
9 servers are connected to a software switch
End-hosts use DCTCP as the transport protocol
ECN schemes compared
Per-queue RED with the standard threshold
CoDel
Slide41
Static Flow Experiment
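[Figure: SP/WFQ switch with three queues (Q1, Q2, Q3): one high-priority queue and two low-priority queues with weight 1 each; traffic consists of 1 flow (500Mbps), 4 flows, and 1 flow]
Slide42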
Static Flow Experiment
TCN preserves the scheduling policy
Slide43
Dynamic Flow Experiment
8 senders to 1 receiver (web search workload)
SP/WFQ scheduling policy at the switch
Traffic: Queue
(0, 100KB] flows of all services: high prio
(100KB, ∞) flows of service 1: w=1 (low)
(100KB, ∞) flows of service 2: w=1 (low)
(100KB, ∞) flows of service 3: w=1 (low)
(100KB, ∞) flows of service 4: w=1 (low)
Slide44
99th percentile FCT of Small Flows (<100KB)
TCN maintains low buffer occupancy
Slide45
Realistic Traffic: Large Flows (>10MB)
TCN achieves high throughput
Slide46
Conclusion
TCN: a simple ECN solution for data centers
Use sojourn time as the congestion signal (CoDel)
Perform instantaneous ECN marking (DCTCP)
Code: http://sing.cse.ust.hk/projects/TCN
Next step: TCN in programmable hardware
Slide47
Thanks!
Slide48
Average FCT of Small Flows (<100KB)