
Presentation Transcript

Slide1

Enabling ECN over Generic Packet Scheduling

Wei Bai, Kai Chen, Li Chen, Changhoon Kim, Haitao Wu

ACM CoNEXT, Irvine, CA, December 2016

Slide2

Data Centers Around the World

Google’s worldwide DC map

Microsoft’s DC in Dublin, Ireland

Facebook DC interior

Global Microsoft Azure DC Footprint

Slide3

Inside the Data Center (DC)

Network requirements of applications:

Desire low latency for short messages

Desire high throughput for large flows

Slide4

Inside the Data Center (DC)

Network requirements of applications:

Desire low latency for short messages

Desire high throughput for large flows

Network performance improvement:

Packet scheduling

ECN-based transport protocols (ECN = Explicit Congestion Notification)

Combine the two

Slide5

Packet Scheduling in Data Centers

Inter-Service Traffic Isolation, Bai et al. (NSDI’16)

Round Robin with per-service weights:

Real-time Services: weight 4

Best-effort Services: weight 2

Background Services: weight 1
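
The 4:2:1 weights can be turned into link shares with a few lines of Python. This is only a sketch: it assumes a plain weighted round robin in which every queue stays backlogged, and the shortened service names and both function names are made up for illustration. The slide does not say which round-robin variant is used.

```python
from fractions import Fraction

# Weights as listed on the slide; the dictionary keys are shortened labels.
WEIGHTS = {"real-time": 4, "best-effort": 2, "background": 1}

def bandwidth_shares(weights):
    """Long-run share of the link per queue when all queues stay backlogged."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}

def wrr_round(weights):
    """One round of a simple weighted round robin: `weight` packets per queue."""
    order = []
    for name, w in weights.items():
        order.extend([name] * w)
    return order

shares = bandwidth_shares(WEIGHTS)
one_round = wrr_round(WEIGHTS)
print({name: str(share) for name, share in shares.items()})
print(one_round)
```

Under these assumptions the real-time queue receives 4/7 of the link, and each round transmits seven packets in total.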

Slide6

Packet Scheduling in Data Centers

Flow Scheduling, Bai et al. (NSDI’15)

Round Robin

Strict Priority, by flow size:

(0, 100KB] Flows: High Priority

(100KB, 10MB) Flows: Medium Priority

(10MB, ∞) Flows: Low Priority
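
The size-to-priority mapping on this slide can be sketched as a small lookup. The function name, the decimal definition of KB and MB, and the handling of a flow of exactly 10MB (which falls between the two open intervals on the slide) are assumptions made for this example.

```python
KB = 1000          # assumption: decimal units; the slide does not say
MB = 1000 * KB

def priority_for_flow(size_bytes):
    """Map a flow size to a strict-priority class using the slide's thresholds."""
    if size_bytes <= 100 * KB:
        return "high"      # (0, 100KB]
    if size_bytes < 10 * MB:
        return "medium"    # (100KB, 10MB)
    return "low"           # (10MB, inf); exactly 10MB is placed here by assumption

for size in (50 * KB, 100 * KB, 1 * MB, 50 * MB):
    print(size, priority_for_flow(size))
```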

Slide7

Packet Scheduling in Data Centers

Round Robin

Strict Priority

Existing fixed-function switching chips

Slide8

Packet Scheduling in Data Centers

Round Robin

Strict Priority

Future programmable switching chips

Slide9

Packet Scheduling in Data Centers

Round Robin

Strict Priority

Programmable Schedulers: Push-In-First-Out (PIFO) Queue, A. Sivaraman et al. (SIGCOMM’16)

Slide10

Can we enable ECN for arbitrary packet schedulers in data centers?

Slide11

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

Queue length ≤ K: don’t mark. Queue length > K: mark.

Slide12

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

(Figure: buffer occupancy over time, with threshold K.)

Slide13

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

Small number of concurrent large flows in DC, M. Alizadeh et al. (SIGCOMM’10)

Slide14

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

C: fixed link capacity

Slide15

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

RTT: base round-trip time, relatively stable in DC, Wu et al. (CoNEXT’12)

Slide16

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

λ: determined by congestion control algorithms

Slide17

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

Standard queue length threshold: K = C × RTT × λ

A static value in data center environment

Slide18

ECN/RED without Packet Scheduling

Packets get marked when queue length > K

To achieve 100% throughput: K ≥ C × RTT × λ

Static threshold: K = C × RTT × λ

Easy to configure at the switch
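
To get a feel for the size of such a static threshold, the product of link capacity, base RTT, and a congestion-control factor can be evaluated directly. This is a sketch under stated assumptions: the formula K = C × RTT × λ is taken as the threshold described by the slide annotations, and the 10 Gbps link, 100 µs RTT, and λ = 1.0 are illustrative numbers that do not come from the slides.

```python
def ecn_threshold_bytes(capacity_bps, rtt_s, lam):
    """Static marking threshold K = C x RTT x lambda, converted from bits to bytes."""
    return capacity_bps * rtt_s * lam / 8

# Illustrative inputs only: 10 Gbps link, 100 us base RTT, lambda = 1.0.
k_bytes = ecn_threshold_bytes(capacity_bps=10e9, rtt_s=100e-6, lam=1.0)
print(f"K = {k_bytes / 1e3:.0f} KB")
```

Because all three inputs are fixed for a given link and transport, the result is a single number that can be written into the switch configuration once.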

Slide19

ECN/RED with Packet Scheduling

Each queue is a link with varying capacity

Ideal ECN/RED solution:

Packets should get marked if the length of queue i > K_i = C_i × RTT × λ

K_i: dynamic per-queue threshold

C_i: varying capacity of queue i

Slide20

ECN/RED with Packet Scheduling

Each queue is a link with varying capacity

Ideal ECN/RED solution:

Packets should get marked if the length of queue i > K_i = C_i × RTT × λ

Not supported by current switching chips

Current practice:

Configure static thresholds: K = C × RTT × λ for every queue

High throughput but poor latency
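
The latency cost of a static per-queue threshold shows up when a queue receives only part of the link. The sketch below compares the time needed to drain a queue sitting at the threshold under a static threshold sized for the full link and under a threshold scaled to the queue's current capacity. The 10 Gbps link, the 100 µs value for RTT × λ, and all names are assumptions chosen for illustration.

```python
def queue_delay_us(threshold_bytes, drain_rate_bps):
    """Time to drain a queue that sits at the marking threshold, in microseconds."""
    return threshold_bytes * 8 / drain_rate_bps * 1e6

LINK_BPS = 10_000_000_000                               # assumed 10 Gbps link
RTT_LAMBDA_US = 100                                     # assumed RTT x lambda, in us
STATIC_K = LINK_BPS * RTT_LAMBDA_US // 8 // 1_000_000   # bytes, sized for the full link

rows = []
for divisor in (1, 2, 4):                 # queue gets C, C/2, C/4
    rate = LINK_BPS // divisor
    ideal_k = STATIC_K // divisor         # threshold scaled to the queue capacity
    rows.append((divisor,
                 queue_delay_us(STATIC_K, rate),
                 queue_delay_us(ideal_k, rate)))

for divisor, static_delay, ideal_delay in rows:
    print(f"C/{divisor}: static {static_delay:.0f} us, scaled {ideal_delay:.0f} us")
```

With the static threshold the delay grows in proportion to how little capacity the queue receives, while the scaled threshold keeps it at RTT × λ.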

Slide21

To Implement Ideal ECN/RED Solution

A general way to estimate the queue capacity:

Queue capacity = queue departure rate while the queue stays non-empty

Leverage the solution from PIE (HPSR’13):

Start measurement when # of bytes in the switch buffer > dq_thresh

Get the rate to drain dq_thresh bytes
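
The two measurement steps above can be sketched as a small per-queue estimator. This is a simplified illustration in the spirit of PIE's departure-rate estimation, not PIE's actual code: the class name, the `on_dequeue` signature, and the use of integer nanosecond timestamps are all assumptions made here.

```python
class DepartureRateEstimator:
    """Simplified PIE-style departure-rate sampling for one queue."""

    def __init__(self, dq_thresh_bytes):
        self.dq_thresh = dq_thresh_bytes
        self.measuring = False
        self.start_ns = 0
        self.drained = 0
        self.rate_bps = None  # latest sample in bits/s; None until the first sample

    def on_dequeue(self, now_ns, pkt_bytes, backlog_bytes):
        """Call at each departure; backlog_bytes is the queue length before it."""
        if not self.measuring:
            # Start a cycle only when more than dq_thresh bytes are queued,
            # so the queue is likely to stay busy during the measurement.
            if backlog_bytes > self.dq_thresh:
                self.measuring = True
                self.start_ns = now_ns
                self.drained = 0
            return
        self.drained += pkt_bytes
        if self.drained >= self.dq_thresh:
            elapsed_ns = now_ns - self.start_ns
            if elapsed_ns > 0:
                self.rate_bps = self.drained * 8 * 1_000_000_000 / elapsed_ns
            self.measuring = False

# Usage: 1500-byte packets leaving every 1200 ns, i.e. a queue served at 10 Gbps.
est = DepartureRateEstimator(dq_thresh_bytes=3 * 1500)
backlog = 30 * 1500
for k in range(5):
    est.on_dequeue(now_ns=k * 1200, pkt_bytes=1500, backlog_bytes=backlog)
    backlog -= 1500
print(est.rate_bps)
```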

Slide22

Trade-off of Measurement Window

(Figure: sequence of packets on a link of capacity C, showing transmitted packets from queue 1 and from queue 2.)

Queue 1 and 2 keep non-empty during the transmission

Slide23

Trade-off of Measurement Window

(Figure: sequence of packets on a link of capacity C, showing transmitted packets from queue 1 and from queue 2.)

Queue capacity 1 = Queue capacity 2 = 0.5C

Slide24

Trade-off of Measurement Window

A too small measurement window, e.g., dq_thresh = 3 MTU

(Figure: sequence of packets on a link of capacity C.)

Sample rate of queue 1: C, 3/7 C, 3/7 C, C

Slide25

Trade-off of Measurement Window

A too small measurement window: degrades measurement accuracy

(Figure: sequence of packets on a link of capacity C.)

Sample rate of queue 1: C, 3/7 C, 3/7 C, C

Slide26

Trade-off of Measurement Window

A too small measurement window: degrades measurement accuracy

A too large measurement window, e.g., dq_thresh = 20 MTU

(Figure: sequence of packets on a link of capacity C.)

Slide27

Trade-off of Measurement Window

A too small measurement window: degrades measurement accuracy

A too large measurement window: cannot efficiently capture the dynamic changes

(Figure: sequence of packets on a link of capacity C.)

Slide28

Trade-off of Measurement Window

A too small measurement window: degrades measurement accuracy

A too large measurement window: cannot efficiently capture the dynamic changes
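
The effect of the window size can be reproduced with a toy slot-by-slot model. The transmission schedule below (the link alternating 4-packet bursts between two always-backlogged queues) is an assumption chosen for illustration, since the packet sequence from the figure is not preserved in this transcript. The function name and the one-MTU-per-slot model are also assumptions.

```python
from fractions import Fraction

def sample_rates(sequence, queue, window_pkts):
    """Departure-rate samples for `queue`, in units of the link capacity C.

    sequence[t] is the queue served in slot t (one MTU-sized packet per slot).
    A sample is produced each time `window_pkts` packets of `queue` have left;
    the next measurement starts in the following slot.
    """
    samples = []
    start = 0
    count = 0
    for slot, served in enumerate(sequence):
        if served != queue:
            continue
        count += 1
        if count == window_pkts:
            samples.append(Fraction(window_pkts, slot + 1 - start))
            start = slot + 1
            count = 0
    return samples

# Hypothetical schedule: 4-packet bursts alternating between queue 1 and queue 2.
burst = [1] * 4 + [2] * 4
small = sample_rates(burst * 2 + [1] * 4, queue=1, window_pkts=3)
large = sample_rates(burst * 10, queue=1, window_pkts=20)
print([str(r) for r in small])   # window of 3 MTU
print([str(r) for r in large])   # window of 20 MTU
```

With the 3 MTU window the samples swing between C and 3/7 C even though queue 1's long-run share is 0.5C. The 20 MTU window stays close to 0.5C, but each sample takes 36 to 40 slots to produce, so a change in the queue's share would be noticed late.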

Rate measurement is non-trivial

Slide29

Another View

Ideal ECN/RED solution:

Packets should get marked if q_i > C_i × RTT × λ

C_i: varying capacity

q_i: queue length

Slide30

Another View

Ideal ECN/RED solution:

Packets should get marked if q_i / C_i > RTT × λ

C_i: varying capacity

q_i / C_i: sojourn time
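
The step from the queue-length view to the sojourn-time view is a single division. The sketch below assumes that queue i drains at a roughly constant rate C_i while a packet waits, and uses q_i, C_i, RTT, and λ as assumed notation for queue length, queue capacity, base round-trip time, and the congestion-control-dependent factor.

```latex
q_i > C_i \times RTT \times \lambda
\quad\Longleftrightarrow\quad
\frac{q_i}{C_i} > RTT \times \lambda
```

The left-hand side of the second inequality is the time a packet spends in queue i, and the right-hand side no longer depends on C_i, so the varying capacity does not need to be measured.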

 Slide31

TCN

TCN = Time-based Congestion Notification

TCN mechanism:

Packets should get marked if their sojourn times > RTT × λ

Slide32

TCN in Detail

Sojourn time measurement:

Enqueue: attach metadata to each packet to store the enqueue time T_eq

Slide33

TCN in Detail

Sojourn time measurement:

Enqueue: attach metadata to each packet to store the enqueue time T_eq

Dequeue: calculate sojourn time, sojourn time = now - T_eq

Slide34

TCN in Detail

Sojourn time measurement:

Enqueue: attach metadata to each packet to store the enqueue time T_eq

Dequeue: calculate sojourn time, sojourn time = now - T_eq

2B-long metadata is enough for DC

Slide35

TCN in Detail

Sojourn time measurement:

Enqueue: attach metadata to each packet to store the enqueue time

Dequeue: calculate sojourn time

Instantaneous ECN marking:

Compare the per-packet instantaneous sojourn time with a static threshold RTT × λ

Stateless Data Plane Algorithm

Slide36

TCN in Detail

Sojourn time measurement:

Enqueue: attach metadata to each packet to store the enqueue time

Dequeue: calculate sojourn time

Instantaneous ECN marking:

Compare the per-packet instantaneous sojourn time with a static threshold RTT × λ

Marking does not cause any bubble on the link
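
The enqueue and dequeue steps above can be sketched in a few lines. This is an illustrative model, not the authors' qdisc code: the class and method names are invented, the clock unit ("ticks") is left abstract, and the wrap-around arithmetic is one assumed way to work with a 2-byte timestamp.

```python
from collections import deque

TS_BITS = 16                      # 2-byte enqueue timestamp, as on the slide
TS_MASK = (1 << TS_BITS) - 1

class TcnQueue:
    """FIFO that marks packets whose sojourn time exceeds a static threshold."""

    def __init__(self, threshold_ticks):
        self.threshold = threshold_ticks
        self.packets = deque()

    def enqueue(self, payload, now_ticks):
        # Store only the low 16 bits of the clock as per-packet metadata.
        self.packets.append((payload, now_ticks & TS_MASK))

    def dequeue(self, now_ticks):
        payload, t_eq = self.packets.popleft()
        # Modular subtraction stays correct across one clock wrap,
        # as long as the real sojourn time is below 2**16 ticks.
        sojourn = ((now_ticks & TS_MASK) - t_eq) & TS_MASK
        return payload, sojourn > self.threshold

q = TcnQueue(threshold_ticks=100)
q.enqueue("a", now_ticks=1000)
first = q.dequeue(now_ticks=1050)        # sojourn 50 ticks
q.enqueue("b", now_ticks=65530)
second = q.dequeue(now_ticks=65651)      # sojourn 121 ticks, clock wrapped
print(first, second)
```

The marking decision uses only the packet's own timestamp and the current time, so no per-queue state is kept between packets.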

Slide37

TCN vs CoDel

Advantages of TCN:

Stateless: cheaper to implement in hardware

Instantaneous: faster reaction to bursty traffic

Slide38

TCN vs CoDel

Advantages of TCN:

Stateless: cheaper to implement in hardware

Instantaneous: faster reaction to bursty traffic

Simplicity of TCN: unique characteristics of data centers

Slide39

TCN vs CoDel

Advantages of TCN:

Stateless: cheaper to implement in hardware

Instantaneous: faster reaction to bursty traffic

Simplicity of TCN:

Small number of concurrent large flows

Relatively stable RTTs

Prior knowledge of transport at the end host

Slide40

Testbed Evaluation

TCN software prototype:

Linux qdisc kernel module on a multi-NIC server

Testbed setup:

9 servers are connected to a software switch

End-hosts use DCTCP as the transport protocol

ECN schemes compared:

Per-queue RED with the standard threshold

CoDel

Slide41

Static Flow Experiment

(Figure: three queues Q1, Q2, Q3 under SP/WFQ; one high prio queue and two low priority queues with w=1 each; traffic: 1 flow (500Mbps), 4 flows, 1 flow.)

Slide42

Static Flow Experiment

TCN preserves the scheduling policy

Slide43

Dynamic Flow Experiment

8 senders to 1 receiver (web search workload)

SP/WFQ scheduling policy at the switch

Traffic:

(0, 100KB] flows of all services: high prio

(100KB, ∞) flows of service 1: w=1 (low)

(100KB, ∞) flows of service 2: w=1 (low)

(100KB, ∞) flows of service 3: w=1 (low)

(100KB, ∞) flows of service 4: w=1 (low)

Slide44

99th percentile FCT of Small Flows (<100KB)

TCN maintains the low buffer occupancy

Slide45

Realistic Traffic: Large Flows (>10MB)

TCN achieves high throughput

Slide46

Conclusion

TCN: a simple ECN solution for data centers

Use sojourn time as the congestion signal (CoDel)

Perform instantaneous ECN marking (DCTCP)

Code: http://sing.cse.ust.hk/projects/TCN

Next step: TCN in programmable hardware

Slide47

Thanks!

Slide48

Average FCT of Small Flows (<100KB)