Channel Reservation Protocol for Over-Subscribed Channels and Destinations

Uploaded On 2016-06-11

George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally. This work was completed at Stanford University. HPC and datacenter networks are increasingly oversubscribed. Exascale for HPC may need 1 billion-way parallelism.

Presentation Transcript

Slide1

Channel Reservation Protocol for Over-Subscribed Channels and Destinations

George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally

This work was completed at Stanford University

Slide2

Introduction

HPC and datacenter networks increasingly oversubscribed

Exascale for HPC may need 1 billion-way parallelism

Datacenter server count annual growth 7-17%

Levels of expensive bandwidth: between servers (intra-rack), between racks (intra-cluster), between clusters (intra-datacenter), between buildings (metro), between regions (long-haul)

Facebook’s datacenter network architecture. OSI 2013

Why optical data communications and why now? Applied Physics. 2009

Slide3

Introduction

To make it worse, many traffic patterns create unbalanced load

Unbalanced load creates long paths of blocked packets (known as tree saturation)

I’ll present a channel reservation protocol which prevents network and endpoint congestion

We focus on lossless flow control

Tree saturation is a major drawback

Slide4

Agenda

Motivation and related work

Channel reservation protocol

Evaluation

Slide5

Oversubscription and Hotspots

[Figure labels: H; Cluster 1; Cluster 2; Oversubscribed channels; Oversubscribed]

Tree saturation root. Affects benign traffic

This setting represents over-subscribed links between network clusters, or even between racks

Slide6

Impact on Benign Traffic

Adversarial pattern tops at 5% flit injection

Benign pattern slightly higher (6-7%)

Ideal flow control would avoid any interference

Benign traffic is negatively affected

Slide7

Explicit Congestion Notification

[Figure label: Oversubscribed channels]

ECN detects congestion at the root of the congestion tree

Signals to the sources to throttle down

ECN: State of the art congestion handling scheme

Slide8

Agenda

Motivation and related work

Channel reservation protocol

Evaluation

Slide9

Channel Reservation Protocol

Potentially long packet sent speculatively

Encounters congestion. Converted to a single-flit reservation request

Reply (ACK) creates reservations for the chosen time slot in all oversubscribed resources
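
The worked example on this slide (a resource free in cycles 5 and 10, a destination free in cycles 10 and 15, result cycle 10) amounts to intersecting availabilities. A minimal sketch, assuming availabilities are plain collections of cycle numbers and that the earliest common cycle is the one chosen; the function name `earliest_common_cycle` is hypothetical and not taken from the slides:

```python
def earliest_common_cycle(*availabilities):
    """Return the earliest cycle free in every resource, or None if there is none."""
    common = set.intersection(*(set(a) for a in availabilities))
    return min(common) if common else None


channel_free = [5, 10]        # "Resource available cycles 5 and 10"
destination_free = [10, 15]   # "Destination available cycles 10 and 15"

granted = earliest_common_cycle(channel_free, destination_free)
print(granted)  # 10
```

When no common cycle exists the function returns `None`, which corresponds to the case where the request has to be retried later.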

[Figure labels: H; Cluster 1; Cluster 2; Oversubscribed; Oversubscribed]

Resource available cycles 5 and 10

Destination available cycles 10 and 15. Result: cycle 10

Destination reserves cycle 10

Channel is reserved for cycle 10

Source is informed to transmit in cycle 10

Slide10

CRP: Doodle for Packets

Challenge: Participant’s availabilities are distributed across the network

Slide11

Reservation Tables

Reservation table is one line in the Doodle

Doodle asks for the length of time slots

We call a time slot a cell

Cells have Cmax cycles

We keep a counter per cell because packet sizes differ
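
A reservation table as described here can be pictured as one counter per cell. The sketch below is illustrative only: the class name `ReservationTable`, the method `take`, and the value 512 for Cmax are assumptions (512 is simply the largest counter value that appears on the slides), and the counter is taken to hold the free cycles left in each cell, which is how the later slides appear to use it.

```python
C_MAX = 512  # assumed cell length in cycles; the slides only call it Cmax


class ReservationTable:
    """One counter of free cycles per cell (time slot)."""

    def __init__(self, num_cells, c_max=C_MAX):
        self.c_max = c_max
        self.free = [c_max] * num_cells  # every cell starts completely free

    def take(self, cell, cycles):
        """Consume `cycles` from one cell, refusing to go below zero."""
        if cycles > self.free[cell]:
            raise ValueError("cell does not have that many free cycles")
        self.free[cell] -= cycles


table = ReservationTable(num_cells=6)
table.take(1, 502)  # packets of different sizes leave different remainders
table.take(2, 412)
print(table.free)   # [512, 10, 100, 512, 512, 512]
```

A per-cell counter, rather than a single busy/free bit, is what lets several packets of different sizes share one cell.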

Cell labels   A     B     C     D     E     …     Vcells
Cell values   512   10    100   0     10    …     50

Slide12

Reservation Vectors

Request packets carry a vector to record what time slots are available in the resources traversed so far

This is used to build up to the final result of the Doodle
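
Building up the result hop by hop can be sketched as a cell-wise AND between the vector carried by the request and each resource's own availability. The availabilities below are made-up illustration data, and `merge` is a hypothetical helper name, not something named on the slides:

```python
def merge(vector, local_availability):
    """AND the carried vector with one resource's availability, cell by cell."""
    if len(vector) != len(local_availability):
        raise ValueError("vector and availability must cover the same cells")
    return [a and b for a, b in zip(vector, local_availability)]


# Hypothetical availabilities of three resources along a path, five cells each.
resources = [
    [True, True, False, True, True],
    [True, False, False, True, True],
    [True, True, True, True, False],
]

request_vector = [True] * 5  # at the source every cell is still a candidate
for availability in resources:
    request_vector = merge(request_vector, availability)

print(request_vector)  # [True, False, False, True, False]
```

A cell stays true only if every resource traversed so far can accept the request in that cell, so the vector that reaches the destination already summarizes the whole path.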

Cell labels   A     B     C     D     E     …     Vcells
Cell values   T     T     F     F     T     …     F

Slide13

Request Traversing a Channel

Request size: 80 cycles

Cell labels   A     B     C     D     E     …     Vcells
Cell values   512   10    100   0     10    …     50

Cell labels   A     B     C     D     E     Vcells
Cell values   T     T     T     T     T     T
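
The slides do not spell out how a channel decides which cells to mark as available. One rule that happens to reproduce the numbers on this slide is sketched here, under loudly stated assumptions: a request may start in a cell only if that cell has some free cycles and the cell plus its immediate successor together hold the whole request, and the elided cells (the "…" column) are simply left out. The names `feasible_starts` and `traverse` are hypothetical.

```python
def feasible_starts(free_cycles, size):
    """For each cell, decide whether a request of `size` cycles could start there.

    Assumed rule (not stated on the slides): the start cell must have free
    cycles, and the start cell plus the next cell must hold `size` cycles.
    """
    result = []
    for i, free in enumerate(free_cycles):
        following = free_cycles[i + 1] if i + 1 < len(free_cycles) else 0
        result.append(free > 0 and free + following >= size)
    return result


def traverse(vector, free_cycles, size):
    """Update the request's vector as it crosses one resource."""
    local = feasible_starts(free_cycles, size)
    return [carried and ok for carried, ok in zip(vector, local)]


channel = [512, 10, 100, 0, 10, 50]  # cells A, B, C, D, E and the last cell
incoming = [True] * 6                # all-true vector carried by the request
outgoing = traverse(incoming, channel, size=80)
print(outgoing)  # [True, True, True, False, False, False]
```

This matches the outgoing vector shown on this slide (T T T F F … F). Under the same assumptions, passing that result through the destination table of the following slide (30, 40, 100, 512, 100, 90) yields F T T F F F, which also agrees with the slides.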

Cell labels   A     B     C     D     E     …     Vcells
Cell values   T     T     T     F     F     …     F

Slide14

Request Arriving at Destination

Cell labels   A     B     C     D     E     …     Vcells
Cell values   30    40    100   512   100   …     90

Cell labels   A     B     C     D     E     Vcells
Cell values   T     T     T     F     F     F

Cell labels   A     B     C     D     E     …     Vcells
Cell values   F     T     T     F     F     …     F

Slide15

CRP: Doodle for Packets

We have identified the common availability. Now we need to inform everybody

Slide16

Destination Reserving Bandwidth

Original destination table:

Cell labels   A     B     C     D     E     …     Vcells
Cell values   30    40    100   512   100   …     90

Resulting destination table:

Cell labels   A     B     C     D     E     Vcells
Cell values   30    0     60    512   100   90
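
The change from the original to the resulting table can be reproduced with a simple subtraction that starts at the granted cell and spills into the following cell once the first is exhausted. This is a sketch under assumptions: the spill-over behaviour is inferred from the numbers on the slides, the granted cell is taken to be B, the elided cells are omitted, and `reserve` is a hypothetical name.

```python
def reserve(free_cycles, start, size):
    """Return a copy of the table with `size` cycles taken from `start` onward."""
    table = list(free_cycles)
    remaining = size
    cell = start
    while remaining > 0:
        if cell >= len(table):
            raise ValueError("not enough free cycles from the granted cell onward")
        taken = min(table[cell], remaining)  # use what this cell still has
        table[cell] -= taken
        remaining -= taken
        cell += 1                            # spill the rest into the next cell
    return table


destination = [30, 40, 100, 512, 100, 90]  # cells A, B, C, D, E and the last cell
after = reserve(destination, start=1, size=80)  # start=1 is cell B
print(after)  # [30, 0, 60, 512, 100, 90]
```

Applying the same function to the channel table 512, 10, 100, 0, 10, 50 with the same start cell and size gives 512, 0, 30, 0, 10, 50, which is the resulting reservation table shown for the ACK on the next slide.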

Subtracts reservation size (80 cycles) from the appropriate cells (time slots)

Slide17

ACK Traversing the Channel

Reserves 80 cycles starting from the granted timestamp cell (time slot)

Original reservation table:

Cell labels   A     B     C     D     E     …     Vcells
Cell values   512   10    100   0     10    …     50

Resulting reservation table:

Cell labels   A     B     C     D     E     Vcells
Cell values   512   0     30    0     10    50

Slide18

Protocol Considerations

If participants cannot agree on a time, we wait and then try again

If the time slot is no longer available, the ACK is converted to a retry

If the network is uncongested, speculative packets succeed and reservation adds no overhead

Slide19

Agenda

Motivation and related work

Channel reservation protocol

Evaluation

Slide20

Methodology

Two clusters of 144-node fat trees

12x12 routers

Clusters connected with four channels

All channels are 10Gb/s

Messages 2KB, divided into eight packets

CRP applies to the message
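
A few back-of-the-envelope numbers follow from these parameters. The sketch below assumes that "2KB" means 2048 bytes and that node links also run at 10 Gb/s (the slide says all channels do); the 36:1 figure is only the worst case in which every node of one cluster sends to the other cluster at full rate. All constant names are illustrative.

```python
MESSAGE_BYTES = 2 * 1024          # "2KB", read here as 2048 bytes (assumption)
PACKETS_PER_MESSAGE = 8
CHANNEL_BITS_PER_SECOND = 10e9    # "All channels are 10Gb/s"
NODES_PER_CLUSTER = 144
INTER_CLUSTER_CHANNELS = 4

# Size of each packet when a message is split evenly.
packet_bytes = MESSAGE_BYTES // PACKETS_PER_MESSAGE

# Time to serialize one whole message onto one channel.
message_seconds = MESSAGE_BYTES * 8 / CHANNEL_BITS_PER_SECOND

# Worst-case demand on the inter-cluster channels relative to their capacity.
worst_case_ratio = NODES_PER_CLUSTER / INTER_CLUSTER_CHANNELS

print(packet_bytes)                     # 256
print(f"{message_seconds * 1e6:.2f}")   # 1.64 (microseconds)
print(worst_case_ratio)                 # 36.0
```

Because CRP applies to the message rather than to each packet, one reservation covers all eight packets.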

[Figure labels: Oversubscribed; Oversubscribed; H; 4]

Slide21

Uniform Random

Slide22

Uniform Random

By the time ECN reacts, the flow is done

ECN does not share congestion state with other destinations in the same cluster

[Figure labels: Oversubscribed; Oversubscribed; 4; A; B; S]

Slide23

Combined Traffic

ECN can be configured to prevent tree saturation in steady-state traffic

Slide24

Combined Traffic

3.5% lower for CRP

CRP has extra control overhead

Slide25

Transient Traffic

300,000 cycles to stabilize for ECN

ECN allows congestion to occur and reacts to it. CRP prevents it entirely

Slide26

Transient Traffic

300,000 cycles to stabilize for ECN

ECN’s maximum latency: 37,000 cycles

ECN allows congestion to occur and reacts to it. CRP prevents it entirely

Slide27

ECN Sensitivity: Three Clusters

ECN configuration is sensitive to network topology, routing, and traffic pattern

Slide28

ECN Sensitivity: Four Clusters

ECN needs to be reconfigured

Slide29

Conclusions

CRP is a statistical scheme to avoid overwhelming channels and destinations

CRP effectively prevents congestion

Avoids pitfalls of ECN and reactive techniques

CRP focuses on lossless flow control but similar benefits are possible in lossy flow control

Congestion causes many packet drops