Channel Reservation Protocol for Over-Subscribed Channels and Destinations
George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally
This work was completed at Stanford University.
Introduction
HPC and datacenter networks are increasingly oversubscribed:
Exascale HPC may need 1 billion-way parallelism.
Datacenter server count grows 7-17% annually.
Bandwidth is expensive at several levels:
between servers (intra-rack),
between racks (intra-cluster),
between clusters (intra-datacenter),
between buildings (metro),
and between regions (long-haul).
Sources: "Facebook’s datacenter network architecture," OSI 2013; "Why optical data communications and why now?", Applied Physics, 2009.
Introduction
To make matters worse, many traffic patterns create unbalanced load.
Unbalanced load creates long paths of blocked packets, known as tree saturation.
We focus on lossless flow control, where tree saturation is a major drawback.
I will present a channel reservation protocol (CRP) that prevents both network and endpoint congestion.
Agenda
Motivation and related work
Channel reservation protocol
Evaluation
Oversubscription and Hotspots
[Figure: Cluster 1 and Cluster 2 joined by oversubscribed channels, with an oversubscribed node H.]
The tree saturation root affects benign traffic.
This setting represents oversubscribed links between network clusters, or even between racks.
Impact on Benign Traffic
The adversarial pattern tops out at a 5% flit injection rate.
The benign pattern tops out slightly higher (6-7%).
Benign traffic is negatively affected; ideal flow control would avoid any interference.
Explicit Congestion Notification
ECN: the state-of-the-art congestion handling scheme.
[Figure: oversubscribed channels between clusters.]
ECN detects congestion at the root of the congestion tree.
It signals the sources to throttle down.
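The slide only says that ECN signals sources to throttle down. A minimal sketch of one possible source-side reaction, assuming a multiplicative-decrease, additive-increase rule, illustrates why ECN is reactive: the rate changes only after notifications arrive from the congestion root. The class name, parameters, and update rules are hypothetical, not the ECN configuration used in the evaluation.

```python
from dataclasses import dataclass


@dataclass
class EcnThrottle:
    """Hypothetical source-side reaction to ECN (all parameters assumed)."""
    rate: float = 1.0        # fraction of link bandwidth this source may inject
    decrease: float = 0.5    # multiplicative cut per congestion notification
    increase: float = 0.05   # additive recovery per notification-free interval
    min_rate: float = 0.01   # never throttle all the way to zero

    def on_notification(self) -> None:
        # The congestion-tree root marked our traffic: throttle down.
        self.rate = max(self.min_rate, self.rate * self.decrease)

    def on_quiet_interval(self) -> None:
        # No notifications lately: ramp back up toward full rate.
        self.rate = min(1.0, self.rate + self.increase)


src = EcnThrottle()
for _ in range(3):          # three notifications arrive from the root
    src.on_notification()
print(src.rate)             # 0.125
```

Because the source learns of congestion only after notifications travel back from the root, a short flow can finish before its rate ever changes.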
Channel Reservation Protocol
A potentially long packet is sent speculatively.
If it encounters congestion, it is converted to a single-flit reservation request.
The reply (ACK) creates reservations for the chosen time slot in all oversubscribed resources.
[Figure: Cluster 1 and Cluster 2 joined by oversubscribed channels, with an oversubscribed node H.]
The oversubscribed channel (resource) is available in cycles 5 and 10.
The destination is available in cycles 10 and 15. Result: cycle 10.
The destination reserves cycle 10.
The channel is reserved for cycle 10.
The source is informed to transmit in cycle 10.
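CRP: Doodle for Packets
Like a Doodle poll, CRP must find a time slot that every participant (each oversubscribed channel and the destination) can make.
Challenge: the participants’ availabilities are distributed across the network.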
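Reservation Tables
A reservation table is one line in the Doodle.
As in a Doodle poll, the length of a time slot must be chosen. We call a time slot a cell; each cell spans Cmax cycles.
We keep a counter per cell, rather than a single busy bit, because packet sizes differ.
Example reservation table (each value is the number of free cycles left in that cell):
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 512  10   100  0    10   …    50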
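A reservation table can be sketched as a list of per-cell counters. Here C_MAX = 512 is a hypothetical value, chosen only because the fully free cell A in the example holds 512 cycles; the class, its methods, and the non-wrapping cycle-to-cell mapping are assumptions for illustration.

```python
C_MAX = 512  # hypothetical cell length in cycles; the slides call it Cmax


class ReservationTable:
    """One line of the Doodle: free cycles left in each cell (time slot)."""

    def __init__(self, num_cells: int, c_max: int = C_MAX) -> None:
        self.c_max = c_max
        self.free = [c_max] * num_cells  # every cell starts completely free

    def cell_of(self, cycle: int) -> int:
        # Assumes cell 0 starts at cycle 0 and the table does not wrap around.
        return cycle // self.c_max

    def book(self, cell: int, cycles: int) -> bool:
        # A counter, not a busy bit, lets packets of different sizes share a cell.
        if self.free[cell] < cycles:
            return False
        self.free[cell] -= cycles
        return True


table = ReservationTable(num_cells=4)
table.book(0, 80)    # an 80-cycle packet
table.book(0, 300)   # a 300-cycle packet in the same cell
print(table.free)    # [132, 512, 512, 512]
```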
Reservation Vectors
Request packets carry a vector that records which time slots are available in every resource traversed so far.
This vector builds up to the final result of the Doodle.
Example reservation vector (T = available, F = not available):
Cell labels:  A  B  C  D  E  …  V
Cell values:  T  T  F  F  T  …  F
Request Traversing a Channel
Request size: 80 cycles.
Channel reservation table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 512  10   100  0    10   …    50
Incoming reservation vector:
Cell labels:  A  B  C  D  E  …  V
Cell values:  T  T  T  T  T  …  T
Outgoing reservation vector:
Cell labels:  A  B  C  D  E  …  V
Cell values:  T  T  T  F  F  …  F
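The slide does not spell out when a cell stays T. The sketch below assumes a reservation may start in a cell and continue into the next cell only; under that assumption it reproduces the outgoing vector, including B staying T with only 10 free cycles (the ACK step later takes 10 cycles from B and 70 from C). The function names are hypothetical, and the cells elided by "…" are left out.

```python
REQUEST_SIZE = 80  # cycles, from the slide


def fits(free: list[int], i: int, size: int) -> bool:
    # Assumed rule: the reservation starts in cell i and may spill into cell i+1 only.
    spill = free[i + 1] if i + 1 < len(free) else 0
    return free[i] > 0 and free[i] + spill >= size


def traverse(vector: list[bool], free: list[int], size: int) -> list[bool]:
    # A cell stays T only if it was T so far and this resource can also fit the request.
    return [ok and fits(free, i, size) for i, ok in enumerate(vector)]


# Cells A, B, C, D, E, V; the cells elided by "…" on the slide are left out.
channel_free = [512, 10, 100, 0, 10, 50]
incoming = [True] * len(channel_free)
print(traverse(incoming, channel_free, REQUEST_SIZE))
# [True, True, True, False, False, False]
```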
Request Arriving at Destination
Destination reservation table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 30   40   100  512  100  …    90
Arriving reservation vector:
Cell labels:  A  B  C  D  E  …  V
Cell values:  T  T  T  F  F  …  F
Resulting reservation vector:
Cell labels:  A  B  C  D  E  …  V
Cell values:  F  T  T  F  F  …  F
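At the destination the same check runs once more, and the destination then picks a cell from what survives. A sketch under the same assumed one-cell spill rule, further assuming that the destination grants the earliest cell still marked T (consistent with the reservation starting at B on the next slides); all names are hypothetical.

```python
def fits(free: list[int], i: int, size: int) -> bool:
    # Same assumed rule: start in cell i, spill into cell i+1 at most.
    spill = free[i + 1] if i + 1 < len(free) else 0
    return free[i] > 0 and free[i] + spill >= size


labels = ["A", "B", "C", "D", "E", "V"]        # elided cells omitted
dest_free = [30, 40, 100, 512, 100, 90]        # destination table
arriving = [True, True, True, False, False, False]

final = [ok and fits(dest_free, i, 80) for i, ok in enumerate(arriving)]
# Assumption: the destination grants the earliest cell still marked T.
granted = next((labels[i] for i, ok in enumerate(final) if ok), None)
print(final)    # [False, True, True, False, False, False]
print(granted)  # B
```

Cell A fails because 30 + 40 = 70 free cycles cannot hold 80. If no cell survived, `granted` would be `None`: the participants could not agree on a time.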
CRP: Doodle for Packets
We have identified the common availability. Now we need to inform everybody.
Destination Reserving Bandwidth
The destination subtracts the reservation size (80 cycles) from the appropriate cells (time slots).
Original destination table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 30   40   100  512  100  …    90
Resulting destination table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 30   0    60   512  100  …    90
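Starting at the granted cell B, the 80 cycles use all 40 free cycles of B and then 40 of C's 100, which is exactly the change between the two tables. A minimal sketch of that subtraction (the function name is hypothetical; elided cells are omitted):

```python
def reserve(free: list[int], start: int, size: int) -> list[int]:
    """Return a copy of `free` with `size` cycles taken from cell `start` onward."""
    if sum(free[start:]) < size:
        raise ValueError("not enough free cycles from the granted cell onward")
    table = list(free)
    i = start
    while size > 0:
        taken = min(table[i], size)  # use what this cell has left...
        table[i] -= taken
        size -= taken
        i += 1                       # ...and spill the rest into the next cell
    return table


dest_free = [30, 40, 100, 512, 100, 90]  # A, B, C, D, E, V (elided cells omitted)
print(reserve(dest_free, start=1, size=80))  # granted cell B
# [30, 0, 60, 512, 100, 90]
```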
ACK Traversing the Channel
The ACK reserves 80 cycles starting from the cell (time slot) of the granted timestamp.
Original reservation table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 512  10   100  0    10   …    50
Resulting reservation table:
Cell labels:               A    B    C    D    E    …    V
Cell values (free cycles): 512  0    30   0    10   …    50
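The channel update mirrors the destination's, with one addition stated under the protocol considerations: if the time slot is no longer available, the ACK is converted to a retry. A sketch under the same assumed one-cell spill rule (the function name and the string results are hypothetical):

```python
def apply_ack(free: list[int], start: int, size: int) -> tuple[list[int], str]:
    """Reserve `size` cycles at one channel, or turn the ACK into a retry."""
    spill = free[start + 1] if start + 1 < len(free) else 0
    if free[start] + spill < size:
        # The slot filled up after the request passed: the ACK becomes a retry.
        return list(free), "RETRY"
    table = list(free)
    taken = min(table[start], size)
    table[start] -= taken
    if size > taken:  # the remainder spills into the following cell
        table[start + 1] -= size - taken
    return table, "ACK"


channel_free = [512, 10, 100, 0, 10, 50]  # A, B, C, D, E, V
print(apply_ack(channel_free, start=1, size=80))
# ([512, 0, 30, 0, 10, 50], 'ACK')
```

A second ACK for cell B on the resulting table would find only 0 + 30 free cycles and come back as a retry.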
Protocol Considerations
If the participants cannot agree on a time, the source waits and then tries again.
If the time slot is no longer available when the ACK tries to reserve it, the ACK is converted to a retry.
If the network is uncongested, speculative packets succeed and there is no reservation overhead.
Methodology
Two clusters, each a 144-node fat tree built from 12x12 routers.
The clusters are connected with four channels.
All channels are 10 Gb/s.
Messages are 2 KB, divided into eight packets; CRP applies to the whole message.
[Figure: two clusters joined by four oversubscribed channels, with an oversubscribed node H.]
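A quick check of what these parameters imply, assuming 1 KB = 1024 B and that every node's injection channel is also 10 Gb/s ("all channels"): each packet is 256 B, and if all 144 nodes of one cluster sent to the other cluster at full rate, the four inter-cluster channels would be oversubscribed 36:1.

```latex
\text{packet size} = \frac{2\,\text{KB}}{8} = \frac{2048\,\text{B}}{8} = 256\,\text{B}
\qquad
\frac{\text{cluster injection bandwidth}}{\text{inter-cluster bandwidth}}
  = \frac{144 \times 10\,\text{Gb/s}}{4 \times 10\,\text{Gb/s}}
  = \frac{1440}{40} = 36
```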
Uniform Random
By the time ECN reacts, the flow is done.
ECN does not share congestion state with other destinations in the same cluster.
[Figure: two clusters joined by four oversubscribed channels; nodes labeled A, B, and S.]
Combined Traffic
ECN can be configured to prevent tree saturation in steady-state traffic.
3.5% lower for CRP, since CRP has extra control overhead.
Transient Traffic
ECN takes 300,000 cycles to stabilize.
ECN’s maximum latency: 37,000 cycles.
ECN allows congestion to occur and then reacts to it; CRP prevents it entirely.
ECN Sensitivity: Three Clusters
ECN configuration is sensitive to network topology, routing, and traffic pattern.
ECN Sensitivity: Four Clusters
ECN needs to be reconfigured.
Conclusions
CRP is a statistical scheme to avoid overwhelming channels and destinations.
CRP effectively prevents congestion.
It avoids the pitfalls of ECN and other reactive techniques.
CRP focuses on lossless flow control, but similar benefits are possible in lossy flow control, where congestion causes many packet drops.