Destination-Based Adaptive Routing for 2D Mesh - PowerPoint Presentation

giovanna-bartolotta
Uploaded On 2019-03-16



Presentation Transcript

Slide1

Destination-Based Adaptive Routing for 2D Mesh Networks

ANCS 2010

Rohit Sunkam Ramanujam

Bill Lin

Electrical and Computer Engineering

University of California, San Diego

Slide2

Networks-on-Chip

Chip-multiprocessors (CMPs) increasingly popular

2D-mesh networks often used as on-chip fabric

Routing algorithm central in determining performance

[Figures: Tilera Tile64; Intel 48-core data center on die (ISSCC 2010)]

Slide3

Classes of Routing Algorithms

Oblivious routing

Simple and fast router designs

Poor load balancing under bursty traffic

Adaptive routing

Better performance (throughput, latency)

Better fault tolerance

Higher router complexity

Slide4

Related Work

Oblivious routing [Valiant, ROMM, O1TURN, Optimal oblivious routing]

Optimize for worst- and average-case performance

Adaptive routing commercially used in multiprocessors from IBM, Cray, Compaq

On-chip routing very different from off-chip:

Lower power

Lower area

Lower router complexity

Slide5

Outline

Introduction

Motivation

Destination-Based Adaptive Routing (DAR)

Evaluation

Slide6

Minimal Adaptive Routing Model

Adaptive routing along minimal directions

[Figure: 2D mesh with source S and destination D]

Slide7

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Slide8

Local Congestion

Local adaptive

Measure local congestion metric (free VCs, free buffers)

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; paths compared: Optimal vs. Local adaptive]

Slide9

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Dimension-based congestion

Slide10

Dimension-based Congestion

RCA-1D (Gratz et al., HPCA '08)

Exponential moving average of congestion to all nodes along a dimension

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; paths compared: Optimal vs. RCA-1D]

Slide11
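As a rough illustration of the exponential-moving-average idea behind RCA-1D: if each node blends its own local congestion with the aggregate received from the next node along the dimension, farther nodes contribute with geometrically decaying weight. The sketch below assumes a 50/50 blend and made-up congestion values; neither is taken from the slides, and the function name is illustrative.

```python
def aggregate_along_dimension(local_congestion, weight=0.5):
    """Fold per-node congestion values (nearest node first) into one estimate.

    Each node blends its local value with the aggregate from farther
    nodes, so distant nodes count with geometrically decaying weight.
    `weight` is a hypothetical blending factor, not a value from the talk.
    """
    aggregate = 0.0
    # Walk from the farthest node back toward the nearest one.
    for value in reversed(local_congestion):
        aggregate = weight * value + (1.0 - weight) * aggregate
    return aggregate


if __name__ == "__main__":
    # The same hot spot matters less the farther away it is.
    print(aggregate_along_dimension([4.0, 0.0, 0.0]))
    print(aggregate_along_dimension([0.0, 0.0, 4.0]))
```

With these inputs, a hot spot at the nearest node dominates the estimate, while the same hot spot three hops away is heavily discounted.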

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Dimension-based congestion

Quadrant-based congestion

Slide12

Quadrant-based Congestion

RCA-Quadrant (Gratz et al., HPCA '08)

Exponential moving average of congestion to all nodes in the destination quadrant

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; path shown: Optimal]

Slide13

Quadrant-based Congestion

RCA-Quadrant (Gratz et al., HPCA '08)

Exponential moving average of congestion to all nodes in the destination quadrant

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; path shown: Optimal]

Slide14

Quadrant-based Congestion

RCA-Quadrant (Gratz et al., HPCA '08)

Exponential moving average of congestion to all nodes in the destination quadrant

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; paths compared: Optimal vs. RCA-quad]

Slide15

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Dimension-based congestion

Quadrant-based congestion

Destination-based congestion

Slide16

Ideally …

On a per-destination basis:

Estimate end-to-end delay along all minimal paths to destination

Choose path with least delay

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; path shown: Optimal]

Slide17

Challenges

Limited bandwidth for congestion updates

Congestion notification not instantaneous

Limited storage in on-chip routers

Exponential number of paths to each destination

Limited hardware resources for computations

How can we practically emulate ideal adaptive routing?

Slide18
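To make the "exponential number of paths" challenge concrete: in a 2D mesh, a minimal path between two nodes that are dx hops apart in X and dy hops apart in Y is one ordering of dx X-hops and dy Y-hops, so there are C(dx+dy, dx) such paths. A minimal Python sketch (the function name is illustrative, not from the talk):

```python
from math import comb


def minimal_path_count(dx: int, dy: int) -> int:
    """Number of minimal (shortest) paths between two mesh nodes
    separated by dx hops in X and dy hops in Y."""
    return comb(dx + dy, dx)


if __name__ == "__main__":
    # Corner-to-corner offsets in 2x2, 4x4, 8x8, and 16x16 meshes.
    for k in (1, 3, 7, 15):
        print(f"offset ({k},{k}): {minimal_path_count(k, k)} minimal paths")
```

Tracking delay per path is therefore impractical for an on-chip router, which motivates keeping one delay estimate per candidate output port instead.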

Destination-based adaptive routing (DAR)

A node estimates delay to all other nodes through candidate outputs every T cycles

[Figure: source S and destination D; estimated delays at S: L[N][D] = 20, L[E][D] = 30]

Slide19

DAR - High Level

Traffic distribution to output ports controlled using per-destination split ratios W

Start with initial set of split ratios

Estimate delay to destination through candidate outputs

Shift traffic from more congested port to less congested port

[Figure: source S and destination D; W[N][D] = 0.6, W[E][D] = 0.4; L[N][D] = 20, L[E][D] = 30]

Slide20

DAR - High Level

Traffic distribution to output ports controlled using per-destination split ratios W

Start with initial set of split ratios

Estimate delay to destination through candidate outputs

Shift traffic from more congested port to less congested port

[Figure: source S and destination D; W[N][D] = 0.8, W[E][D] = 0.2; L[N][D] = 20, L[E][D] = 30]

Slide21

Outline

Introduction

Motivation

Destination-Based Adaptive Routing (DAR)

Distributed delay measurement

Split ratio adaptation

Scaling

Evaluation

Slide22

Distributed Delay Measurement

A node maintains:

Per-destination traffic split ratio through candidate output ports: W[p][j]

Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]

Slide23

Distributed Delay Measurement

Every node estimates average delay to all other nodes in the network

[Figure: 4x4 mesh with nodes numbered 0-15; node 10 sends Avg_10[10] to its four neighbors]

Delay from node 10 to itself: Avg_10[10] = l_10[Ej]

Avg_10[10] propagated to neighbors

Nodes 6, 9, 14, 11 add local delay to Avg_10[10] to compute delay to node 10

For example, at node 9:

L[E][10] = l[E] + Avg_10[10]

Avg_9[10] = L[E][10]

Slide24

Distributed Delay Measurement

Every node estimates delay to all other nodes in the network

[Figure: 4x4 mesh with nodes numbered 0-15; nodes 6, 9, 14, and 11 send Avg_6[10], Avg_9[10], Avg_14[10], and Avg_11[10] to their upstream neighbors]

Nodes 6, 9, 14, 11 propagate estimated delay to node 10 to upstream neighbors

For example, node 5 receives two delay updates, from nodes 9 and 6:

A[E][10] = Avg_6[10]

A[N][10] = Avg_9[10]

Node 5 adds local link delay to received delay update:

L[E][10] = A[E][10] + l[E]

L[N][10] = A[N][10] + l[N]

Finally, average delay from node 5 to node 10 is computed as:

Avg_5[10] = W[E][10] L[E][10] + W[N][10] L[N][10]

Slide25

Distributed Delay Measurement

Every node estimates delay to all other nodes in the network

[Figure: 4x4 mesh with nodes numbered 0-15]

Nodes 6, 9, 14, 11 propagate estimated delay to node 10 to upstream neighbors

For example, node 5 receives two delay updates, from nodes 9 and 6:

A[E][10] = Avg_6[10]

A[N][10] = Avg_9[10]

Node 5 adds local link delay to received delay update:

L[E][10] = A[E][10] + l[E]

L[N][10] = A[N][10] + l[N]

Finally, average delay from node 5 to node 10 is computed as:

Avg_5[10] = W[E][10] L[E][10] + W[N][10] L[N][10]

Slide26
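The node-5 delay computation can be worked through numerically. The sketch below follows the slide notation (A, l, L, W, Avg); the function names and all numeric values are made up for illustration.

```python
def delay_through_ports(received_avg, local_delay):
    """L[p][j] = A[p][j] + l[p] for each candidate output port p."""
    return {p: received_avg[p] + local_delay[p] for p in received_avg}


def average_delay(split_ratio, port_delay):
    """Avg[j] = sum over candidate ports p of W[p][j] * L[p][j]."""
    return sum(split_ratio[p] * port_delay[p] for p in port_delay)


# Node 5 routing to node 10: candidate ports are E (via node 6) and N (via node 9).
A = {"E": 12.0, "N": 18.0}  # hypothetical Avg_6[10] and Avg_9[10]
l = {"E": 3.0, "N": 2.0}    # hypothetical local link delays at node 5
W = {"E": 0.6, "N": 0.4}    # hypothetical split ratios at node 5

L = delay_through_ports(A, l)      # E: 12 + 3, N: 18 + 2
avg_5_to_10 = average_delay(W, L)  # 0.6 * 15 + 0.4 * 20
print(L, avg_5_to_10)
```

Node 5 would then send avg_5_to_10 to its own upstream neighbors, which repeat the same two steps.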

Outline

Introduction

Motivation

Destination-Based Adaptive Routing (DAR)

Distributed delay measurement

Split ratio adaptation

Scaling

Evaluation

Slide27

Adaptation of Split Ratio

Objective: Equalize delay on candidate output ports

If only one candidate output, split ratio is 1

If two candidate outputs:

Let p_h be the port with higher delay to destination j

Let p_l be the port with lower delay to destination j

W[p_h][j] + W[p_l][j] = 1

Δ traffic shifted from p_h to p_l every T cycles

Δ proportional to (L[p_h][j] - L[p_l][j]) / L[p_h][j]

Slide28
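A minimal sketch of one split-ratio adaptation step for the two-port case. The slides state only that Δ is proportional to the relative delay difference; the proportionality constant `gain` and the cap that keeps ratios within [0, 1] are assumptions added here, and the function name is illustrative.

```python
def adapt_split_ratios(W, L, gain=0.25):
    """One update (performed every T cycles) of the split ratios for a
    single destination with two candidate output ports.

    W: dict port -> split ratio (the two ratios sum to 1)
    L: dict port -> estimated delay to the destination through that port
    gain: hypothetical proportionality constant for the shift
    """
    p_a, p_b = W.keys()
    # p_h is the higher-delay port, p_l the lower-delay port.
    p_h, p_l = (p_a, p_b) if L[p_a] >= L[p_b] else (p_b, p_a)
    if L[p_h] <= 0:
        return dict(W)  # no measurable delay, nothing to shift
    delta = gain * (L[p_h] - L[p_l]) / L[p_h]
    delta = min(delta, W[p_h])  # cannot shift more traffic than p_h carries
    return {p_h: W[p_h] - delta, p_l: W[p_l] + delta}


if __name__ == "__main__":
    W = {"N": 0.6, "E": 0.4}
    L = {"N": 20.0, "E": 30.0}
    print(adapt_split_ratios(W, L, gain=0.6))
```

With `gain=0.6` this moves the ratios from 0.6/0.4 to roughly 0.8/0.2 for delays of 20 and 30, matching the DAR high-level example; that gain is chosen only to match the example and is not a value given in the talk.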

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Dimension-based congestion

Quadrant-based congestion

Destination-based congestion: does not scale!

Slide29

Granularity of Congestion Estimation (coarse to fine)

Local congestion

Dimension-based congestion

Quadrant-based congestion

Destination-based congestion

Scalable destination-based congestion

Slide30

Outline

Introduction

Motivation

Destination-Based Adaptive Routing (DAR)

Distributed delay measurement

Split ratio adaptation

Scaling

Evaluation

Slide31

Look-ahead Window

[Figure: mesh with numeric per-node values, a source node S, and nodes labeled A, B, C, and P illustrating the look-ahead window]

Node S maintains delay estimates for an MxM window centered at S.

Any node outside the window is mapped to the closest node within the window.

A packet's look-ahead window shifts as it is routed from source to destination.

Slide32
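The mapping of an out-of-window destination to the closest in-window node can be sketched as a per-coordinate clamp. This assumes (x, y) node coordinates and an odd window size M; clamping each coordinate independently yields the in-window node with the smallest hop distance to the destination. The function name is illustrative, and the slides do not specify how the mapping is implemented.

```python
def map_to_window(current, dest, M):
    """Map `dest` to the closest node inside the MxM window centered at `current`.

    current, dest: (x, y) mesh coordinates
    M: odd window size, so the window spans (M - 1) // 2 hops each way
    """
    half = (M - 1) // 2
    cx, cy = current
    dx, dy = dest
    # Clamp each destination coordinate into [center - half, center + half].
    wx = min(max(dx, cx - half), cx + half)
    wy = min(max(dy, cy - half), cy + half)
    return (wx, wy)


if __name__ == "__main__":
    print(map_to_window((8, 8), (15, 2), 7))  # destination outside the window
    print(map_to_window((8, 8), (9, 10), 7))  # destination already inside
```

Because the window is re-centered at every hop, the mapped node moves with the packet until the real destination falls inside the window.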

Window Size

Destination D guaranteed to be within window when packet is (M-1)/2 hops away from D

Intuition: Packet has (M-1)/2 hops to route around congestion hot spots

7x7 look-ahead window in 16x16 mesh has comparable performance to DAR (equivalent to 31x31 look-ahead window)

Slide33
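The 31x31 figure follows from requiring the window to cover the whole mesh from any node: in an NxN mesh a destination can be up to N-1 hops away in each dimension, so (M-1)/2 = N-1 and M = 2N-1. A short check, with illustrative function names:

```python
def lookahead_hops(M: int) -> int:
    """Hop distance at which the destination is guaranteed to be in an MxM window."""
    return (M - 1) // 2


def full_coverage_window(N: int) -> int:
    """Smallest odd window size M that covers an NxN mesh from any node."""
    return 2 * N - 1


if __name__ == "__main__":
    print(lookahead_hops(7))         # hops of look-ahead with a 7x7 window
    print(full_coverage_window(16))  # window equivalent to full DAR on 16x16
    print(7 * 7, 16 * 16)            # nodes tracked: window vs. whole mesh
```

A 7x7 window tracks 49 nodes per router instead of the 256 nodes of a 16x16 mesh, while still leaving 3 hops to steer around hot spots near the destination.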

Outline

Introduction

Related work

Destination-Based Adaptive Routing (DAR)

Evaluation

Slide34

Experimental Setup

Compare DAR with RCA-1D, RCA-quadrant, Local adaptive

SPLASH-2 benchmarks + synthetic traffic patterns (uniform, transpose, shuffle)

Cycle-accurate NoC simulator models 3-stage router pipeline

8 VCs, 5 flits deep

1 VC used as escape VC for deadlock prevention

Slide35

SPLASH-2 results – 7x7 mesh

[Chart not reproduced; annotated value: 41%]

Slide36

SPLASH-2 results – 7x7 mesh

[Chart not reproduced; annotated value: 65%]

Slide37

Uniform traffic – 8x8 mesh

Slide38

Transpose traffic – 8x8 mesh

Slide39

Shuffle traffic – 8x8 mesh

Slide40

SDAR - 16x16 mesh, 7x7 window

Average latency over 100 permutation traffic patterns at 18% injection load

Network saturation statistics at 18% injection load

Slide41

Summary

Destination-based Adaptive Routing (DAR) for 2D mesh networks

Scalable DAR (SDAR) uses look-ahead window and easily scales to large networks

DAR outperforms existing adaptive and oblivious routing

SDAR achieves comparable performance with significantly less overhead

Slide42

Thank you!!

Slide43

Key Implementation Details

Simple router implementation: low storage, low bandwidth

Synchronize delay updates to reuse delay computation and weight adaptation hardware

Approximate computations to simplify implementation

Slide44

Router architecture – Kim et al., DAC '05

[Figure: router block diagram; labeled blocks: input ports (In, N, S, E, W) with virtual channels VC-1 to VC-v, Routing Unit, Quadrant Port Pre-select, VC Allocator, XB Allocator, Preferred Output Registers, Congestion Value Registers (N, S, E, W, Ej), Credits, Override]

Slide45

DAR Router

[Figure: DAR router block diagram; labeled blocks: input ports (In, N, S, E, W, Ej) with virtual channels VC-1 to VC-v, Destination Port Pre-select, VC Allocator, XB Allocator, Preferred output registers p[0] to p[N-1], Per-destination split ratios W[p_x, p_y][0] to W[p_x, p_y][N-1], Adapt Weights, Latency measurement with counters cnt[0] to cnt[P-1] (Increment/Decrement), exponentially averaged local delay l[0] to l[P-1], received delays A[p_x][0..N-1] and A[p_y][0..N-1], port delays L[p_x][0..N-1] and L[p_y][0..N-1], Latency Propagation, Avg[0] to Avg[N-1], λ; legend: Storage Overhead, Logic Overhead]

Slide46

Distributed Delay Measurement

A node maintains:

Per-destination traffic split ratio through candidate output ports: W[p][j]

Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]

Using updates received from downstream nodes, a node computes:

L[p][j]: Average delay from current node to node j through output port p

Avg[j]: Average delay from current node to node j

Slide47

Destination-based Adaptive Routing (DAR)

Every router maintains per-destination split ratios which control traffic distribution to output ports

Split ratios adjusted every T cycles based on measured delay to D through the two ports

[Figure: mesh from source S to destination D with low-, moderate-, and high-congestion regions; example split ratios on links: 0.8 / 0.2, 0.7 / 0.3, 1, 1]
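One way per-destination split ratios could steer traffic is to pick each packet's output port at random with probability equal to its split ratio. This selection mechanism is an assumption for illustration; the slides do not say how the router applies the ratios, and the function name is hypothetical.

```python
import random


def select_output_port(split_ratios, rng=random):
    """Pick an output port with probability equal to its split ratio.

    split_ratios: dict port -> ratio for one destination (ratios sum to 1)
    rng: random source; pass random.Random(seed) for reproducible runs
    """
    ports = list(split_ratios)
    weights = [split_ratios[p] for p in ports]
    return rng.choices(ports, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    ratios = {"N": 0.8, "E": 0.2}
    picks = [select_output_port(ratios, rng) for _ in range(10000)]
    print(picks.count("N") / len(picks))  # close to 0.8
```

A node with a single candidate output has split ratio 1, so the same function always returns that port.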