A Novel 3D Layer-Multiplexed On-Chip Network

Uploaded On 2016-11-23

Presentation Transcript

Slide1

A Novel 3D Layer-Multiplexed On-Chip Network

Rohit Sunkam Ramanujam

Bill Lin

Electrical and Computer Engineering

University of California, San Diego

Slide2

Networks-on-Chip

Chip-multiprocessors (CMPs) increasingly popular

2D-mesh networks often used as on-chip fabric

[Figure: Tilera Tile64 and Intel 80-core chips. Labels: I/O Area, single tile, 1.5mm, 2.0mm, 21.72mm, 12.64mm]

Slide3

3D Integrated Circuits

Reduced chip footprint

Reduced wire delays

High inter-layer bandwidth

Heterogeneous system integration

[Figure: 3D IC with ≥ 2 active device layers. Labels: Through Silicon Via, Device layer 1, Device layer 2, Short inter-layer distances]

Slide4

Natural Progression: 3D Mesh for 3D CMPs

What routing algorithms to use for 3D mesh networks?

[Figure: 2D Mesh and 3D Mesh]

Slide5

Outline

Oblivious routing on a 3D mesh

Layer-multiplexed 3D architecture

Evaluation

Slide6

Oblivious Routing Objectives

Maximize throughput

Distribute traffic evenly on network links

Maximize worst-case throughput as traffic is application dependent

Minimize hop count

Minimize routing delay between source and destination

Reduce power

Slide7

Routing Algorithms for 3D Mesh Networks

Ideal routing algorithm

Minimal latency

Maximum worst-case throughput

Dimension Ordered Routing (DOR)

Minimal latency

Poor worst-case throughput

Valiant Routing (VAL)

Optimal worst-case throughput

Poor latency

O1TURN Routing

Minimal latency

Poor worst-case throughput

[Figure: plot with axes "Average hop count (normalized to minimal)" (ticks 1, 2) and "Worst-case throughput (fraction of network capacity)" (ticks 0.25, 0.5); points for IDEAL, DOR, VAL and O1TURN]

Slide8

Randomized Partially-Minimal Routing (RPM)

Phase-1 Z: Source to the intermediate layer

Phase-2 Z: Intermediate layer to the destination

[Figure: route from Source to Destination through a random intermediate layer, with XY or YX routing on the intermediate layer; X, Y, Z axes]

Slide9

Main Idea

Load-balance uniformly across the vertical layers

2 phases of vertical routing

Minimal XY/YX routing used on each layer

Slide10
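A note on the RPM scheme summarized on the preceding slides: the sketch below is one minimal way to express it in Python, not the authors' implementation. It picks a random intermediate layer, routes vertically to it (Phase-1 Z), uses minimal XY or YX routing on that layer, then routes vertically to the destination layer (Phase-2 Z). The function names, the uniform choice of layer, and the 50/50 choice between XY and YX are assumptions made for illustration.

```python
import random


def _walk(path, axis, target):
    """Append unit steps along one axis until that coordinate equals target."""
    cur = list(path[-1])
    while cur[axis] != target:
        cur[axis] += 1 if target > cur[axis] else -1
        path.append(tuple(cur))


def rpm_route(src, dst, num_layers, rng=random):
    """Return the nodes visited from src to dst, both given as (x, y, z).

    Hypothetical helper: the layer is drawn uniformly and XY/YX is a coin
    flip; both choices are assumptions, not taken from the slides.
    """
    mid_layer = rng.randrange(num_layers)            # random intermediate layer
    order = (0, 1) if rng.random() < 0.5 else (1, 0)  # XY or YX
    path = [tuple(src)]
    _walk(path, 2, mid_layer)       # Phase-1 Z: source to intermediate layer
    for axis in order:              # minimal XY or YX on the intermediate layer
        _walk(path, axis, dst[axis])
    _walk(path, 2, dst[2])          # Phase-2 Z: intermediate layer to destination
    return path


if __name__ == "__main__":
    print(rpm_route((0, 0, 0), (3, 2, 1), num_layers=4))
```

For the call shown, every route takes exactly 5 horizontal hops (the minimal XY distance), while the number of vertical hops depends on the layer drawn.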

Routing Algorithms for 3D Mesh Networks

[Figure: plot with axes "Average hop count (normalized to minimal)" (ticks 1, 2) and "Worst-case throughput (fraction of network capacity)" (ticks 0.25, 0.5); points for IDEAL, DOR, VAL, O1TURN and RPM; the value 1.1 is annotated on the plot]

Randomized Partially Minimal Routing

Near-optimal worst-case throughput

Low latency

Slide11

RPM has Near-optimal Worst-case Throughput

RPM is optimal for even radix, within 1/k² of optimal for odd radix.

Slide12

Performance of RPM: Average-case Throughput

Slide13

Outline

Oblivious routing on a 3D mesh

Layer-multiplexed (LM) 3D architecture

Evaluation

Slide14

Unique Features of 3D ICs

Inter-layer distances are very small (~50 μm)

Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)

Vertical interconnects implemented using Through-Silicon-Vias (TSVs) have very low delay

[Figure: TSV with labeled distances of 50μm and 1500μm]

Slide15

Unique Features of 3D ICs

Inter-layer distances are very small (~50 μm)

Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)

Vertical wires using Through-Silicon-Vias (TSVs) have very low delay

Vertical bandwidth abundant as TSVs can be densely packed in 2D with small via pitch (~4 μm)

[Figure: TSV array with 4 μm pitch in both dimensions]

Slide16

Unique Features of 3D ICs

Inter-layer distances are very small (~50 μm)

Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)

Vertical wires using Through-Silicon-Vias (TSVs) have very low delay

Vertical wiring abundant as TSVs can be packed in 2D with small via pitch (~4 μm)

Number of device layers likely to remain small (4-5 layers) due to thermal and manufacturing issues

Slide17

RPM on a 3D Mesh

Phase-1 Z: Source to the intermediate layer

Phase-2 Z: Intermediate layer to the destination

[Figure: route from Source to Destination through a random intermediate layer, with XY or YX routing on the intermediate layer; X, Y, Z axes]

Slide18

Proposed Layer-Multiplexed Architecture

Phase-1 Z: Source to the intermediate layer

Phase-2 Z: Intermediate layer to the destination

RPM routing adapted to the LM architecture: RPM-LM

[Figure: route from Source to Destination through a random intermediate layer, with XY or YX routing on the intermediate layer; X, Y, Z axes; elements labeled P1 to P4]

Slide19

Power and Area Savings

5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh

Decouple vertical routing from horizontal routing

Restrict vertical routing to packet injection and packet ejection

[Figure: Layer-Multiplexed Architecture (P1 to P4 with a packet injection demultiplexer and a packet ejection multiplexer) next to a Conventional 3D Mesh (P1 to P4)]

Slide20
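As a rough illustration of the 5x5 versus 7x7 crossbar comparison above: a 3D-mesh router needs 7 ports (four horizontal neighbors, up, down, and the local tile), while an LM router needs only 5 because vertical transfers happen outside the router. The sketch below uses crosspoint count (ports squared) as a first-order proxy for crossbar size. That proxy and the function name are assumptions; it does not reproduce the Orion 2.0 estimates in the evaluation slides, which also cover buffers and the added multiplexers and demultiplexers.

```python
def crosspoints(ports):
    """Crosspoints in a ports x ports crossbar (first-order size proxy, an assumption)."""
    return ports * ports


mesh_xbar = crosspoints(7)   # conventional 3D mesh router
lm_xbar = crosspoints(5)     # layer-multiplexed router
reduction = 1 - lm_xbar / mesh_xbar

print(f"3D mesh: {mesh_xbar} crosspoints, LM: {lm_xbar} crosspoints")
print(f"crossbar-only reduction under this proxy: {reduction:.1%}")
```

Under this proxy the crossbar alone shrinks from 49 to 25 crosspoints; the net savings for the whole design are smaller because the LM architecture adds injection and ejection logic.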

Single Hop Vertical Communication

Single hop vertical routing more power efficient than one-layer-per-hop routing

Leverages short inter-layer distances in 3D ICs

Better utilizes available vertical bandwidth

Slide21
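To make the contrast with one-layer-per-hop routing concrete, the sketch below counts vertical hops under RPM on a 4-layer 3D mesh and compares them with the LM case, where a packet makes one vertical transfer at injection and one at ejection. Uniformly random source, intermediate, and destination layers are assumed, and the names are hypothetical; this counts hops only and says nothing about per-hop energy.

```python
from itertools import product

# In the LM sketch, vertical movement is one transfer at injection and one at
# ejection, independent of how many layers are crossed (assumption for counting).
LM_VERTICAL_TRANSFERS = 2


def mesh_vertical_hops(src_layer, mid_layer, dst_layer):
    """Router-to-router vertical hops for RPM on a conventional 3D mesh."""
    return abs(src_layer - mid_layer) + abs(mid_layer - dst_layer)


def mesh_vertical_stats(num_layers):
    """Average and worst-case vertical hops over all layer triples."""
    hops = [mesh_vertical_hops(*t) for t in product(range(num_layers), repeat=3)]
    return sum(hops) / len(hops), max(hops)


average, worst = mesh_vertical_stats(4)
print(average, worst, LM_VERTICAL_TRANSFERS)
```

With 4 layers the mesh average is 2.5 vertical hops and the worst case is 6, against a constant 2 single-hop transfers in the LM sketch.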

Packet Injection Demultiplexer

[Figure: block diagram of the packet injection demultiplexer. Labels: P1, P2, P3, P4; "To the injection port of the Layer 1 router" through "To the injection port of the Layer 4 router"; "Credits in from the injection port of routers on layers 1-4"; blocks for Switch Arbitration, Route Selection/Load Balancing, VC Allocation and Flit Counters]

Slide22

Packet Ejection Multiplexer

[Figure: block diagram of the packet ejection multiplexers for the router on Layer 1, ports P1 to P4. Labels: an arbiter over inputs L1-P1, L2-P1, L3-P1, L4-P1 and an arbiter over inputs L1-P4, L2-P4, L3-P4, L4-P4; packets from layer 2, layer 3 and layer 4; credits out for each input; VCID]

Slide23

Outline

Oblivious routing on a 3D mesh

Layer-multiplexed 3D architecture

Evaluation

Power and Area

Performance

Slide24

Power and Area Evaluation

Used Orion 2.0 models for router power and area estimation.

65nm process at 1V and 1GHz

Buffers

4 VCs/port, 5 flits/VC for routers

5 flits/port for packet injection demultiplexer

5 flits/port for each packet ejection multiplexer

Slide25

Power Comparison

3D mesh

One 7-port router per tile

LM

One 5-port router per tile

One packet injection demultiplexer for every 4 tiles

One packet ejection multiplexer per tile

Slide26

Power Evaluation

27% power reduction

Slide27

Area Evaluation

26.5% area reduction

Slide28

Outline

Oblivious routing on a 3D mesh

Layer-multiplexed 3D architecture

Evaluation

Power and Area

Performance

Slide29

RPM on a 3D mesh vs. RPM-LM

Worst-case throughput

RPM-LM achieves same (near-optimal) worst-case throughput as RPM

Average-case throughput

Slide30

Flit-Level Simulation

Ideal throughput evaluation assumes

Ideal single-cycle router

Infinite buffers

No contention in switches, no flow control

Flit-level simulation

PopNet network simulator

5 stage router pipeline

Credit-based flow control

8 virtual channels, each 5 flits deep

Multi-flit packets injected into the network (5 flits/packet)

Slide31

Flit-Level Simulation (cont’d)

Network configurations simulated

4 x 4 x 4 mesh

8 x 8 x 4 mesh

Four different traffic traces used

Uniform traffic

Transpose traffic: (x,y,z) → (y,z,x)

Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1)

Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)

Slide32
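The three permutation traffic patterns listed above can be written directly as destination functions. The sketch below assumes a cubic mesh with a single radix k (for example the 4 x 4 x 4 configuration); the slides do not say how the patterns are adapted to the 8 x 8 x 4 mesh, so that case is left out. Function names are illustrative.

```python
from itertools import product


def transpose(x, y, z):
    """Transpose traffic: (x, y, z) -> (y, z, x)."""
    return (y, z, x)


def complement(x, y, z, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    return (k - x - 1, k - y - 1, k - z - 1)


def dor_worst_case(x, y, z, k):
    """DOR-WC traffic: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    return (k - z - 1, k - y - 1, k - x - 1)


if __name__ == "__main__":
    k = 4  # radix of a cubic k x k x k mesh (assumption)
    for node in list(product(range(k), repeat=3))[:4]:
        print(node, transpose(*node), complement(*node, k), dor_worst_case(*node, k))
```

Each function is a permutation of the node set, so every node sends to exactly one destination and receives from exactly one source.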

Uniform Traffic: 8x8x4 Mesh

Slide33

Transpose Traffic: 8x8x4 Mesh

Slide34

Worst-case Traffic for DOR: 8x8x4 Mesh

Slide35

Summary of Contributions

Proposed a 3D Layer-multiplexed architecture which is an optimization of a 3D mesh

Exploits the optimality of RPM together with the high vertical bandwidth enabled in 3D technology

LM architecture consumes 27% less power, occupies 26% less area than a 3D mesh

RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh

Slide36

Thank you!!