Slide1
A Novel 3D Layer-Multiplexed On-Chip Network
Rohit Sunkam Ramanujam
Bill Lin
Electrical and Computer Engineering
University of California, San Diego
Slide2
Networks-on-Chip
Chip-multiprocessors (CMPs) increasingly popular
2D-mesh networks often used as on-chip fabric
[Figure: die layouts of the Tilera Tile64 and the Intel 80-core chip. Labels: I/O areas, single tile, 1.5mm, 2.0mm, 21.72mm, 12.64mm]
Slide3
3D Integrated Circuits
Reduced chip footprint
Reduced wire delays
High inter-layer bandwidth
Heterogeneous system integration
[Figure: 3D IC with ≥ 2 active device layers (device layer 1, device layer 2) connected by through-silicon vias over short inter-layer distances]
Slide4
Natural Progression: 3D Mesh for 3D CMPs
What routing algorithms to use for 3D mesh networks?
[Figure: a 2D mesh next to a 3D mesh]
Slide5
Outline
Oblivious routing on a 3D mesh
Layer-multiplexed 3D architecture
Evaluation
Slide6
Oblivious Routing Objectives
Maximize throughput
Distribute traffic evenly on network links
Maximize worst-case throughput as traffic is application dependent
Minimize hop count
Minimize routing delay between source and destination
Reduce power
Slide7
Routing Algorithms for 3D Mesh Networks
Ideal routing algorithm: minimal latency, maximum worst-case throughput
Dimension Ordered Routing (DOR): minimal latency, poor worst-case throughput
Valiant Routing (VAL): optimal worst-case throughput, poor latency
O1TURN Routing: minimal latency, poor worst-case throughput
[Figure: worst-case throughput (fraction of network capacity; ticks 0.25, 0.5) vs. average hop count (normalized to minimal; ticks 1, 2) for IDEAL, DOR, VAL and O1TURN]
Slide8
Randomized Partially-Minimal Routing (RPM)
Phase-1 Z: source to a random intermediate layer
XY or YX routing on the intermediate layer
Phase-2 Z: intermediate layer to the destination
[Figure: RPM route from source to destination through a random intermediate layer; axes X, Y, Z]
Slide9
Main Idea
Load-balance uniformly across the vertical layers
2 phases of vertical routing
Min XY/YX used on each layer
Slide10
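The RPM scheme (a Z phase to a random intermediate layer, minimal XY or YX routing on that layer, then a second Z phase) can be illustrated with a short sketch. The function name `rpm_route`, the coordinate convention (x, y, z) and the equal-probability choice between XY and YX are assumptions made for illustration; the slides only state that the intermediate layer is random and that XY or YX is used on it.

```python
import random

def rpm_route(src, dst, num_layers, rng=random):
    """Return the nodes visited by one packet under an RPM-style route.

    src, dst: (x, y, z) tuples; num_layers: number of vertical layers.
    Hypothetical helper, not taken from the slides.
    """
    dx, dy, dz = dst
    mid = rng.randrange(num_layers)      # random intermediate layer
    xy_first = rng.random() < 0.5        # assumption: XY and YX equally likely
    path = [tuple(src)]

    def walk(axis, target):
        # Move one hop at a time along a single axis until it matches target.
        while path[-1][axis] != target:
            node = list(path[-1])
            node[axis] += 1 if target > node[axis] else -1
            path.append(tuple(node))

    walk(2, mid)                         # phase-1 Z: source to intermediate layer
    order = [(0, dx), (1, dy)] if xy_first else [(1, dy), (0, dx)]
    for axis, target in order:           # minimal XY or YX on that layer
        walk(axis, target)
    walk(2, dz)                          # phase-2 Z: intermediate layer to destination
    return path

if __name__ == "__main__":
    route = rpm_route((0, 0, 0), (3, 2, 1), num_layers=4)
    print(len(route) - 1, "hops:", route)
```

The route is minimal in X and Y; only the vertical dimension can take extra hops, which is why the scheme is called partially minimal.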
Routing Algorithms for 3D Mesh Networks
Randomized Partially Minimal Routing (RPM): near-optimal worst-case throughput, low latency
[Figure: worst-case throughput (fraction of network capacity) vs. average hop count (normalized to minimal) for IDEAL, DOR, VAL, O1TURN and RPM; tick values shown: 1, 1.1, 2 and 0.25, 0.5]
Slide11
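To make the latency comparison between DOR, VAL and RPM concrete, the following sketch estimates average hop counts under uniform random traffic. It rests on several assumptions not stated in the slides: sources, destinations, Valiant intermediate nodes and RPM intermediate layers are drawn independently and uniformly, pairs with identical source and destination are included, and the helper names are hypothetical. It is not a reproduction of the chart values.

```python
from itertools import product

def mean_distance(k):
    """Mean |a - b| over all ordered pairs of coordinates in 0..k-1."""
    return sum(abs(a - b) for a, b in product(range(k), repeat=2)) / (k * k)

def average_hops(kx, ky, kz):
    """Average hop counts for DOR, VAL and RPM on a kx x ky x kz mesh."""
    dx, dy, dz = mean_distance(kx), mean_distance(ky), mean_distance(kz)
    dor = dx + dy + dz        # minimal route in every dimension
    val = 2 * dor             # two minimal phases via a random intermediate node
    rpm = dx + dy + 2 * dz    # minimal in X and Y, two phases in Z only
    return {"DOR": dor, "VAL": val, "RPM": rpm}

if __name__ == "__main__":
    hops = average_hops(8, 8, 4)
    for name, h in hops.items():
        print(f"{name}: {h:.2f} hops, {h / hops['DOR']:.2f}x minimal")
```

Because RPM randomizes only the vertical dimension, its overhead relative to minimal routing shrinks as the horizontal dimensions grow relative to the number of layers.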
RPM has Near-optimal Worst-case Throughput
RPM is optimal for even radix, within 1/k^2 of optimal for odd radix.
Slide12
Performance of RPM: Average-case Throughput
Slide13
Outline
Oblivious routing on a 3D mesh
Layer-multiplexed (LM) 3D architecture
Evaluation
Slide14
Unique Features of 3D ICs
Inter-layer distances are very small (~50 μm)
Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)
Vertical interconnects implemented using Through-Silicon-Vias (TSVs) have very low delay
[Figure: a TSV spanning 50μm between layers compared with 1500μm between adjacent tiles]
Slide15
Unique Features of 3D ICs
Inter-layer distances are very small (~50 μm)
Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)
Vertical wires using Through-Silicon-Vias (TSVs) have very low delay
Vertical bandwidth abundant as TSVs can be densely packed in 2D with small via pitch (~4 μm)
[Figure: TSV array with 4 μm pitch in both directions]
Slide16
Unique Features of 3D ICs
Inter-layer distances are very small (~50 μm)
Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)
Vertical wires using Through-Silicon-Vias (TSVs) have very low delay
Vertical wiring abundant as TSVs can be packed in 2D with small via pitch (~4 μm)
Number of device layers likely to remain small (4-5 layers) due to thermal and manufacturing issues
Slide17
RPM on a 3D Mesh
Phase-1 Z: source to a random intermediate layer
XY or YX routing on the intermediate layer
Phase-2 Z: intermediate layer to the destination
[Figure: RPM route from source to destination through a random intermediate layer; axes X, Y, Z]
Slide18
Proposed Layer-Multiplexed Architecture
Phase-1 Z: source to a random intermediate layer
XY or YX routing on the intermediate layer
Phase-2 Z: intermediate layer to the destination
RPM routing adapted to the LM architecture: RPM-LM
[Figure: RPM route from source to destination through a random intermediate layer, with processors labelled P1-P4; axes X, Y, Z]
Slide19
Power and Area Savings
5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh
Decouple vertical routing from horizontal routing
Restrict vertical routing to packet injection and packet ejection
[Figure: layer-multiplexed architecture (processors P1-P4 with a packet injection demultiplexer and a packet ejection multiplexer) next to a conventional 3D mesh (processors P1-P4)]
Slide20
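The 5x5 versus 7x7 crossbar comparison follows from the port count of a mesh router. The sketch below works through that arithmetic; the helper names are hypothetical, and the crosspoint count is only a rough proxy for crossbar size, not the Orion-based power and area estimate reported in the evaluation.

```python
def router_ports(routed_dimensions):
    """Ports of a mesh router: two neighbours per routed dimension plus one local port."""
    return 2 * routed_dimensions + 1

def crosspoints(ports):
    """Crosspoints in a full ports x ports crossbar."""
    return ports * ports

if __name__ == "__main__":
    mesh_3d = router_ports(3)   # conventional 3D mesh router routes in X, Y and Z
    lm = router_ports(2)        # LM router routes in X and Y only
    print(f"3D mesh: {mesh_3d}x{mesh_3d} crossbar, {crosspoints(mesh_3d)} crosspoints")
    print(f"LM:      {lm}x{lm} crossbar, {crosspoints(lm)} crosspoints")
```

Moving vertical transfers out of the router and into the injection demultiplexer and ejection multiplexer is what removes the two vertical ports.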
Single Hop Vertical Communication
Single hop vertical routing more power efficient than one-layer-per-hop routing
Leverages short inter-layer distances in 3D ICs
Better utilizes available vertical bandwidth
Slide21
Packet Injection Demultiplexer
[Figure: block diagram of the packet injection demultiplexer. Inputs from processors P1-P4; outputs to the injection ports of the Layer 1 through Layer 4 routers; blocks for route selection/load balancing, VC allocation, switch arbitration and flit counters; credits in from the injection ports of the routers on layers 1-4]
Slide22
Packet Ejection Multiplexer
[Figure: block diagram of the packet ejection multiplexers at the router on Layer 1, for processors P1-P4. Each multiplexer has an arbiter over four inputs (L1-P1, L2-P1, L3-P1 and L4-P1 for P1; L1-P4, L2-P4, L3-P4 and L4-P4 for P4), receives packets from layers 2, 3 and 4, and returns credits out for each of its inputs; a VCID signal is also shown]
Slide23
Outline
Oblivious routing on a 3D mesh
Layer-multiplexed 3D architecture
Evaluation
Power and Area
Performance
Slide24
Power and Area Evaluation
Used Orion 2.0 models for router power and area estimation.
65nm process at 1V and 1GHz
Buffers
4 VCs/port, 5 flits/VC for routers
5 flits/port for packet injection demultiplexer
5 flits/port for each packet ejection multiplexer
Slide25
Power Comparison
3D mesh
One 7-port router per tile
LM
One 5-port router per tile
One packet injection demultiplexer for every 4 tiles
One packet ejection multiplexer per tile
Slide26
Power Evaluation
27% power reduction
Slide27
Area Evaluation
26.5% area reduction
Slide28
Outline
Oblivious routing on a 3D mesh
Layer-multiplexed 3D architecture
Evaluation
Power and Area
Performance
Slide29
RPM on a 3D mesh vs. RPM-LM
Worst-case throughput
RPM-LM achieves same (near-optimal) worst-case throughput as RPM
Average-case throughput
Slide30
Flit-Level Simulation
Ideal throughput evaluation assumes
Ideal single-cycle router
Infinite buffers
No contention in switches, no flow control
Flit-level simulation
PopNet network simulator
5 stage router pipeline
Credit-based flow control
8 virtual channels, each 5 flits deep
Multi-flit packets injected into the network (5 flits/packet)
Slide31
Flit-Level Simulation (cont’d)
Network configurations simulated
4 x 4 x 4 mesh
8 x 8 x 4 mesh
Four different traffic traces used
Uniform traffic
Transpose traffic: (x,y,z) → (y,z,x)
Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1)
Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
Slide32
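The three permutation traffic patterns used in the flit-level simulation (transpose, complement and DOR-WC) can be written directly from their definitions. This is a minimal sketch assuming a cubic k x k x k mesh with coordinates in 0..k-1, as in the 4 x 4 x 4 configuration; the function names are hypothetical, and the slides do not say how the patterns are adapted to the non-cubic 8 x 8 x 4 mesh.

```python
from itertools import product

def transpose(x, y, z, k):
    """Transpose traffic: (x, y, z) -> (y, z, x). k is unused, kept for a uniform signature."""
    return (y, z, x)

def complement(x, y, z, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    return (k - x - 1, k - y - 1, k - z - 1)

def dor_worst_case(x, y, z, k):
    """DOR-WC traffic: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    return (k - z - 1, k - y - 1, k - x - 1)

def destinations(pattern, k):
    """Map every source node of a k x k x k mesh to its destination."""
    return {src: pattern(*src, k) for src in product(range(k), repeat=3)}

if __name__ == "__main__":
    k = 4
    for pattern in (transpose, complement, dor_worst_case):
        print(pattern.__name__, (1, 2, 3), "->", pattern(1, 2, 3, k))
```

Each pattern is a permutation of the node set, so every node sends to exactly one destination and receives from exactly one source.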
Uniform Traffic, 8x8x4 Mesh
Slide33
Transpose Traffic, 8x8x4 Mesh
Slide34
Worst-case Traffic for DOR, 8x8x4 Mesh
Slide35
Summary of Contributions
Proposed a 3D Layer-multiplexed architecture which is an optimization of a 3D mesh
Exploits the optimality of RPM together with the high vertical bandwidth enabled in 3D technology
LM architecture consumes 27% less power and occupies 26% less area than a 3D mesh
RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh
Slide36
Thank you!!