Hierarchical Rings with Deflection Routing
Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh
Slide1
Design and Evaluation of Hierarchical Rings with Deflection Routing
Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu
Slide2
Executive Summary
Rings do not scale well as core count increases
Traditional hierarchical ring designs are complex and energy inefficient
Complicated buffering and flow control
Solution: Hierarchical Rings with Deflection (HiRD)
Guarantees livelock freedom and delivery
Eliminates all buffers at local routers and most buffers at bridge routers
HiRD provides higher performance and energy efficiency than buffered hierarchical rings
HiRD is simpler than buffered hierarchical rings
Slide3
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide4
Scaling Problems in a Ring NoC
As the number of cores grows:
Lower performance
More power
Slide5
Alternative: Hierarchical Designs
Packets can reach far destinations in fewer hops
[Figure labels: Local Ring (Level 0), Global Ring (Level 1)]
Slide6
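The claim that packets reach far destinations in fewer hops on a hierarchical design can be checked with a small calculation. This is only a sketch under stated assumptions, none of which come from the slides: rings are unidirectional, every local ring has exactly one bridge stop, only ring hops are counted, and the function names are illustrative.

```python
def ring_hops(src: int, dst: int, size: int) -> int:
    """Hops from stop `src` to stop `dst` on a unidirectional ring with `size` stops."""
    return (dst - src) % size


def hierarchical_hops(src: int, dst: int, num_rings: int, nodes_per_ring: int) -> int:
    """Hops between two nodes in a two-level hierarchy (assumed layout).

    Assumption: node ids are grouped by local ring, and stop 0 of every
    local ring is its single bridge router; nodes occupy stops 1..nodes_per_ring.
    """
    src_ring, src_slot = divmod(src, nodes_per_ring)
    dst_ring, dst_slot = divmod(dst, nodes_per_ring)
    src_slot += 1  # shift past the bridge stop
    dst_slot += 1
    local_size = nodes_per_ring + 1
    if src_ring == dst_ring:
        return ring_hops(src_slot, dst_slot, local_size)
    to_bridge = ring_hops(src_slot, 0, local_size)      # local ring to bridge
    across = ring_hops(src_ring, dst_ring, num_rings)   # global ring
    from_bridge = ring_hops(0, dst_slot, local_size)    # bridge to destination
    return to_bridge + across + from_bridge


def worst_case_single(nodes: int) -> int:
    """Largest hop count over all pairs on one flat ring."""
    return max(ring_hops(s, d, nodes) for s in range(nodes) for d in range(nodes))


def worst_case_hierarchical(num_rings: int, nodes_per_ring: int) -> int:
    """Largest hop count over all pairs in the two-level hierarchy."""
    nodes = num_rings * nodes_per_ring
    return max(
        hierarchical_hops(s, d, num_rings, nodes_per_ring)
        for s in range(nodes)
        for d in range(nodes)
    )
```

Under these assumptions, a flat 64-node ring needs up to 63 hops, while 8 local rings of 8 nodes each need at most 23.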
Single Ring vs. Hierarchical Rings
A hierarchical design provides better performance as the network scales
Slide7
Complexity in Hierarchical Designs
Complex buffering and flow control
Slide8
Single Ring vs. Hierarchical Rings
Design complexity increases power consumption
Slide9
Our Goal
Design a hierarchical ring that has lower complexity without sacrificing performance
Slide10
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide11
Key Idea
Eliminate buffers
Use deflection routing
Simpler flow control
Slide12
Local Router
Key functionality:
Accept new flits
Pass flits around the ring
[Figure labels: Core, Local East, Local West]
Slide13
Eliminating Buffers in Local Routers
[Figure labels: Core, Local East, Local West]
Slide14
Eliminating Buffers in Local Routers
Flits can enter the ring if the output is available
[Figure labels: Core, Local East, Local West, Ejector, No Buffer, Simpler Crossbar]
Slide15
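The bufferless local-router behavior (a core injects only when the output is free, and a flit that cannot be ejected keeps circulating) can be sketched as a single-cycle step function. This is one plausible reading of the slides, not the authors' RTL: the `Flit` fields, the `local_router_step` name, and the one-flit-per-cycle model are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flit:
    dest: int             # destination node id (hypothetical field)
    deflections: int = 0  # how many times this flit has been deflected


def local_router_step(
    node_id: int,
    incoming: Optional[Flit],
    inject_request: Optional[Flit],
    ejector_free: bool,
) -> Tuple[Optional[Flit], Optional[Flit], bool]:
    """Simulate one cycle of a bufferless local router (assumed model).

    Returns (ejected flit, flit placed on the output link, whether the core injected).
    """
    ejected = None
    outgoing = None
    if incoming is not None:
        if incoming.dest == node_id and ejector_free:
            ejected = incoming  # flit leaves the ring at its destination
        else:
            if incoming.dest == node_id:
                # Destination reached but ejector busy: deflect around the ring.
                incoming.deflections += 1
            outgoing = incoming  # no buffer, so the flit must keep moving
    injected = False
    if outgoing is None and inject_request is not None:
        # The core may inject only when the output link is free this cycle.
        outgoing = inject_request
        injected = True
    return ejected, outgoing, injected
```

Because there is no buffer, the router never stores a flit: every incoming flit is either ejected or forwarded in the same cycle, which is what allows the simpler crossbar named on the slide.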
Deflection Routing
[Figure labels: Core, Local East, Local West, Ejector, Deflected]
Slide16
Bridge Router
[Figure labels: Local Ring (West, East), Global Ring (West, East), Crossbar]
Slide17
Eliminating Buffers in Bridge Routers
[Figure labels: Crossbar, Local Ring (West, East), Global Ring (West, East)]
Slide18
Eliminating Buffers in Bridge Routers
[Figure labels: Local Ring (West, East), Global Ring (West, East)]
Simpler Buffering
Fewer Buffers
Simpler Crossbar
Slide19
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide20
Livelock in Deflection Routing
Injection starvation
[Figure labels: Ring, Src, Starved Flit, Unable to inject]
Slide21
HiRD: Injection Guarantee
Throttling provides injection guarantee
After 150 cycles: All nodes stop injecting flits
[Figure labels: Ring, Src, Starved Flit, Unable to inject, Throttled Router]
Slide22
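The injection guarantee (throttling once a flit has been unable to inject for 150 cycles) can be sketched with a per-node starvation counter. Only the 150-cycle threshold comes from the slides. The class and function names, the counter reset rule, and the choice to let the starved node itself keep injecting are assumptions for illustration.

```python
from typing import Dict

STARVATION_THRESHOLD = 150  # cycles, as stated on the slide


class InjectionGuard:
    """Tracks how long a node has been unable to inject (hypothetical helper)."""

    def __init__(self, threshold: int = STARVATION_THRESHOLD) -> None:
        self.threshold = threshold
        self.starved_cycles = 0

    def tick(self, wants_to_inject: bool, injected: bool) -> bool:
        """Advance one cycle; return True if this node should request throttling."""
        if wants_to_inject and not injected:
            self.starved_cycles += 1
        else:
            # Assumption: the counter resets once the node injects or goes idle.
            self.starved_cycles = 0
        return self.starved_cycles >= self.threshold


def may_inject(node_id: int, throttle_requests: Dict[int, bool]) -> bool:
    """A node holds back while any *other* node is requesting throttling.

    Assumption: the starved node is exempt, so the slots freed by the
    throttled routers become available to it.
    """
    return not any(
        requested for other, requested in throttle_requests.items() if other != node_id
    )
```

Once the other routers stop injecting, flits already in the ring drain toward their destinations, so a free slot eventually passes the starved node.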
Livelock in Deflection Routing
Transfer starvation
[Figure labels: Ring, Transfer FIFO, Starved Flit, Unable to Transfer]
Slide23
HiRD: Transfer Guarantee
Reservation provides transfer guarantee
After 10 looparounds
[Figure labels: Ring, Transfer FIFO, Starved Flit, Reserved Slot]
Slide24
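The transfer guarantee (a slot is reserved in the transfer FIFO once a flit has looped around 10 times) can be sketched as below. Only the 10-looparound threshold comes from the slides. The `TransferFifo` class, its single-reservation policy, and passing the looparound count in from the caller are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Optional

LOOPAROUND_THRESHOLD = 10  # looparounds, as stated on the slide


class TransferFifo:
    """Bridge-router transfer FIFO with one reservable slot (assumed model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.queue: Deque[str] = deque()
        self.reserved_for: Optional[str] = None  # flit id holding the reservation

    def try_transfer(self, flit_id: str, looparounds: int) -> bool:
        """Try to move a flit from the ring into the FIFO.

        Returns True on success; on False the flit is deflected and must
        loop around the ring again.
        """
        free = self.capacity - len(self.queue)
        if self.reserved_for is not None and self.reserved_for != flit_id:
            free -= 1  # one slot is held back for the starved flit
        if free > 0:
            self.queue.append(flit_id)
            if self.reserved_for == flit_id:
                self.reserved_for = None  # reservation consumed
            return True
        if looparounds >= LOOPAROUND_THRESHOLD and self.reserved_for is None:
            self.reserved_for = flit_id  # next freed slot belongs to this flit
        return False

    def drain_one(self) -> Optional[str]:
        """Forward the oldest flit onto the other ring, freeing a slot."""
        return self.queue.popleft() if self.queue else None
```

With a reservation in place, later arrivals cannot take the next freed slot, so the starved flit is accepted the next time it reaches the bridge after a slot drains.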
Ejection Guarantee
Provided by a prior work
Re-transmit once [Fallin et al., HPCA’11]
Drop a flit if there is no available slot
Reserve a buffer slot at the destination if a flit was dropped
Slide25
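The ejection guarantee (drop when no slot is available, then reserve a slot at the destination so that one retransmission is enough) can be sketched as follows. This is a simplified illustration, not the mechanism as specified in the cited work: the `Destination` class, packet-level granularity, and the first-come ordering of reservations are assumptions.

```python
from typing import List, Optional, Set


class Destination:
    """Receive-side buffer bookkeeping for a retransmit-once scheme (assumed model)."""

    def __init__(self, slots: int) -> None:
        self.free_slots = slots
        self.reserved: Set[str] = set()  # packets that already own a reserved slot
        self.waiting: List[str] = []     # dropped packets awaiting a reservation

    def receive(self, packet_id: str) -> bool:
        """Return True if the packet is buffered, False if it is dropped."""
        if packet_id in self.reserved:
            self.reserved.discard(packet_id)  # retransmission uses its reserved slot
            return True
        if self.free_slots > 0:
            self.free_slots -= 1
            return True
        if packet_id not in self.waiting:
            self.waiting.append(packet_id)  # remember the drop
        return False

    def release_slot(self) -> Optional[str]:
        """Free a slot after a packet is consumed.

        Returns the id of a dropped packet whose sender should now retransmit,
        or None if no packet was waiting.
        """
        if self.waiting:
            packet_id = self.waiting.pop(0)
            self.reserved.add(packet_id)  # the freed slot is now reserved
            return packet_id
        self.free_slots += 1
        return None
```

Because the retransmission is requested only after a slot has been reserved, the retransmitted packet cannot be dropped a second time in this model.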
End-to-end Delivery Guarantees
[Figure labels: Local Ring, Local Ring, Global Ring, src, dest; along the path: Injection Guarantee, Transfer Guarantee, Injection Guarantee, Transfer Guarantee, Injection Guarantee, Ejection Guarantee]
Slide26
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide27
An Overview of HiRD
Deflection routing
Simpler flow control
Simpler crossbars and control logic
No buffers in the local rings
Simpler and faster local routers
Simpler bridge routers
Lower power, less area and simpler to design
Provides end-to-end delivery guarantees
Injection guarantee by throttling
Transfer guarantee by reservation
Slide28
Putting It All Together
Deflection routing
Simpler flow control
Simpler crossbars and control logic
No buffers in the local rings
Simpler and faster local routers
Simpler bridge routers
Lower power, less area and simpler to design
Provides end-to-end delivery guarantees
Injection guarantee by throttling
Transfer guarantee by reservation
Slide29
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide30
Methodology
Cores
16 and 64 OoO CPU cores
64 KB 4-way private L1
Distributed L2
Network
1-flit local-to-global buffer
4-flit global-to-local buffers
2-cycle per-hop latency for local routers
3-cycle per-hop latency for global routers
60 workloads consisting of SPEC2006 apps
Slide31
Comparison to Previous Designs
Single ring design
Kim and Kim, NoCArc’09
64-bit links
128-bit links
256-bit links
Buffered hierarchical ring design
Ravindran and Stumm, HPCA’97
Identical topology
Identical bisection bandwidth
4-flit buffers in both local and global routers
Slide32
Results: System Performance
[Figure annotations: 2.9%, 1.9%]
Hierarchical designs provide better performance than a single ring on a larger network
HiRD performs better than buffered hierarchical rings due to lower latency in local routers and throttling
Slide33
Results: Network Power
[Figure annotations: 15%, 46.6%]
1) Hierarchical designs consume much less power than the highest-performance single ring
2) Routers and flow control in HiRD are simpler than routers in buffered hierarchical rings
Slide34
Router Area and Critical Path
16-node network with 8 bridge routers
Verilog RTL design using 45nm technology
HiRD reduces NoC area by 50.3% compared to a buffered hierarchical ring design
HiRD reduces local router critical path by 29.9% compared to a buffered hierarchical ring design
Slide35
Additional Results
Detailed power breakdown
Synthetic evaluations
Energy efficiency results
Worst case analysis
Technical Report:
Multithreaded evaluation
Average, 90th percentile and max latency
Comparison against other topologies
Sensitivity analysis on different link bandwidths and number of buffers
Slide36
Outline
Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion
Slide37
Conclusion
Rings do not scale well as core count increases
Traditional hierarchical ring designs are complex and energy inefficient
Complicated buffering and flow control
Solution: Hierarchical Rings with Deflection (HiRD)
Guarantees livelock freedom and delivery
Eliminates all buffers at local routers and most buffers at bridge routers
HiRD provides higher performance and energy efficiency than buffered hierarchical rings
HiRD is simpler than buffered hierarchical rings
Slide38
Design and Evaluation of Hierarchical Rings with Deflection Routing
Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu
Slide39
Backup Slides
Slide40
Network Intensive Workloads
15 network-intensive workloads
Slide41
System Performance
[Figure annotations: 5.6%, 2.5%]
Deflections balance out the network load
Throttling reduces congestion
Slide42
Network Power
[Figure annotations: 11.9%, 37%]
More deflections happen when the network is congested
Slide43
Detailed Results
Slide44
Multithreaded Applications
Slide45
Network Latency
Slide46
Synthetic Traffic Evaluations
Slide47
Topology Comparison
Slide48
Sweep over Different Bandwidth
Slide49
Packet Reassembly
Borrowed from CHIPPER [Fallin et al., HPCA’11]
Retransmit-Once
Destination node reserves a buffer slot for a dropped packet
Provides ejection guarantee
Slide50
Other Optimizations
Map cores that communicate heavily with each other onto the same local ring
Takes advantage of the faster local ring routers
Slide51
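The mapping optimization (placing heavily communicating cores on the same local ring) can be illustrated with a simple greedy grouping. The slides do not say how the mapping is computed, so this heuristic, the `map_cores_to_rings` name, and the traffic-matrix format are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def map_cores_to_rings(
    traffic: Dict[Tuple[int, int], int], num_cores: int, ring_size: int
) -> List[List[int]]:
    """Greedily group cores into local rings by communication volume.

    `traffic[(a, b)]` is an assumed measure of traffic from core a to core b.
    Each ring is seeded with the lowest-numbered unassigned core, then filled
    with the cores that exchange the most traffic with the ring's members.
    """

    def volume(a: int, b: int) -> int:
        # Count traffic in both directions between the two cores.
        return traffic.get((a, b), 0) + traffic.get((b, a), 0)

    unassigned = set(range(num_cores))
    rings: List[List[int]] = []
    while unassigned:
        ring = [min(unassigned)]
        unassigned.remove(ring[0])
        while len(ring) < ring_size and unassigned:
            # Ties go to the lowest core id because candidates are sorted.
            best = max(
                sorted(unassigned),
                key=lambda c: sum(volume(c, member) for member in ring),
            )
            ring.append(best)
            unassigned.remove(best)
        rings.append(ring)
    return rings
```

For example, with traffic `{(0, 3): 100, (1, 2): 50}`, four cores, and rings of two, cores 0 and 3 share one local ring and cores 1 and 2 share the other, so both heavy flows stay on the faster local routers.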
Related Concurrent Works
Clumsy Flow Control [Kim et al., IEEE CAL’13]
Requires coordination between cores and memory controllers
Transportation-inspired NoCs [Kim et al., HPCA’14]
tNoCs require an additional credit network
tNoCs have more complex flow control
HiRD is more lightweight
Slide52
Some Related Previous Works
Hierarchical Bus [Udipi et al., HPCA’10]
HiRD provides more scalability
Concentrated Mesh [Das et al., HPCA’09]
Several nodes share one router
Used on mesh networks
Less power efficient than HiRD
Low-cost Mesh Router [J. Kim, MICRO’09]
Specifically designed for meshes
Does not solve issues in deflection-based flow control (HiRD does)