Uploaded On 2018-02-01

Presentation Transcript

Slide1

Design and Evaluation of Hierarchical Rings with Deflection Routing

Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu

Slide2

Executive Summary

Rings do not scale well as core count increases
Traditional hierarchical ring designs are complex and energy inefficient
  Complicated buffering and flow control
Solution: Hierarchical Rings with Deflection (HiRD)
  Guarantees livelock freedom and delivery
  Eliminates all buffers at local routers and most buffers at bridge routers
HiRD provides higher performance and energy-efficiency than hierarchical rings
HiRD is simpler than hierarchical rings

Slide3

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide4

Scaling Problems in a Ring NoC

As the number of cores grows:
  Lower performance
  More power

Slide5

Alternative: Hierarchical Designs

Packets can reach a far destination in fewer hops
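
The hop-count claim can be illustrated with a small worked example. This is a sketch under assumptions that are not in the slides: rings are bidirectional, each local ring has one bridge stop, and the helper names and the 8-by-8 layout are hypothetical. It only counts hops and ignores per-hop latency differences.

```python
def ring_worst_hops(stops: int) -> int:
    """Worst-case hop count between two stops on a bidirectional ring."""
    return stops // 2


def hierarchical_worst_hops(n_local: int, ring_size: int) -> int:
    """Worst case: source -> bridge, across the global ring, bridge -> destination.

    Assumes each local ring holds `ring_size` cores plus one bridge stop and
    the global ring has one stop per local ring (a hypothetical layout).
    """
    to_bridge = ring_worst_hops(ring_size + 1)
    across = ring_worst_hops(n_local)
    return to_bridge + across + to_bridge


single = ring_worst_hops(64)
hier = hierarchical_worst_hops(n_local=8, ring_size=8)
print(single, hier)  # 32 12
```

With these assumed parameters, the farthest destination is 32 hops away on a single 64-node ring and 12 hops away in the two-level hierarchy.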

[Figure: hierarchical ring with Local Ring (Level 0) and Global Ring (Level 1)]

Slide6

Single Ring vs. Hierarchical Rings

A hierarchical design provides better performance as the network scales

Slide7

Complexity in Hierarchical Designs

Complex buffering and flow control

Slide8

Single Ring vs. Hierarchical Rings

Design complexity increases power consumption

Slide9

Our Goal

Design a hierarchical ring that has lower complexity without sacrificing performance

Slide10

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide11

Key Idea

Eliminate buffers
Use deflection routing
Simpler flow control

Slide12

Local Router

Key functionality:

Accept new flits

Pass flits around the ring

[Figure: local router with ports Core, Local East, and Local West]

Slide13

Eliminating Buffers in Local Routers

[Figure: local router with ports Core, Local East, and Local West]

Slide14

Eliminating Buffers in Local Routers

Flits can enter the ring if the output is available
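
A minimal sketch of this injection rule, modelling one output link of a bufferless local router for a single cycle. The function name and the string-valued flits are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def route_local(incoming: Optional[str],
                pending: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """One cycle of a bufferless local router's output link.

    `incoming` is the flit arriving from the ring (None if the slot is empty);
    `pending` is the flit the core wants to inject (None if there is none).
    Returns (flit placed on the output link, flit still waiting at the core).
    """
    if incoming is not None:
        # The through flit has no buffer to wait in, so it takes the output.
        return incoming, pending
    # The output is free this cycle, so the core may inject.
    return pending, None


print(route_local("A", "B"))   # ('A', 'B')
print(route_local(None, "B"))  # ('B', None)
```

In the first call the ring slot is occupied, so flit B keeps waiting at the core; in the second call the slot is empty and B enters the ring.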

[Figure: local router with Core, Local East, Local West, and Ejector; annotations: No Buffer, Simpler Crossbar]

Slide15

Deflection Routing
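
One plausible reading of deflection at a local router, sketched under assumptions: a flit that reaches its destination while the ejector is busy has no buffer to wait in, so it stays on the ring for another lap. The `Flit` class and `eject_or_deflect` function are hypothetical names, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Flit:
    dest: int
    deflections: int = 0


def eject_or_deflect(flit: Flit, node_id: int,
                     ejector_free: bool) -> Tuple[Optional[Flit], Optional[Flit]]:
    """Return (ejected flit or None, flit continuing on the ring or None)."""
    if flit.dest == node_id and ejector_free:
        return flit, None
    if flit.dest == node_id:
        # Ejector busy and no buffer available: deflect for another lap.
        flit.deflections += 1
    return None, flit


f = Flit(dest=3)
print(eject_or_deflect(f, node_id=3, ejector_free=False))  # stays on the ring
print(eject_or_deflect(f, node_id=3, ejector_free=True))   # leaves through the ejector
```

A flit passing a node that is not its destination simply continues and is not counted as deflected.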

[Figure: local router with Core, Local East, Local West, and Ejector; one flit is marked Deflected]

Slide16

Bridge Router

[Figure: bridge router with a Crossbar between the Local Ring (West, East) and the Global Ring (West, East)]

Slide17

Eliminating Buffers in Bridge Routers

[Figure: bridge router with a Crossbar between the Local Ring (West, East) and the Global Ring (West, East)]

Slide18

Eliminating Buffers in Bridge Routers

[Figure: bridge router between the Local Ring (West, East) and the Global Ring (West, East); annotations: Simpler Buffering, Fewer Buffers, Simpler Crossbar]

Slide19

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide20

Livelock in Deflection Routing

Injection starvation

[Figure: ring in which the source (Src) holds a starved flit and is unable to inject]

Slide21

HiRD: Injection Guarantee

Throttling provides injection guarantee

After 150 cycles: All nodes stop injecting flits
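
The throttling rule can be sketched as follows. Only the 150-cycle threshold comes from the slide; the `Node` class, the `may_inject` helper, and the way the throttle condition is shared among nodes are assumptions made for illustration.

```python
STARVATION_THRESHOLD = 150  # cycles, taken from the slide


class Node:
    def __init__(self, name: str):
        self.name = name
        self.starved_cycles = 0

    def record_cycle(self, wanted_to_inject: bool, injected: bool) -> None:
        """Count consecutive cycles in which injection was blocked."""
        if wanted_to_inject and not injected:
            self.starved_cycles += 1
        else:
            self.starved_cycles = 0

    @property
    def is_starved(self) -> bool:
        return self.starved_cycles >= STARVATION_THRESHOLD


def may_inject(node: Node, nodes: list) -> bool:
    """While any node is starved, only starved nodes may inject (assumption)."""
    starved = [n for n in nodes if n.is_starved]
    return not starved or node in starved


a, b = Node("a"), Node("b")
for _ in range(150):
    a.record_cycle(wanted_to_inject=True, injected=False)
print(may_inject(a, [a, b]), may_inject(b, [a, b]))  # True False
```

Once the other nodes stop injecting, free slots reach the starved node; its counter resets on the first successful injection and normal injection resumes.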

[Figure: ring in which the source (Src) holds a starved flit and is unable to inject; another router on the ring is marked Throttled Router]

Slide22

Livelock in Deflection Routing

Transfer starvation

[Figure: ring with a starved flit that is unable to transfer into the Transfer FIFO]

Slide23

HiRD: Transfer Guarantee

Reservation provides transfer guarantee

After 10 looparounds
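
A sketch of the reservation idea, assuming that a "looparound" means one failed attempt to enter the transfer FIFO and that the bridge router then holds the next free FIFO entry for the starved flit. Only the threshold of 10 comes from the slide; the `TransferFifo` class and its bookkeeping are hypothetical.

```python
from collections import deque

LOOPAROUND_THRESHOLD = 10  # taken from the slide


class TransferFifo:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()
        self.reserved_for = None  # id of the starved flit, if any
        self.attempts = {}        # flit id -> failed transfer attempts

    def try_transfer(self, flit_id: str) -> bool:
        """Called each time a flit passes the bridge; False means deflected."""
        has_room = len(self.entries) < self.capacity
        allowed = self.reserved_for in (None, flit_id)
        if has_room and allowed:
            self.entries.append(flit_id)
            self.attempts.pop(flit_id, None)
            if self.reserved_for == flit_id:
                self.reserved_for = None
            return True
        # Deflected: the flit takes another lap around the ring.
        self.attempts[flit_id] = self.attempts.get(flit_id, 0) + 1
        if (self.reserved_for is None
                and self.attempts[flit_id] >= LOOPAROUND_THRESHOLD):
            self.reserved_for = flit_id
        return False

    def drain_one(self):
        """The other ring accepts the flit at the head of the FIFO."""
        return self.entries.popleft() if self.entries else None


fifo = TransferFifo(capacity=1)
fifo.try_transfer("x")              # fills the FIFO
for _ in range(10):
    fifo.try_transfer("starved")    # deflected ten times, then reserved
fifo.drain_one()                    # one entry becomes free
print(fifo.try_transfer("other"))   # False
print(fifo.try_transfer("starved")) # True
```

The freed entry is refused to the other flit because it is reserved, so the starved flit is the next one to transfer.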

[Figure: ring with a starved flit and a Reserved Slot in the Transfer FIFO]

Slide24

Ejection Guarantee

Provided by prior work
  Re-transmit once [Fallin et al., HPCA’11]
  Drop a flit if there is no available slot
  Reserve a buffer slot at the destination if a flit was dropped

Slide25

End-to-end Delivery Guarantees

[Figure: path from src on one Local Ring, across the Global Ring, to dest on another Local Ring; labels along the path, in order: Injection Guarantee, Transfer Guarantee, Injection Guarantee, Transfer Guarantee, Injection Guarantee, Ejection Guarantee]

Slide26

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide27

An Overview of HiRD

Deflection routing
  Simpler flow control
  Simpler crossbars and control logic
No buffers in the local rings
  Simpler and faster local routers
Simpler bridge routers
Lower power, less area and simpler to design
Provides end-to-end delivery guarantees
  Injection guarantee by throttling
  Transfer guarantee by reservation

Slide28

Putting It All Together

Deflection routing
  Simpler flow control
  Simpler crossbars and control logic
No buffers in the local rings
  Simpler and faster local routers
Simpler bridge routers
Lower power, less area and simpler to design
Provides end-to-end delivery guarantees
  Injection guarantee by throttling
  Transfer guarantee by reservation

Slide29

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide30

Methodology

Cores
  16 and 64 OoO CPU cores
  64 KB 4-way private L1
  Distributed L2
Network
  1-flit local-to-global buffer
  4-flit global-to-local buffers
  2-cycle per-hop latency for local routers
  3-cycle per-hop latency for global routers
60 workloads consisting of SPEC2006 apps

Slide31

Comparison to Previous Designs

Single ring design
  Kim and Kim, NoCArc’09
  64-bit links
  128-bit links
  256-bit links
Buffered hierarchical ring design
  Ravindran and Stumm, HPCA’97
  Identical topology
  Identical bisection bandwidth
  4-flit buffers in both local and global routers

Slide32

Results: System Performance

[Figure: system performance comparison; annotated values: 2.9%, 1.9%]

Hierarchical designs provide better performance than a single ring on a larger network
HiRD performs better than buffered hierarchical rings due to lower latency in local routers and throttling

Slide33

Results: Network Power

[Figure: network power comparison; annotated values: 15%, 46.6%]

1) Hierarchical designs consume much less power than the highest-performance single ring
2) Routers and flow control in HiRD are simpler than those in buffered hierarchical rings

Slide34

Router Area and Critical Path

16-node network with 8 bridge routers
Verilog RTL design using 45nm technology
HiRD reduces NoC area by 50.3% compared to a buffered hierarchical ring design
HiRD reduces local router critical path by 29.9% compared to a buffered hierarchical ring design

Slide35

Additional Results

Detailed power breakdown
Synthetic evaluations
Energy efficiency results
Worst case analysis
Technical Report:
  Multithreaded evaluation
  Average, 90th percentile and max latency
  Comparison against other topologies
  Sensitivity analysis on different link bandwidths and number of buffers

Slide36

Outline

Background and Motivation
Key Idea: Deflection Routing
End-to-end Delivery Guarantees
Our Solution: HiRD
Results
Conclusion

Slide37

Conclusion

Rings do not scale well as core count increases
Traditional hierarchical ring designs are complex and energy inefficient
  Complicated buffering and flow control
Solution: Hierarchical Rings with Deflection (HiRD)
  Guarantees livelock freedom and delivery
  Eliminates all buffers at local routers and most buffers at bridge routers
HiRD provides higher performance and energy-efficiency than hierarchical rings
HiRD is simpler than hierarchical rings

Slide38

Design and Evaluation of Hierarchical Rings with Deflection Routing

Rachata Ausavarungnirun, Chris Fallin, Xiangyao Yu, Kevin Chang, Greg Nazario, Reetuparna Das, Gabriel H. Loh, Onur Mutlu

Slide39

Backup Slides

Slide40

Network Intensive Workloads

15 network intensive workloads

Slide41

System Performance

[Figure: system performance; annotated values: 5.6%, 2.5%]

Deflections balance out the network load
Throttling reduces congestion

Slide42

Network Power

[Figure: network power; annotated values: 11.9%, 37%]

More deflections happen when the network is congested

Slide43

Detailed Results

Slide44

Multithreaded Applications

Slide45

Network Latency

Slide46

Synthetic Traffic Evaluations

Slide47

Topology Comparison

Slide48

Sweep over Different Bandwidth

Slide49

Packet Reassembly

Borrowed from CHIPPER [Fallin et al., HPCA’11]
Retransmit-Once
  Destination node reserves a buffer slot for a dropped packet
  Provides ejection guarantee

Slide50

Other Optimizations

Map cores that communicate heavily with each other onto the same local ring
  Takes advantage of the faster local ring routers
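
The effect of such a mapping can be illustrated by counting how much traffic has to leave its local ring under two placements. The traffic matrix, ring labels, and function name below are made-up inputs for illustration; the slides do not describe a specific mapping algorithm.

```python
def cross_ring_traffic(traffic: dict, ring_of: dict) -> int:
    """Sum the traffic between cores that sit on different local rings.

    `traffic` maps (src, dst) core pairs to a message count and `ring_of`
    maps each core to its local ring; both are hypothetical inputs.
    """
    return sum(count for (src, dst), count in traffic.items()
               if ring_of[src] != ring_of[dst])


traffic = {(0, 1): 90, (2, 3): 80, (0, 2): 5, (1, 3): 5}
interleaved = {0: "A", 1: "B", 2: "A", 3: "B"}
clustered = {0: "A", 1: "A", 2: "B", 3: "B"}
print(cross_ring_traffic(traffic, interleaved))  # 170
print(cross_ring_traffic(traffic, clustered))    # 10
```

Placing the heavily communicating pairs (0, 1) and (2, 3) on the same local ring keeps most of this example's traffic off the bridge routers and the global ring.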

Slide51

Related Concurrent Works

Clumsy Flow Control [Kim et al., IEEE CAL’13]
  Requires coordination between cores and memory controllers
Transportation-inspired NoCs [Kim et al., HPCA’14]
  tNoCs require an additional credit network
  tNoCs have more complex flow control
  HiRD is more lightweight

Slide52

Some Related Previous Works

Hierarchical Bus [Udipi et al., HPCA’10]
  HiRD provides more scalability
Concentrated Mesh [Das et al., HPCA’09]
  Several nodes share one router
  Used on mesh networks
  Less power efficient than HiRD
Low-cost Mesh Router [J. Kim, MICRO’09]
  Specifically designed for meshes
  Does not solve issues in deflection-based flow control (HiRD does)