A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chip

Uploaded On 2023-09-23

Mohammad Fattah¹, Antti Airola¹, Rachata Ausavarungnirun², Nima Mirzaei³, Pasi Liljeberg¹, Juha Plosila¹, Siamak Mohammadi³, Tapio Pahikkala


Presentation Transcript

1. A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips
Mohammad Fattah¹, Antti Airola¹, Rachata Ausavarungnirun², Nima Mirzaei³, Pasi Liljeberg¹, Juha Plosila¹, Siamak Mohammadi³, Tapio Pahikkala¹, Onur Mutlu², and Hannu Tenhunen¹

2. What is This Talk About?
Over time, routers and links can become faulty, so the network must dynamically find alternative paths.
Previous works have at least one of the following limitations:
- Cover only a small number of faults
- Use a central controller
- High area overhead
- High reconfiguration overhead upon new faults
Maze-Routing overcomes all of the above limitations:
- Full coverage: formally proven
- Fully distributed: uses autonomous, standalone routers
- Low area overhead: uses an algorithmic approach (16x less area than routing tables)
- Low reconfiguration overhead: on-the-fly path exploration (instantaneous operation on new failures)
- Better performance: 50% higher saturation throughput and 28% lower latency on SPEC benchmarks compared to the state of the art
Callouts: any number of faults; detects partitioning; no central component; no reconfiguration phase; no routing table
(Image source: condenaststore.com)

3. Aggressive Transistor Scaling
Key benefit: integrating many IPs
- Processors
- Cache slices
- Memory controllers
- Specialized HW
- Etc.
A major curse: reduced reliability
- Fabrication time: defects, process variation
- Run-time: negative bias temperature instability (NBTI), hot carrier injection (HCI), gate oxide breakdown, electromigration
Our designs must be fault-tolerant by construction!

4. IP vs. Network Faults
IP faults:
- Degrade the performance
- The rest of the system can continue
Network element faults:
- Cripple the performance
- Single point of failure
It is crucial to tolerate many faults in links and routers!

5. Maze-Routing
Fault-tolerant by construction:
- It is not a router architecture with fault tolerance patched onto it
- Rather, it is essentially a routing algorithm, which is inherently fault-tolerant
Four critical goals:
- Full coverage (guaranteed delivery)
- Fully-distributed operation
- Low area footprint
- No reconfiguration component/phase
Maze-Routing is the first to provide all!
[Figure: router diagram; labels: XY (x5), Maze (x5), Priority Arbiter, and North/East/South/West/Local input and output ports.]

8. Goal 1: Full (Fault) Coverage
Literature:
- Limited number of faults
- Limited fault patterns
- Limited when nodes are disconnected
Maze-Routing:
- No restriction on fault count or fault pattern
- Detects disconnected nodes at the router level

9. Goal 2: Fully Distributed Operation
Literature:
- Centralized methods: single point of failure; TMR is expensive
- Distributed methods: synchronization points; faults in the reconfiguration unit
Maze-Routing:
- No central component
- No reconfiguration unit
- Each router makes individual decisions
- A fault in the algorithm logic only disables the associated links
[Figure: a mesh of routers with per-router Reconf. units and a central SW/HW controller, contrasted with a router containing per-port Maze units, a priority arbiter, and North/East/South/West/Local input and output ports.]

10. Goal 3: Low Area Overhead
Literature: routing tables
- High area overhead: 5 read ports, implementation cost, power dissipation
- Vulnerability to run-time faults: one failed bit affects the whole router; area ~ fault probability of the router
Maze-Routing:
- An algorithmic approach
- No routing table

11. Goal 4: Low Reconfiguration Overhead
Literature: when a new failure is detected,
- Pause the network
- Reconfigure to an alternative solution
- Resume normal operation
Issues:
- Severe performance degradation
- Aggressive online testing
- Few works offer fast reconfiguration
Maze-Routing:
- No reconfiguration phase
- The path to the destination is calculated dynamically per packet
- This is called on-the-fly reconfiguration

12. Maze-Routing: The First to Provide All
Work                  Coverage   Reconfiguration     O(Area)     O(Reconf.)
Zhang et al. [43]     few        fully distributed   low         on the fly
LBDR [35]             moderate   central             low         N/A
d2-LBDR [7]           moderate   central             low         N/A
OSR-Lite [38]         moderate   central             low         moderate
TOSR [5]              moderate   distributed         high        fast
BLINC [25]            moderate   distributed         high        fast
uLBDR [36]            high       central             high        N/A
Wachter et al. [39]   high       distributed         high        slow
Fick et al. [19]      high       distributed         high        slow
Face routing [11]     high       fully distributed   excessive   on the fly
FTDR-H [18]           high       fully distributed   high        fast
uDIREC [32]           full       central             high        excessive
ARIADNE [3]           full       distributed         high        slow
Maze-routing          full       fully distributed   low         on the fly

14. Preliminaries
- Face: a region bounded by links and routers (in the example: 4 inner faces, 1 outer face)
- Right/left hand rule: exit from the first output on the right/left side
- One rule traverses inner faces clockwise, the other counterclockwise
- Each rule traverses the outer face in the opposite direction
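
As a quick check on the face count, Euler's formula for a connected planar graph, F = E - V + 2, counts all faces including the outer one. The sketch below assumes the slide's example is a fault-free 3x3 mesh; the mesh size and the function name `mesh_face_count` are assumptions, not taken from the slides.

```python
def mesh_face_count(rows: int, cols: int, failed_links: int = 0) -> int:
    """Number of faces (inner + outer) of a rows x cols mesh.

    Uses Euler's formula F = E - V + 2, which only holds while the
    mesh remains connected after removing the failed links.
    """
    vertices = rows * cols
    edges = rows * (cols - 1) + cols * (rows - 1) - failed_links
    return edges - vertices + 2


faces = mesh_face_count(3, 3)
print(f"{faces - 1} inner faces, 1 outer face")
```

Each failed link that does not disconnect the mesh merges two faces into one, which is why the count drops by one per failed link.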

15. Preliminaries (II)
A few additional fields in the packet header:
- MDbest: the closest Manhattan distance (MD) to dst that the packet has reached so far
  - Initial value: MD(src, dst)
  - Only decrements
- Mode: the routing mode used for the packet
  - Values: normal, traversal (one value per hand rule), unreachable
  - Initial value: normal
- 2 more fields to detect disconnected nodes
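
The two main header fields can be sketched as a small record. This is a minimal illustration, not the authors' encoding: the names `Mode`, `MazeHeader`, and `new_header` are hypothetical, and the two extra fields used for detecting disconnected nodes are left out because the slide does not define them.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    TRAVERSAL_RIGHT = "traversal (right-hand rule)"
    TRAVERSAL_LEFT = "traversal (left-hand rule)"
    UNREACHABLE = "unreachable"


def manhattan(a, b):
    """Manhattan distance (MD) between two (x, y) mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class MazeHeader:
    md_best: int               # closest MD to dst reached so far; only decrements
    mode: Mode = Mode.NORMAL   # every packet starts in normal mode


def new_header(src, dst):
    """Header at injection time: MDbest starts at MD(src, dst)."""
    return MazeHeader(md_best=manhattan(src, dst))


header = new_header((0, 0), (3, 2))
print(header)
```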

16. Maze-Routing
Normal mode:
- Is there any productive output? Take it and decrement MDbest
- No? Enter traversal mode:
  - Draw line(cur, dst) between the current node and dst
  - With one hand rule, take the first output to the left of line(cur, dst); with the other, take the first output to the right of line(cur, dst)
  - Set the mode to the chosen hand rule accordingly
Traversal mode:
- If MD(cur, dst) = MDbest and there is a productive output: return to (and act as in) normal mode
- Otherwise, follow the hand rule
[Figure: example mesh with nodes labeled by their distance to dst, showing a packet's path from src to dst with its header values (MDbest | mode) at each hop.]
Maze-Routing definitely reaches dst if a path to dst exists. We provide the formal proof in the paper.
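
The two modes can be sketched on a 2D mesh as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `Mesh`, `Packet`, `sweep`, `route_step`, and `route` are hypothetical, the hand rule is chosen by the caller, and the right-hand rule is assumed to mean "sweep counterclockwise from the reference direction" (the line to dst on entry, the arrival link afterwards), with the left-hand rule sweeping clockwise. Detection of unreachable destinations is replaced by a simple hop limit.

```python
import math
from dataclasses import dataclass

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W


def md(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Packet:
    dst: tuple
    md_best: int
    mode: str = "normal"      # "normal", or "right"/"left" while in traversal mode
    came_from: tuple = None   # previous node, used by the hand rule


class Mesh:
    def __init__(self, width, height, failed_links=()):
        self.width, self.height = width, height
        self.failed = {frozenset(link) for link in failed_links}

    def outputs(self, node):
        """Neighbours reachable over working links."""
        result = []
        for dx, dy in DIRS:
            nxt = (node[0] + dx, node[1] + dy)
            inside = 0 <= nxt[0] < self.width and 0 <= nxt[1] < self.height
            if inside and frozenset((node, nxt)) not in self.failed:
                result.append(nxt)
        return result


def sweep(node, ref_vec, candidates, hand):
    """First candidate met when rotating away from ref_vec (assumed convention:
    counterclockwise for "right", clockwise for "left")."""
    ref = math.atan2(ref_vec[1], ref_vec[0])

    def turn(nxt):
        ang = math.atan2(nxt[1] - node[1], nxt[0] - node[0])
        delta = (ang - ref) % math.tau
        if hand == "left":
            delta = (-delta) % math.tau
        return delta or math.tau  # the reference direction itself comes last

    return min(candidates, key=turn, default=None)


def route_step(mesh, cur, pkt, hand="right"):
    """Pick the next node for pkt at cur and update its header fields."""
    outs = mesh.outputs(cur)
    productive = [n for n in outs if md(n, pkt.dst) < md(cur, pkt.dst)]
    if productive and md(cur, pkt.dst) == pkt.md_best:
        pkt.mode = "normal"   # normal mode, or exit from traversal mode
        pkt.md_best -= 1
        nxt = productive[0]
    elif pkt.mode == "normal":
        pkt.mode = hand       # enter traversal mode using line(cur, dst)
        line = (pkt.dst[0] - cur[0], pkt.dst[1] - cur[1])
        nxt = sweep(cur, line, outs, hand)
    else:
        back = (pkt.came_from[0] - cur[0], pkt.came_from[1] - cur[1])
        nxt = sweep(cur, back, outs, pkt.mode)  # keep following the hand rule
    pkt.came_from = cur
    return nxt


def route(mesh, src, dst, hand="right", max_hops=200):
    """Path from src to dst, or None if the hop limit is hit."""
    pkt = Packet(dst=dst, md_best=md(src, dst))
    path, cur = [src], src
    while cur != dst and len(path) <= max_hops:
        cur = route_step(mesh, cur, pkt, hand)
        if cur is None:
            return None
        path.append(cur)
    return path if cur == dst else None


mesh = Mesh(4, 4, failed_links=[((1, 0), (2, 0))])
print(route(mesh, (0, 0), (3, 0)))
```

In this example the packet loses its only productive output at (1, 0), detours through (1, 1), and returns to normal mode at (2, 1), where its distance to dst again equals MDbest. Calling `route(..., hand="left")` on the same mesh also reaches dst, but along a longer path.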

17. Detecting Disconnected Nodes
Traversal mode:
- If MD(cur, dst) = MDbest and there is a productive output: return to normal mode
- No? Follow the hand rule
The destination is unreachable if, in traversal mode:
- We meet the same node as the one where we entered traversal mode, and
- The hand rule picks the same output as when we entered traversal mode
[Figure: example mesh with nodes labeled by their distance to dst, showing a packet circling back to the node where it entered traversal mode.]
More implementation details are available in the paper.
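
The unreachability test itself is a small predicate. The sketch below is only an illustration: `TraversalEntry` and `is_unreachable` are hypothetical names, and storing the entry node and entry output in the packet is an assumption about how the two extra header fields might be used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Node = Tuple[int, int]


@dataclass(frozen=True)
class TraversalEntry:
    node: Node     # node where the packet entered traversal mode
    output: Node   # neighbour chosen on that first traversal hop


def is_unreachable(entry: Optional[TraversalEntry], cur: Node, chosen: Node) -> bool:
    """True when the packet is back at its entry node and the hand rule
    picks the same output again. Must only be evaluated on hops after
    the one that entered traversal mode."""
    return entry is not None and cur == entry.node and chosen == entry.output


entry = TraversalEntry(node=(1, 0), output=(1, 1))
print(is_unreachable(entry, (1, 0), (1, 1)))  # full loop completed
print(is_unreachable(entry, (1, 0), (0, 0)))  # same node, different output
```

Checking the output as well as the node matters because a packet may legitimately pass through its entry node again while taking a different output.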

19. Simulation Methodology
- Simulator: NOCulator [1]
- 8x8 mesh for performance analysis
- Synthetic traffic for performance evaluation
- SPEC CPU2006 benchmarks are also evaluated
- Maze-Routing [2] implemented in minBD [3] routers
  - Deflection-based: deadlock freedom
  - Golden and silver flits: router-level livelock freedom
  - Retransmit-once: protocol-level deadlock freedom
[1] NOCulator: https://github.com/CMU-SAFARI/NOCulator
[2] Maze-Routing: https://github.com/CMU-SAFARI/NOCulator/tree/Maze-routing
[3] MinBD: Fallin, Chris, et al. "MinBD: Minimally-buffered deflection routing for energy-efficient interconnect." NoCS 2012.

20. Configurations
Maze-Routing:
- 16 buffer spaces per (minBD) router
Baseline router:
- Wormhole buffered routers
- 1 VC per port
- 40 buffer spaces per router
Faults:
- Links disabled randomly
- From 1 to 5 link failures
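
The fault configuration can be reproduced in spirit with a few lines. This is a sketch, not the NOCulator code: `mesh_links` and `disable_random_links` are hypothetical helpers, the seed parameter is added for repeatability, and no check is made that the mesh stays connected, since the slide does not say whether one is applied.

```python
import random


def mesh_links(width, height):
    """All bidirectional links of a width x height mesh as node pairs."""
    links = []
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.append(((x, y), (x + 1, y)))
            if y + 1 < height:
                links.append(((x, y), (x, y + 1)))
    return links


def disable_random_links(width, height, count, seed=None):
    """Pick `count` distinct links to disable, uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(mesh_links(width, height), count)


for failures in range(1, 6):  # from 1 to 5 link failures on the 8x8 mesh
    print(failures, disable_random_links(8, 8, failures, seed=failures))
```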

21. Workloads
Synthetic traffic:
- Uniform random traffic with varying injection rates
SPEC CPU2006 benchmarks:
- Grouped by L1 misses per kilo-instruction (MPKI)
- 3 groups: High (>50), Low (<5), and Medium (the rest) intensity
- 4 mixes: L (all Low), ML (Medium/Low), M (all Medium), and H (all High)
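
The grouping rule translates directly into a threshold function. The function name `intensity_group` is hypothetical, and treating the boundary values 5 and 50 as Medium follows from the strict inequalities on the slide.

```python
def intensity_group(l1_mpki: float) -> str:
    """Classify a benchmark by its L1 misses per kilo-instruction."""
    if l1_mpki > 50:
        return "High"
    if l1_mpki < 5:
        return "Low"
    return "Medium"


for mpki in (2.0, 5.0, 30.0, 75.0):
    print(mpki, intensity_group(mpki))
```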

22. Area Overhead
STMicro 60nm technology node
Maze-routing:
- 5 copies of the algorithm, 1 per port
ARIADNE:
- Smallest table
- Reconfiguration logic is not implemented
- 5 read ports
LBDRe:
- Logic-based method
- Central approach
- Limited coverage
[Chart: area comparison; data labels: 3.8x, 2.1x, 27%, 15.9x.]

23. Throughput: Uniform Random Traffic
[Plots: throughput with 1 disabled link and with 5 disabled links; annotations: 50%, sub-optimal paths, provided path divergence.]

24. Throughput: SPEC CPU
Average packet latency:
Workload mix   Up*/Down*                   Maze-routing
               5 failures   no failure     5 failures   no failure
L              16.7         16.4           17.8         16.4
ML             18.8         18.2           18.9         17.2
M              27.7         25.7           21.6         19.2
H              54.4         50.5           25.8         23.1
AVG            29.4         27.7           21           19
30% latency reduction in the average case
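
The averages and the quoted reduction can be rechecked from the table rows. This is plain arithmetic on the slide's numbers; the dictionary layout and the helper names `column_average` and `reduction` are hypothetical.

```python
# Average packet latency per mix:
# (Up*/Down* 5 failures, Up*/Down* no failure, Maze 5 failures, Maze no failure)
latency = {
    "L": (16.7, 16.4, 17.8, 16.4),
    "ML": (18.8, 18.2, 18.9, 17.2),
    "M": (27.7, 25.7, 21.6, 19.2),
    "H": (54.4, 50.5, 25.8, 23.1),
}


def column_average(col):
    return sum(row[col] for row in latency.values()) / len(latency)


def reduction(baseline, improved):
    """Relative latency reduction of `improved` over `baseline`."""
    return 1 - improved / baseline


with_failures = reduction(column_average(0), column_average(2))
no_failure = reduction(column_average(1), column_average(3))
print(f"5 failures: {with_failures:.1%}, no failure: {no_failure:.1%}")
```

The two reductions come out at roughly 28% with five failures and roughly 31% with none, which brackets the "30%" figure on the slide.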

25. Reconfiguration Overhead
[Plot annotations: 0.2 flits/node/cycle; 40K cycles; 66K cycles.]
Maze-Routing has no reconfiguration phase.

26. Summary
A practical fault-tolerant routing algorithm must:
- Provide full coverage with guaranteed delivery
- Operate in a fully-distributed manner
- Impose low area overhead
- Have low reconfiguration overhead
Maze-Routing is the first work to meet all of the above goals.
NOCulator and Maze-Routing are available on GitHub:
https://github.com/CMU-SAFARI/NOCulator
https://github.com/CMU-SAFARI/NOCulator/tree/Maze-routing

28. Backup slides

29. Area Overhead
- Header fields can be encoded in 14/17 bits in 8x8/16x16 meshes.
- Assuming a baseline router with a 144-bit channel width, the channel must be widened by 10%/12%.
- This results in an almost 20%/25% increase in router area.
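
The channel-widening percentages follow from dividing the extra header bits by the baseline channel width. The helper name `channel_widening` is hypothetical; the 14-bit, 17-bit, and 144-bit figures are the slide's.

```python
def channel_widening(header_bits: int, channel_bits: int = 144) -> float:
    """Fraction by which the channel grows to carry the extra header bits."""
    return header_bits / channel_bits


for mesh, bits in (("8x8", 14), ("16x16", 17)):
    print(f"{mesh}: {bits} bits -> {channel_widening(bits):.1%} wider channel")
```

The router-area increase (almost 20%/25%) is a synthesis result and cannot be derived from this ratio alone.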

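30. Deflection Implications
When a packet is deflected:
- Its header values are no longer valid
- We need to reset the header values:
  - Mode <- Normal
  - MDbest <- MD(next router, dst)
[Figure: example mesh with nodes labeled by their distance to dst, showing a deflected packet and its reset header value.]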
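
The reset amounts to two assignments. In this sketch the names `Header` and `reset_after_deflection` are hypothetical, and the mode is stored as a plain string for brevity.

```python
from dataclasses import dataclass


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


@dataclass
class Header:
    md_best: int
    mode: str = "normal"


def reset_after_deflection(header, next_router, dst):
    """Restart routing from the router the packet was deflected to."""
    header.mode = "normal"
    header.md_best = manhattan(next_router, dst)
    return header


h = Header(md_best=2, mode="traversal")
print(reset_after_deflection(h, next_router=(1, 1), dst=(3, 0)))
```

After the reset the packet behaves as if it had just been injected at the next router, so MDbest may be larger than it was before the deflection.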

31. Delivery Proof
- Property: given that there is a path between src and dst, a packet that starts from src and traverses the face underlying line(src, dst) will definitely intersect the line at some point p other than src.
- MD(p, dst) is definitely smaller than MD(src, dst).
- In traversal mode, if MD(cur, dst) = MDbest and there is a productive output, the packet returns to (and acts as in) normal mode. Hence we definitely exit to normal mode.
[Figure: example mesh with nodes labeled by their distance to dst, with src and dst marked.]
