DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips

Uploaded by mackenzie on 2024-02-09

Andrew DeOrio, Konstantinos Aisopos, Valeria Bertacco, Li-Shiuan Peh. DAC 2011. University of Michigan, Princeton University, Massachusetts Institute of Technology.

Presentation Transcript

1. DRAIN: Distributed Recovery Architecture for Inaccessible Nodes in Multi-core Chips. Andrew DeOrio†, Konstantinos Aisopos‡§, Valeria Bertacco†, Li-Shiuan Peh§. DAC 2011. †University of Michigan, ‡Princeton University, §Massachusetts Institute of Technology.

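2. Reliable Networks on Chip. [Figure: each node contains a processor (uP), a cache ($), and a router (R).] A reliable network on chip must detect if a fault has occurred (detection), diagnose what fault has occurred (diagnosis), reconfigure the network to account for the fault (reconfiguration, via fault-tolerant routing), and recover and resume normal operation (recovery, the step Drain addresses). After reconfiguration, some nodes can be disconnected, and their state is lost!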
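As an illustration of how reconfiguration can leave nodes disconnected, the sketch below treats the chip as a 2D mesh and runs a breadth-first search over the surviving links, starting from the memory controller; any node the search cannot reach is one whose state would be lost without recovery. The mesh shape, the row-by-row node numbering, node 0 as the default memory controller, and the helper names (`mesh_links`, `reachable`, `disconnected_nodes`) are assumptions for illustration, not part of DRAIN. The search also counts any surviving path as usable, which a real fault-tolerant routing algorithm may not.

```python
from collections import deque


def mesh_links(width, height):
    """Return every bidirectional link of a width x height 2D mesh (hypothetical model).

    Nodes are numbered row by row (node = y * width + x); each link is a
    frozenset of its two endpoint ids so that (a, b) and (b, a) compare equal.
    """
    links = set()
    for y in range(height):
        for x in range(width):
            node = y * width + x
            if x + 1 < width:
                links.add(frozenset((node, node + 1)))      # east neighbor
            if y + 1 < height:
                links.add(frozenset((node, node + width)))  # south neighbor
    return links


def reachable(start, links):
    """Breadth-first search over working links; return all nodes reachable from start."""
    adjacency = {}
    for link in links:
        a, b = tuple(link)
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def disconnected_nodes(width, height, failed_links, memory_node=0):
    """Nodes that can no longer reach the memory controller once failed_links are removed."""
    working = mesh_links(width, height) - set(failed_links)
    connected = reachable(memory_node, working)
    return set(range(width * height)) - connected
```

For example, on a 2x2 mesh with links 1-3 and 2-3 failed, `disconnected_nodes(2, 2, {frozenset((1, 3)), frozenset((2, 3))})` returns `{3}`, the same situation as the isolated node in the example slides.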

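3. Recovery Approaches. Checkpoint/recovery approaches have drawbacks: checkpoints held in per-node checkpoint buffers can leave data stuck in the checkpoint buffer of a disconnected node, while checkpointing to memory (MEM) incurs high performance overhead. Drain instead takes a reactive approach, incurring performance overhead only when errors occur.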

4. Data Recovery with Drain. Drain recovers data lost during reconfiguration. Emergency links provide an alternate path, over which Drain transfers cache contents and architectural state. [Figure: a grid of nodes, each with a processor core (uP), local cache ($), and router, plus a memory controller (Mem), joined by primary links and DRAIN emergency links.]
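A minimal sketch of what an emergency-link transfer has to carry, assuming a node's recoverable state is its dirty cache lines plus a program counter and register file. The sender flattens that state into an ordered message list; the receiving neighbor merges the cache lines into its own cache and holds the sender's architectural state. `NodeState`, the message format, and both function names are hypothetical; the slides do not describe DRAIN's actual link protocol or encoding.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical model of what a node holds: dirty cache lines and architectural state."""
    cache: dict = field(default_factory=dict)      # address -> data (dirty lines only)
    pc: int = 0                                    # program counter
    registers: list = field(default_factory=list)  # architectural register values, in order


def pack_for_emergency_link(node):
    """Flatten a node's state into an ordered message list: cache lines, then PC, then registers."""
    messages = [("line", addr, data) for addr, data in sorted(node.cache.items())]
    messages.append(("pc", node.pc))
    messages.extend(("reg", value) for value in node.registers)
    return messages


def receive_from_emergency_link(messages, neighbor):
    """Merge the sender's cache lines into the neighbor's cache and rebuild the
    sender's architectural state so the neighbor can hold it for later recovery."""
    rescued = NodeState()
    for message in messages:
        kind = message[0]
        if kind == "line":
            _, addr, data = message
            neighbor.cache[addr] = data
        elif kind == "pc":
            rescued.pc = message[1]
        elif kind == "reg":
            rescued.registers.append(message[1])  # registers arrive in order
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    return rescued
```

Sending cache lines before architectural state is an arbitrary choice in this sketch; the only property it relies on is that messages arrive in order.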

5. Drain Example. [Figure: four nodes, each with a processor (uP) and cache ($), and a memory controller (M).]

6. Drain Example: a link fails (X).

7. Drain Example: the interconnect is reconfigured.

8. Drain Example: another link fails (X).

9. Drain Example: a node is isolated!

10. Drain Example: drain the connected nodes via primary links.

11. Drain Example: drain the disconnected node via the emergency link.

12. Drain Example: drain the connected node again.

13. Drain Example: resume normal operation.
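The ordering in slides 10 to 13 can be replayed on a toy model: first drain the connected nodes to memory over primary links, then push each isolated node's data over its emergency link into a connected neighbor, drain that neighbor again, and finally resume. This is a hedged sketch that assumes each isolated node's emergency link reaches a neighbor that is still connected; it ignores coherence, timing, and link bandwidth, and the function name and data layout are hypothetical.

```python
def drain_and_resume(caches, connected, emergency_neighbor):
    """Replay the example's recovery order on a toy model (hypothetical).

    caches: node id -> {address: data} holding each node's dirty lines
    connected: node ids still reachable from the memory controller over primary links
    emergency_neighbor: disconnected node id -> connected neighbor its emergency link reaches
    Returns (memory contents, ordered log of transfers).
    """
    memory = {}
    log = []

    def drain_via_primary(node):
        # Write back every dirty line over the reconfigured primary network.
        memory.update(caches[node])
        caches[node].clear()
        log.append(("primary", node))

    # Drain the connected nodes via primary links.
    for node in sorted(connected):
        drain_via_primary(node)

    # Drain each isolated node over its emergency link into a connected
    # neighbor, then drain that neighbor again via primary links.
    for node in sorted(set(caches) - set(connected)):
        neighbor = emergency_neighbor[node]
        if neighbor not in connected:
            raise ValueError(f"emergency neighbor {neighbor} of node {node} is not connected")
        caches[neighbor].update(caches[node])
        caches[node].clear()
        log.append(("emergency", node, neighbor))
        drain_via_primary(neighbor)

    # Every cache is now empty, so normal operation can resume.
    return memory, log
```

Running it on four nodes where node 3 is isolated and its emergency link reaches node 1 produces the transfer log `[("primary", 0), ("primary", 1), ("primary", 2), ("emergency", 3, 1), ("primary", 1)]`, with all four nodes' lines ending up in memory.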

14. Drain Performance as Links Fail. [Plot annotations: increasing emergency link time; decreasing functional network size.]

15. Memory Latency Before and After. [Plot]

16. Conclusions. DRAIN is a lightweight recovery mechanism for CMPs, requiring 5,000 gates per node. It recoups cache data and architectural state from disconnected nodes. Its performance overhead is incurred only during recovery: ~3 ms at 1 GHz.
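To put the ~3 ms recovery overhead in processor terms, multiplying the recovery time by the 1 GHz clock frequency gives the approximate cycle count:

```latex
N_{\text{cycles}} \approx t_{\text{recovery}} \times f
  = (3 \times 10^{-3}\,\text{s}) \times (10^{9}\,\text{s}^{-1})
  = 3 \times 10^{6}\ \text{cycles}
```

That is roughly three million cycles, paid only when a fault triggers recovery.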