Rebound: Scalable Checkpointing for Coherent Shared Memory

Author: celsa-spraggs | Published Date: 2017-06-05

Rishi Agarwal, Pranav Garg, and Josep Torrellas. Department of Computer Science, University of Illinois at Urbana-Champaign. http://iacoma.cs.uiuc.edu. Checkpointing in Shared-Memory


Rebound: Scalable Checkpointing for Coherent Shared Memory: Transcript


Rishi Agarwal, Pranav Garg, and Josep Torrellas. Department of Computer Science, University of Illinois at Urbana-Champaign. http://iacoma.cs.uiuc.edu. Checkpointing in Shared-Memory MPs. HW-based schemes for small CMPs use global checkpointing.

& Rollback Recovery. Chapter 13. Anh Huy Bui. Jason Wiggs. Hyun Seok Roh. 1. Introduction. Rollback recovery protocols restore the system back to a consistent state after a failure and achieve fault tolerance by periodically saving the state of a process during the failure-free execution.

Shared memory. Process A. Process B. Physical Memory. Virtual address space A. Virtual address space B. Share. Share. Share. Instruction memory. Data. Instruction memory. Data. Create a shared memory region.

Performance considerations (CUDA best practices). NVIDIA CUDA C programming best practices guide. ACK: CUDA teaching center Stanford (Hoberrock and Tarjan). Outline. Host to device memory transfer.

Gokarna Sharma (a joint work with Costas Busch). Louisiana State University. Agenda. Introduction and Motivation. Scheduling Bounds in Different Software Transactional Memory Implementations. Tightly-Coupled Shared Memory Systems.

Larry Peterson. In collaboration with Arizona, Akamai, Internet2, NSF, North Carolina, Open Networking Lab, Princeton (and several pilot sites). S3. DropBox. GenBank. iPlant. Data Management Challenge.

Yao Wang, Andrew Ferraiuolo, G. Edward Suh. Feb 17th 2014. Executive Summary. Observation: Modern computing systems are vulnerable to timing channel attacks. Problem: No hardware techniques exist to eliminate timing channels through

KC Sivaramakrishnan. Lukasz Ziarek. Suresh Jagannathan. Purdue University. SUNY Buffalo. Purdue University. Big Picture. 2. No cache coherence. Message passing buffers.

Mahesh Balakrishnan. Microsoft Research / VMware Research. Collaborators: Dahlia Malkhi, Ted Wobber, Vijayan Prabhakaran, Phil Bernstein, Ming Wu, Michael Wei, Dan Glick, John D. Davis, Aviad.

Michael Moeng. Sangyeun Cho. Rami Melhem. University of Pittsburgh. Background. Architects simulating more cores. Increasing simulation times. Cannot keep doing single-threaded simulations if we want to see results in a reasonable time frame.

Purdue University. West Lafayette, IN. Date: April 8, 2013. Reliable and Scalable Checkpointing Systems for Distributed Computing Environments. Final exam of. Distributed Computing Environments. Tanzima Islam (tislam@purdue.edu).

Presented by Sarah Arnold. 1. Agenda. Goals. Fault Tolerance. Failure Recovery. System Overview. Coordinated Checkpointing. Communication-Induced Checkpointing. Logging. Conclusions. 2. Goals. To recover the system after any type of fault has been introduced to the system and to minimize the amount of computation lost.

Hagersten, Landin, and Haridi (1991). Presented by Patrick Eibl. Outline. Basics of Cache-Only Memory Architectures. The Data Diffusion Machine (DDM). DDM Coherence Protocol. Examples of Replacement, Reading, Writing.

Large scale computing systems. Scalability issues. Low level and high level communication abstractions in scalable systems. Network interface. Common techniques for high performance communication.

Lecture 19. April 4th, 2012. Bus-Based Shared Memory (Con't). Distributed Shared Memory. Prof John D. Kubiatowicz. http://www.cs.berkeley.edu/~kubitron/cs252. Recall: Natural Extensions of Memory System.
