Fault Tolerance in MPI
Minaashi Kalyanaraman, Pragya Upreti
CSS 534 Parallel Programming
OVERVIEW
Fault Tolerance in MPI
Levels of survival in MPI
Approaches to fault tolerance in MPI
Advantages & disadvantages of implementing fault tolerance in MPI
Extending MPI to HARNESS
Comparison of MPI and FT-MPI
MPI is not fault tolerant! Is that true?
It is a common misconception about MPI. MPI provides considerable flexibility in the handling of errors. FAULT TOLERANCE IS A PROPERTY OF THE MPI PROGRAM!
[Figure: processes in Job 1; process P2 dies, and by default the other processes detect the error]
Levels of Survival of an MPI Implementation
Approaches to achieve fault tolerance in MPI
Level 1 – The MPI implementation automatically recovers from failure and continues without significant change to its behavior. This is the highest level of survival and the most difficult to implement.
Level 2 – The MPI implementation is notified of the problem and is prepared to take corrective action. Example: using intercommunicators.
Level 3 – In case of failure, certain MPI operations, although not all, become invalid. Example: modifying MPI semantics, extending MPI.
Level 4 – In case of failure, the MPI program can abort and be restarted from a checkpoint. Example: checkpointing.
The program state of the failed process is retained so that the overall computation can proceed.
The MPI Standard and Fault Tolerance
Reliable communication:
The MPI implementation is responsible for detecting and handling network faults.
The implementation can retransmit the message or inform the application that an error has occurred, allowing the application to take its own corrective action.
Error handlers:
Error handlers are set on communicators with MPI_Comm_set_errhandler.
The default is MPI_ERRORS_ARE_FATAL; it can be changed to MPI_ERRORS_RETURN.
Users can define their own error handlers and attach them to communicators.
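A minimal sketch (not from the original slides) of the pattern above: the default handler on MPI_COMM_WORLD is replaced with MPI_ERRORS_RETURN and the return code of a point-to-point call is checked. The deliberately invalid destination rank exists only to trigger an error.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Replace the default MPI_ERRORS_ARE_FATAL handler so errors are
           returned to the caller instead of aborting the whole job. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Send to rank 'size', which does not exist, so the call fails and
           returns an error code rather than aborting. */
        int value = rank;
        int rc = MPI_Send(&value, 1, MPI_INT, size, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: MPI_Send failed: %s\n", rank, msg);
            /* Application-specific corrective action would go here. */
        }

        MPI_Finalize();
        return 0;
    }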
ERROR HANDLING - CONTINUED
In C++, MPI::ERRORS_THROW_EXCEPTIONS is defined to handle errors by throwing exceptions.
If an error is returned, the standard does not require that subsequent operations succeed or that they fail.
Thus the standard allows implementations to take various approaches to the fault tolerance issue.
Approaches to Fault Tolerance in MPI Programs
1. Checkpointing:
This is a common technique that periodically saves the state of a computation, allowing the computation to be restarted from that point in the event of a failure.
The checkpoint interval is determined by:
the time to create and write a checkpoint
the time to read and restore a checkpoint
the expected time to run without failure
Advantage & disadvantage:
It is easy to implement.
The cost of saving and restoring checkpoints must be relatively small.
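A minimal per-process checkpoint/restart sketch, assuming an illustrative state layout and per-rank file names (ckpt_<rank>.dat); the checkpoint interval of 1000 steps is an arbitrary placeholder for the trade-off listed above.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1000  /* illustrative size of the per-process state */

    /* Write this process's state and current step to a per-rank file. */
    static void write_checkpoint(int rank, const double *state, long step) {
        char name[64];
        snprintf(name, sizeof(name), "ckpt_%d.dat", rank);
        FILE *f = fopen(name, "wb");
        if (!f) return;
        fwrite(&step, sizeof(step), 1, f);
        fwrite(state, sizeof(double), N, f);
        fclose(f);
    }

    /* Restore state if a checkpoint exists; return 1 on success, 0 otherwise. */
    static int read_checkpoint(int rank, double *state, long *step) {
        char name[64];
        snprintf(name, sizeof(name), "ckpt_%d.dat", rank);
        FILE *f = fopen(name, "rb");
        if (!f) return 0;
        int ok = fread(step, sizeof(*step), 1, f) == 1 &&
                 fread(state, sizeof(double), N, f) == (size_t)N;
        fclose(f);
        return ok;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *state = malloc(N * sizeof(double));
        long step = 0;
        if (!read_checkpoint(rank, state, &step))   /* restart or cold start */
            for (long i = 0; i < N; i++) state[i] = 0.0;

        for (; step < 100000; step++) {
            /* ... one step of computation and communication ... */
            if (step % 1000 == 0)                   /* checkpoint interval trade-off */
                write_checkpoint(rank, state, step);
        }

        free(state);
        MPI_Finalize();
        return 0;
    }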
2. Using Intercommunicators:
An intercommunicator contains two groups of processes. All communication occurs between processes in one group and processes in the other group.
Example: Manager-Worker
The manager process keeps track of a pool of tasks and dispatches them to worker processes for completion.
Workers return results to the manager, simultaneously requesting a new task.
Advantages & Disadvantage:
The manager can easily recognize that a particular worker has failed and communicate this to the other processes.
Each group can keep track of the state held by the other group.
It is difficult to implement in complex systems.
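A hedged sketch of the manager side of such a manager-worker scheme, assuming a hypothetical worker executable named "worker" started with MPI_Comm_spawn. Whether a failed worker actually causes the send or receive to return an error depends on the MPI implementation, as discussed earlier.

    #include <mpi.h>
    #include <stdio.h>

    #define NWORKERS 4   /* illustrative worker count */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* Spawn the worker program; the resulting intercommunicator has the
           manager in the local group and all workers in the remote group. */
        MPI_Comm workers;
        int errcodes[NWORKERS];
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, NWORKERS, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, errcodes);
        MPI_Comm_set_errhandler(workers, MPI_ERRORS_RETURN);

        for (int task = 0; task < 100; task++) {
            int w = task % NWORKERS;   /* naive round-robin dispatch */
            int rc = MPI_Send(&task, 1, MPI_INT, w, 0, workers);
            if (rc != MPI_SUCCESS) {
                /* Worker w looks unreachable; a real manager would mark it
                   dead and reassign the task to another worker. */
                fprintf(stderr, "worker %d unreachable, skipping task %d\n", w, task);
                continue;
            }
            int result;
            MPI_Recv(&result, 1, MPI_INT, w, 0, workers, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }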
Approaches to Fault Tolerance in MPI Programs
3. Modifying MPI Semantics:
This approach takes advantage of existing MPI objects that contain extra state, and of MPI functions defined in the standard.
Example: MPI guarantees that the number of processes in a communicator, and each process's rank within it, remain constant. This property can be used by the program:
to decompose data according to a communicator's size;
to calculate the data assigned to a process using its rank.
Advantage & Disadvantage:
Fault tolerant programs can be written for a wider set of algorithms.
This approach uses only the already existing semantics and therefore provides fewer fault tolerance features than the other approaches.
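A small sketch of that decomposition; block_range is a hypothetical helper that derives each process's share of n items purely from the communicator's size and the process's rank.

    #include <mpi.h>

    /* Block decomposition of n items over the processes in comm, computed
       only from the communicator size and this process's rank. */
    static void block_range(MPI_Comm comm, long n, long *lo, long *hi) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        long base = n / size, rem = n % size;
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);   /* half-open range [lo, hi) */
    }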
Approaches to Fault Tolerance in MPI Programs
4. Extending MPI:
This approach was developed to address the difficulty of using MPI communicators when processes may fail.
It is difficult to construct a communicator consisting of two individual processes. If the manager group has failed, it is even more difficult because of the collective semantics of communicator construction in MPI.
Approaches to Fault Tolerance in MPI Programs
Advantages of using MPI fault tolerance features
It is simple and easy to use the existing error handling features in MPI.
Users can go beyond MPI_ERRORS_RETURN and define error handlers specific to their needs.
Error handling is purely local. Every process can have a different handler.
The ability to attach error handlers to a communicator increases the modularity of MPI programs.
MPI provides the ability to define one's own application-specific error handler, which is an important approach to fault tolerance. A sketch of such a handler follows.
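This is a minimal sketch, assuming a hypothetical handler that only logs the error; real recovery logic would replace the final comment in the handler.

    #include <mpi.h>
    #include <stdio.h>

    /* A user-defined error handler: log the error and return, so the
       application (not the MPI library) decides what happens next. */
    static void my_errhandler(MPI_Comm *comm, int *errcode, ...) {
        (void)comm;  /* unused in this sketch */
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(*errcode, msg, &len);
        fprintf(stderr, "MPI error reported on communicator: %s\n", msg);
        /* Application-specific recovery or clean shutdown would go here. */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        MPI_Errhandler eh;
        MPI_Comm_create_errhandler(my_errhandler, &eh);  /* wrap the function */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);     /* attach to a communicator */

        /* ... normal MPI communication; errors now invoke my_errhandler ... */

        MPI_Errhandler_free(&eh);
        MPI_Finalize();
        return 0;
    }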
Limitations of Fault Tolerance in MPI
The specification makes no demands on an MPI implementation to survive failures.
The defined MPI error classes are used only to clarify to the user the source of the error.
It is difficult for MPI to notify users of failures related to a given function that happen after the function has already returned.
There is no description of when error notification will happen relative to the occurrence of the error.
It is not possible for one application process to ask to be informed of errors on other processes, or for the application to be informed of specific classes of errors.
HARNESS / Fault Tolerant MPI: an Extension to MPI
HARNESS (Heterogeneous Adaptive Reconfigurable Networked System) provides a highly dynamic, fault-tolerant computing environment for high performance computing applications.
HARNESS is a joint DOE-funded project involving Oak Ridge National Laboratory (ORNL), the University of Tennessee at Knoxville (UTK), and Emory University in Atlanta, GA.
HARNESS: an Extension to MPI
Current MPI implementations either abort or use checkpointing.
Communication happens only via communicators.
The MPI communicator is based on a static model.
FT-MPI (HARNESS) extends MPI:
It allows applications to decide what to do when errors occur:
restart the failed node, or
continue with a smaller number of nodes.
When a member of a communicator fails:
the communicator state changes to indicate the problem;
message transfer continues if safe, or can be stopped or ignored;
the user application can fix or abort the communicator in order to continue.
Comparison of FT-MPI and MPI: Communicator and Process States
Implementation: Extending MPI
When running an FT-MPI application, two parameters are used to specify the modes in which the application runs.
The first parameter, the 'communicator mode', indicates the status of an MPI object after recovery. It can be specified when starting the application:
ABORT: like standard MPI, FT-MPI can abort on an error.
BLANK: failed processes are not replaced; gaps are left in the communicator.
REBUILD: failed processes are respawned, and each surviving process keeps the same rank. This is the default mode.
SHRINK: failed processes are not replaced and the communicator is shrunk, so there are no gaps in the list of processes.
The second parameter, the 'communication mode', selects between two types of message handling:
CONT: all operations that returned the MPI_SUCCESS code will finish properly.
RESET: all ongoing messages are dropped; an error returns the application to its last consistent state.
FT-MPI: Communicator (COMM.) Failure Handling
The communicator is invalidated if a failure is detected.
The underlying system sends a state update to all processes for that communicator.
System behavior depends on the communicator mode chosen.
Not all communicators are updated automatically.
Recovery takes the form of an error check followed by some corrective action, such as a communicator rebuild. For example (simple FT-MPI send usage):

    rc = MPI_Send(..., com);
    if (rc == MPI_ERR_OTHER) {
        MPI_Comm_dup(com, &newcom);
        com = newcom;
    }

In an SPMD master-worker code, only the master code needs to check for errors if the user treats the master as the only point of failure.
Example: MPI Error Handling
Example of Error Handling Using FT-MPI
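The original slide's listing is not reproduced here; the following is a sketch based on the send/rebuild pattern shown two slides earlier, assuming FT-MPI's REBUILD semantics in which duplicating an invalidated communicator yields a repaired one (this behavior is FT-MPI specific, not standard MPI).

    #include <mpi.h>

    /* Send with one recovery attempt, following the FT-MPI usage pattern:
       on MPI_ERR_OTHER the communicator is rebuilt via MPI_Comm_dup. */
    static int send_with_recovery(int value, int dest, MPI_Comm *com) {
        int rc = MPI_Send(&value, 1, MPI_INT, dest, 0, *com);
        if (rc == MPI_ERR_OTHER) {          /* communicator was invalidated */
            MPI_Comm newcom;
            MPI_Comm_dup(*com, &newcom);    /* FT-MPI: returns a repaired communicator */
            *com = newcom;
            rc = MPI_Send(&value, 1, MPI_INT, dest, 0, *com);  /* retry once */
        }
        return rc;
    }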
The failure-free overhead of P2P communication in MPI/FT is negligible in long-running applications.
Checkpointing increases communication overhead considerably; the user must therefore choose a lower checkpoint frequency.
FT-MPI is a tool that provides methods of dealing with failures within MPI applications.
FT-MPI is useful for experimenting with:
self-tuning collective communications
distributed control algorithms
dynamic library download
developing further implementations that support more restrictive environments (e.g. embedded clusters)
creation of a number of drop-in library templates to simplify the construction of fault tolerant applications
high performance and survivability
References
Fault Tolerance in MPI Programs.
MPI 3.0 Fault Tolerance Working Group.
... London and Jack J. Dongarra, "Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems."
"HARNESS and Fault Tolerant MPI," Parallel Computing 27, 2001, pp. 1479-1495.
... Jack J. Dongarra, "Building and Using a Fault-Tolerant MPI Implementation," The International Journal of High Performance Computing Applications, Volume 18, No. 3, Fall 2004, pp. 353-361.
Graham E. Fagg, Jack J. Dongarra, FT-MPI presentation (proceedings).
Q & A?
1. MPI vs. TCP sockets:
One of the biggest weaknesses of MPI is its lack of resilience: most (if not all) MPI implementations will kill an entire MPI job if any individual process dies. This is in contrast to the reliability of TCP sockets, for example: if a process on one side of a socket suddenly goes away, the peer just gets a stale socket.
2. Does MPI guarantee that a user-defined handler is invoked wherever MPI_ERRORS_RETURN would return an error?
The specification does not state whether an error that would cause MPI functions to return an error code under the MPI_ERRORS_RETURN error handler would cause a user-defined error handler to be called during the same MPI function, or at some earlier or later point in time.
3. Relation between checkpointing and parallel I/O:
Checkpointing is related to the performance of parallel I/O, as checkpoint data is saved to a parallel file system.
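As an illustration of that connection, a sketch of writing a checkpoint with MPI-IO so that all processes write their chunk into one shared file on the parallel file system; the file name and chunk size are placeholders.

    #include <mpi.h>

    #define CHUNK 1024   /* illustrative number of doubles per process */

    /* Each process writes its chunk of state at a rank-based offset in a
       single shared checkpoint file, using a collective MPI-IO write. */
    void checkpoint_to_pfs(MPI_Comm comm, const double *state) {
        int rank;
        MPI_Comm_rank(comm, &rank);

        MPI_File fh;
        MPI_File_open(comm, "ckpt.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Offset offset = (MPI_Offset)rank * CHUNK * sizeof(double);
        MPI_File_write_at_all(fh, offset, state, CHUNK, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }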
Usability of HARNESS FT-MPI
The fault tolerance features provided by HARNESS depend on its implementation. The HARNESS team actively works on reported bugs and releases new versions.
Data Recovery in MPI
The MPI standard does not provide a way to recover data. It depends on the implementation of the MPI program.
Can fault tolerance in MPI be made transparent?
It is very difficult to make fault tolerance in MPI transparent because of the complexity involved in communication between processes.
Derived datatype handling
Reduces memory copies while allowing the three stages of data handling to overlap.
Handling of compacted datatypes:
only MPI_Send and MPI_Recv were used.
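For context, a generic sketch of sending non-contiguous data through a derived datatype, so that only MPI_Send/MPI_Recv are involved and the library does the packing; the column-of-a-matrix layout is an assumption for illustration.

    #include <mpi.h>

    /* Describe one column of a row-major rows x cols matrix as a strided
       derived datatype and send it with a plain MPI_Send. */
    void send_column(const double *matrix, int rows, int cols,
                     int col, int dest, MPI_Comm comm) {
        MPI_Datatype column;
        MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column);  /* rows blocks, stride cols */
        MPI_Type_commit(&column);
        MPI_Send(&matrix[col], 1, column, dest, 0, comm);
        MPI_Type_free(&column);
    }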
Tests show that compacted data handling gives a 10% to 19% improvement. The benefit of buffer reuse and reordering of data elements leads to considerable improvements on heterogeneous networks.