Abstract
In this paper, we analyze the performance of four communication-induced checkpointing algorithms. Our study shows that although the performance of some communication-induced checkpointing algorithms is suspect in terms of scalability and checkpointing overhead, others match the performance of coordinated checkpointing algorithms without any explicit synchronization overhead. Traditionally, pessimistic and optimistic message logging techniques are used to handle the various types of messages that arise during rollback recovery. Under these two techniques, every message sent by every process is logged, either by the sender or by the receiver. However, our study shows that selective message logging, combined with a carefully designed communication-induced checkpointing algorithm, can give good performance in terms of both checkpointing overhead and message logging overhead.
Original language | English |
---|---|
Pages (from-to) | 129-136 |
Number of pages | 8 |
Journal | Computer Systems Science and Engineering |
Volume | 18 |
Issue number | 3 |
State | Published - May 2003 |
Keywords
- Communication-induced checkpointing
- Consistent global snapshot
- Distributed checkpointing
- Fault-tolerance
- Performance evaluation
- Quasi-synchronous checkpointing
ASJC Scopus subject areas
- Control and Systems Engineering
- Theoretical Computer Science
- General Computer Science