An enhanced model-based checkpointing protocol for preventing useless checkpoints

Jiang Wu, D. Manivannan

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Checkpointing and rollback recovery are widely used techniques to handle failures in distributed computing systems. If there is no coordination among processes during checkpointing, processes may take useless checkpoints. Useless checkpoints are checkpoints that cannot be part of any consistent global checkpoint. In this paper, we propose a communication-induced checkpointing algorithm that prevents useless checkpoints by directing processes to take forced checkpoints more efficiently whenever a communication pattern that may lead to a Z-cycle (ZC) is observed. The existence of a ZC involving a checkpoint is known to be necessary and sufficient for that checkpoint to be useless. The basic idea behind our algorithm can be extended to existing model-based checkpointing algorithms to reduce the number of forced checkpoints. We also compare the performance of our algorithm with an existing well-known algorithm.
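
This record does not reproduce the paper's algorithm. As a rough illustration of what a forced checkpoint is, the sketch below uses a simple index-based rule from the communication-induced checkpointing literature, not the authors' model-based protocol: each process piggybacks its current checkpoint index on every message, and a receiver that sees a larger index takes a forced checkpoint before delivering the message. The names `Process`, `basic_checkpoint`, `send`, and `receive` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Illustrative process using an index-based forced-checkpoint rule.

    Assumption: this is a generic sketch, not the protocol from the paper.
    """
    name: str
    index: int = 0  # index of the most recent local checkpoint
    checkpoints: list = field(default_factory=list)

    def __post_init__(self):
        # Every process starts with an initial checkpoint at index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        """Take an independent (basic) checkpoint and advance the index."""
        self.index += 1
        self.checkpoints.append((self.index, "basic"))

    def send(self):
        """Return the control information piggybacked on an outgoing message."""
        return self.index

    def receive(self, piggybacked_index):
        """Apply the forced-checkpoint rule before delivering a message.

        Returns True if a forced checkpoint was taken.
        """
        if piggybacked_index > self.index:
            # The sender has moved past our latest checkpoint index:
            # checkpoint first, then deliver the message.
            self.index = piggybacked_index
            self.checkpoints.append((self.index, "forced"))
            return True
        return False


# Usage: p checkpoints on its own, then sends a message to q.
p, q = Process("p"), Process("q")
p.basic_checkpoint()
forced = q.receive(p.send())
print(forced, q.checkpoints)
```

A rule this simple tends to force more checkpoints than strictly necessary; reducing that number is the kind of improvement the abstract describes.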

Original language: English
Pages (from-to): 383-406
Number of pages: 24
Journal: International Journal of Parallel, Emergent and Distributed Systems
Volume: 24
Issue number: 5
State: Published - Oct 2009

Bibliographical note

Funding Information:
A preliminary version of this paper [16] was presented at the 25th International Conference on Parallel and Distributed Computing and Networking. This material is based in part upon work supported by the US National Science Foundation under Grant No. IIS-0414791. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Keywords

  • Checkpointing
  • Fault-tolerance
  • Rollback recovery
  • Useless checkpoints

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
