A quasi-synchronous checkpointing algorithm that prevents contention for stable storage

D. Manivannan, Q. Jiang, Jianchang Yang, M. Singhal

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Checkpointing and rollback recovery are established techniques for handling failures in distributed systems. Under synchronous checkpointing, each process involved in the distributed computation takes a checkpoint almost simultaneously. This causes contention for network stable storage and hence degrades performance, as processes may have to wait a long time for the checkpointing operation to complete. In this paper, we propose a staggered quasi-synchronous checkpointing algorithm that reduces contention for network stable storage without any synchronization overhead.
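The contention argument can be illustrated with a toy model. This is only a sketch and not the algorithm proposed in the paper: it assumes stable storage serves one checkpoint write at a time, that every write takes the same time, and that requests are served in arrival order. The function name `waiting_times` and the numeric values are hypothetical.

```python
def waiting_times(start_times, write_time):
    """Return how long each process waits before its write begins.

    Toy model (an assumption, not taken from the paper): stable storage
    serves one checkpoint write at a time, in order of request time,
    with ties broken by process index.
    """
    order = sorted(range(len(start_times)), key=lambda i: (start_times[i], i))
    waits = [0.0] * len(start_times)
    storage_free_at = 0.0
    for i in order:
        # A write begins once it is requested and the storage is idle.
        begin = max(start_times[i], storage_free_at)
        waits[i] = begin - start_times[i]
        storage_free_at = begin + write_time
    return waits


# Hypothetical values: 4 processes, 2 time units per checkpoint write.
n, write_time = 4, 2.0

# All processes request a checkpoint write at the same instant.
simultaneous = waiting_times([0.0] * n, write_time)

# Requests are spaced one write time apart.
staggered = waiting_times([i * write_time for i in range(n)], write_time)

print(simultaneous)  # [0.0, 2.0, 4.0, 6.0]
print(staggered)     # [0.0, 0.0, 0.0, 0.0]
```

In this model, simultaneous requests queue behind one another, so the last process waits for every other write to finish, while requests spaced one write time apart never wait. How the paper achieves staggering without synchronization overhead is not described in the abstract and is not modeled here.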

Original language: English
Pages (from-to): 3110-3117
Number of pages: 8
Journal: Information Sciences
Volume: 178
Issue number: 15
State: Published - Aug 1 2008

Bibliographical note

Funding Information:
The authors thank the editors and reviewers for their valuable and constructive comments, which helped greatly in improving the content and presentation of the paper. This material is based in part upon work supported by the US National Science Foundation under Grant No. IIS-0414791 and the US Department of Treasury Award #T0505060. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Department of Treasury.

Keywords

  • Checkpoint staggering
  • Communication-induced checkpointing
  • Distributed checkpointing
  • Failure-recovery
  • Fault-tolerance
  • Rollback recovery
  • Staggered checkpointing
  • Uncoordinated

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
