An optimistic checkpointing and selective message logging approach for consistent global checkpoint collection in distributed systems

Qiangfeng Jiang, D. Manivannan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Scopus citations

Abstract

In this paper, we present an asynchronous consistent global checkpoint collection algorithm that prevents contention for network storage at the file server and hence reduces the checkpointing overhead. The algorithm has two phases. In the first phase, a process initiates consistent global checkpoint collection by saving its state tentatively and asynchronously (called a tentative checkpoint) in local memory, or in remote stable storage if there is no contention for stable storage while saving the state. In the second phase (the checkpoint finalization phase), the message log associated with the tentative checkpoint is stored in stable storage. The tentative checkpoint, together with the associated message log stored in stable storage, becomes part of a consistent global checkpoint. Under our algorithm, two or more processes can concurrently initiate consistent global checkpoint collection. Every tentative checkpoint will be finalized successfully unless a failure occurs. The finalized checkpoints of each process are assigned unique sequence numbers in ascending order. Finalized checkpoints with the same sequence number form a consistent global checkpoint.
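The two-phase bookkeeping described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify which messages are logged, how initiations are coordinated, or how storage contention is detected, so the code below only models the tentative/finalize split and the rule that finalized checkpoints sharing a sequence number are grouped together. All names (`Process`, `Checkpoint`, `global_checkpoint`) are hypothetical, and in-memory lists stand in for stable storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Checkpoint:
    seq: int                 # sequence number assigned at finalization
    state: Any               # state saved in the tentative phase
    message_log: List[str]   # log stored during finalization


@dataclass
class Process:
    pid: int
    state: Any = None
    has_tentative: bool = False
    tentative_state: Any = None
    pending_log: List[str] = field(default_factory=list)
    finalized: List[Checkpoint] = field(default_factory=list)

    def take_tentative_checkpoint(self) -> None:
        # Phase 1 (assumed): save the state tentatively, here in local memory.
        self.tentative_state = self.state
        self.pending_log = []
        self.has_tentative = True

    def log_message(self, msg: str) -> None:
        # Assumption: messages are logged only while a tentative checkpoint is open.
        if self.has_tentative:
            self.pending_log.append(msg)

    def finalize(self) -> Checkpoint:
        # Phase 2 (assumed): store the message log and assign the next
        # sequence number in ascending order.
        if not self.has_tentative:
            raise RuntimeError("no tentative checkpoint to finalize")
        ckpt = Checkpoint(
            seq=len(self.finalized) + 1,
            state=self.tentative_state,
            message_log=list(self.pending_log),
        )
        self.finalized.append(ckpt)
        self.has_tentative = False
        return ckpt


def global_checkpoint(
    processes: List[Process], seq: int
) -> Optional[Dict[int, Checkpoint]]:
    # Collect the finalized checkpoint numbered `seq` from every process;
    # return None if some process has not finalized that number yet.
    result: Dict[int, Checkpoint] = {}
    for p in processes:
        match = next((c for c in p.finalized if c.seq == seq), None)
        if match is None:
            return None
        result[p.pid] = match
    return result
```

In this toy model, calling `take_tentative_checkpoint()`, `log_message(...)`, and `finalize()` on every process and then `global_checkpoint(processes, 1)` returns one checkpoint per process, all carrying sequence number 1.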

Original language: English
Title of host publication: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM
State: Published - 2007
Event: 21st International Parallel and Distributed Processing Symposium, IPDPS 2007 - Long Beach, CA, United States
Duration: Mar 26 2007 → Mar 30 2007

Publication series

Name: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM

Conference

Conference: 21st International Parallel and Distributed Processing Symposium, IPDPS 2007
Country/Territory: United States
City: Long Beach, CA
Period: 3/26/07 → 3/30/07

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • General Mathematics
