A communication-induced checkpointing and asynchronous recovery algorithm for multithreaded distributed systems

Tongchit Tantikul, D. Manivannan

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Checkpointing and recovery in traditional distributed systems are relatively well established. However, checkpointing and recovery in multithreaded distributed systems have not been studied in the literature. Using traditional checkpointing and recovery algorithms in multithreaded systems leads to the false causality problem and high checkpointing overhead. In the algorithm presented here, checkpointing is implemented at the process level to reduce the number of checkpoints, and recovery is implemented at the thread level to minimize the false causality problem. The algorithm also takes advantage of the communication-induced checkpointing method to reduce the message overhead.
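To make the communication-induced idea concrete, the following is a minimal sketch of a generic index-based scheme, not the algorithm from the paper. It assumes each process keeps a checkpoint sequence number, piggybacks it on every outgoing message, and takes a forced checkpoint before delivering a message whose number is ahead of its own. Because the coordination information travels on application messages, no separate control messages are needed. The names `Process`, `Message`, `basic_checkpoint`, and `receive` are hypothetical, and the process-level versus thread-level split described in the abstract is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint sequence number, piggybacked on the message


@dataclass
class Process:
    name: str
    sn: int = 0  # sequence number of the latest checkpoint (0 = initial state)
    checkpoints: list = field(default_factory=list)  # log of (sn, kind) pairs
    delivered: list = field(default_factory=list)    # payloads delivered so far

    def _checkpoint(self, sn: int, kind: str) -> None:
        # Assumption: a real system would save process state here.
        self.sn = sn
        self.checkpoints.append((sn, kind))

    def basic_checkpoint(self) -> None:
        """Take a local checkpoint on the process's own schedule."""
        self._checkpoint(self.sn + 1, "basic")

    def send(self, payload: str) -> Message:
        # No extra control message: the sequence number rides on the payload.
        return Message(payload, self.sn)

    def receive(self, msg: Message) -> None:
        # Forced checkpoint BEFORE delivery if the sender is ahead of us.
        if msg.sn > self.sn:
            self._checkpoint(msg.sn, "forced")
        self.delivered.append(msg.payload)


p, q = Process("P"), Process("Q")
p.basic_checkpoint()     # P advances to sequence number 1
q.receive(p.send("m1"))  # Q is behind, so it checkpoints before delivering m1
p.receive(q.send("m2"))  # sequence numbers match, so no checkpoint is taken
```

In this sketch, checkpoints that share a sequence number are intended to line up across processes, which is the property index-based schemes rely on when choosing a recovery line.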

Original language: English
Pages (from-to): 284-292
Number of pages: 9
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3320
State: Published - 2004
Event: 5th International Conference, PDCAT 2004 - Singapore
Duration: Dec 8 2004 → Dec 10 2004

Keywords

  • Asynchronous recovery
  • Communication-induced checkpointing
  • Distributed checkpointing
  • Fault-tolerance
  • Multithreaded distributed system

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
