Low-overhead recovery technique using quasi-synchronous checkpointing

Research output: Paper › peer-review

101 Citations (Scopus)

Abstract

In this paper, we propose a quasi-synchronous checkpointing algorithm and a low-overhead recovery algorithm based on it. The checkpointing algorithm preserves process autonomy by allowing processes to take checkpoints asynchronously, and it uses communication-induced checkpoint coordination to advance the recovery line, which helps bound rollback propagation during recovery. Thus, it has the ease and low overhead of asynchronous checkpointing and the recovery-time advantages of synchronous checkpointing. Checkpointing involves no extra message overhead, and the additional checkpointing overhead is nominal. The algorithm ensures that a recovery line consistent with the latest checkpoint of any process exists at all times. The recovery algorithm exploits this feature to restore the system to a state consistent with the latest checkpoint of a failed process. The recovery algorithm has no domino effect: a failed process needs only to roll back to its latest checkpoint and request the other processes to roll back to a consistent checkpoint. To avoid the domino effect, it uses selective pessimistic message logging at the receiver end. Recovery is asynchronous for a single process failure. Neither the recovery algorithm nor the checkpointing algorithm requires the channels to be FIFO. We do not use vector timestamps to determine dependencies between checkpoints, since vector timestamps generally result in high message overhead during failure-free operation.
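
The abstract does not spell out the coordination rule. As a rough illustration only, one common way to realize communication-induced checkpoint coordination without vector timestamps is to piggyback a single scalar checkpoint sequence number on each application message and to force a checkpoint at the receiver when that number is ahead of its own. The sketch below assumes this scheme; the names `Process`, `Message`, `sn`, and `take_basic_checkpoint` are hypothetical, and the code is not claimed to be the paper's exact algorithm. It also omits message logging and recovery.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # assumed: sender's checkpoint sequence number, piggybacked


@dataclass
class Process:
    pid: int
    sn: int = 0  # sequence number of this process's latest checkpoint
    state: list = field(default_factory=list)        # delivered payloads
    checkpoints: dict = field(default_factory=dict)  # sn -> saved state

    def __post_init__(self):
        # Every process starts with an initial checkpoint numbered 0.
        self.checkpoints[0] = []

    def take_basic_checkpoint(self):
        # Autonomous (asynchronous) checkpoint taken at the process's own pace.
        self.sn += 1
        self.checkpoints[self.sn] = list(self.state)

    def send(self, payload):
        # No extra messages: the sequence number rides on the application message.
        return Message(payload, self.sn)

    def receive(self, msg):
        if msg.sn > self.sn:
            # Forced checkpoint BEFORE delivery, so the message is not
            # recorded as received in a checkpoint older than its send.
            self.sn = msg.sn
            self.checkpoints[self.sn] = list(self.state)
        self.state.append(msg.payload)


p, q = Process(0), Process(1)
p.take_basic_checkpoint()      # p advances to sequence number 1
q.receive(p.send("hello"))     # q is forced to checkpoint with number 1
p.receive(q.send("reply"))     # same number on both sides: no forced checkpoint
```

Under this assumed rule, a message is never delivered into a checkpoint numbered lower than the one it was sent from, so checkpoints sharing a sequence number contain no orphan messages between the processes that hold them. A process that skipped a number has no checkpoint stored under it, and the sketch leaves that case unhandled.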

Original language: English
Pages: 100-107
Number of pages: 8
Status: Published - 1996
Event: Proceedings of the 1996 16th International Conference on Distributed Computing Systems - Hong Kong, Hong Kong
Duration: May 27, 1996 → May 30, 1996

Conference

Conference: Proceedings of the 1996 16th International Conference on Distributed Computing Systems
City: Hong Kong, Hong Kong
Period: 5/27/96 → 5/30/96

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
