TY - GEN
T1 - A communication-induced checkpointing and asynchronous recovery protocol for mobile computing systems
AU - Tantikul, Tongchit
AU - Manivannan, D.
PY - 2005
Y1 - 2005
N2 - Mobile computing systems have many constraints, such as low battery power, low bandwidth, high mobility, and lack of stable storage, which are not present in static distributed systems. In this paper, we propose an efficient communication-induced checkpointing protocol for mobile computing systems. We also propose an asynchronous recovery protocol based on the checkpointing protocol. Mobile support stations control major parts of checkpointing and recovery, such as storing and tracing the checkpoints, requesting rollback, and logging messages, so that mobile hosts do not incur much overhead. The recovery algorithm has no domino effect: a failed process rolls back to its latest checkpoint and requests only a subset of the processes to roll back to a consistent checkpoint. Our recovery protocol uses selective message logging at the mobile support station to handle messages lost due to rollback.
AB - Mobile computing systems have many constraints, such as low battery power, low bandwidth, high mobility, and lack of stable storage, which are not present in static distributed systems. In this paper, we propose an efficient communication-induced checkpointing protocol for mobile computing systems. We also propose an asynchronous recovery protocol based on the checkpointing protocol. Mobile support stations control major parts of checkpointing and recovery, such as storing and tracing the checkpoints, requesting rollback, and logging messages, so that mobile hosts do not incur much overhead. The recovery algorithm has no domino effect: a failed process rolls back to its latest checkpoint and requests only a subset of the processes to roll back to a consistent checkpoint. Our recovery protocol uses selective message logging at the mobile support station to handle messages lost due to rollback.
KW - Distributed checkpointing
KW - Failure recovery
KW - Fault-tolerance
KW - Mobile computing system
UR - http://www.scopus.com/inward/record.url?scp=33745162224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745162224&partnerID=8YFLogxK
U2 - 10.1109/PDCAT.2005.5
DO - 10.1109/PDCAT.2005.5
M3 - Conference contribution
AN - SCOPUS:33745162224
SN - 0769524052
SN - 9780769524054
T3 - Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings
SP - 70
EP - 74
BT - Proceedings - Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2005
T2 - 6th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2005
Y2 - 5 December 2005 through 8 December 2005
ER -