Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce.
|Title of host publication||ESEC/FSE 2017 - Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering|
|Editors||Andrea Zisman, Eric Bodden, Wilhelm Schäfer, Arie van Deursen|
|Number of pages||11|
|State||Published - Aug 21 2017|
|Event||11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017 - Paderborn, Germany|
Duration: Sep 4 2017 → Sep 8 2017
|Name||Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering|
|Conference||11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017|
|Period||9/4/17 → 9/8/17|
Bibliographical note
Funding Information:
This work was supported in part by NSF grants CCF-1464032, CNS-1405697, and CCF-1722710.
© 2017 Association for Computing Machinery.
- Concurrency failures
- Failure reproduction
- Multi-process applications
ASJC Scopus subject areas