DESCRY: Reproducing system-level concurrency failures

Tingting Yu, Tarannum S. Zaman, Chao Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Concurrent systems may fail in the field due to various elusive faults such as race conditions. Reproducing such failures is hard because (1) concurrency failures at the system level often involve multiple processes or event handlers (e.g., software signals), which cannot be handled by existing tools for reproducing intra-process (thread-level) failures; (2) detailed field data, such as user input, file content and interleaving schedule, may not be available to developers; and (3) the debugging environment may differ from the deployed environment, which further complicates failure reproduction. To address these problems, we present DESCRY, the first fully automated tool for reproducing system-level concurrency failures based only on default log messages collected from the field. DESCRY uses a combination of static and dynamic analysis techniques, together with symbolic execution, to synthesize both the failure-inducing data input and the interleaving schedule, and leverages them to deterministically replay the failed execution using existing virtual platforms. We have evaluated DESCRY on 22 real-world multi-process Linux applications with a total of 236,875 lines of code to demonstrate both its effectiveness and its efficiency in reproducing failures that no other tool can reproduce.
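The core idea in the abstract, finding an interleaving schedule that agrees with the log messages collected in the field and that also triggers the failure, can be illustrated with a toy brute-force search. This is a minimal sketch under stated assumptions, not DESCRY's algorithm: DESCRY combines static and dynamic analysis with symbolic execution and also synthesizes data input, whereas here the two process traces (`PARENT`, `CHILD`), the `"log:"` event naming, and the shared-counter "lost update" failure model are all hypothetical, and schedules are simply enumerated.

```python
from itertools import combinations

# Hypothetical per-process event traces (illustration only, not from the paper).
# Each event is (process, action); actions starting with "log:" stand in for
# default log messages that would be visible in a field log.
PARENT = [("P", "log:parent start"), ("P", "read"), ("P", "write")]
CHILD = [("C", "read"), ("C", "log:child update"), ("C", "write")]


def interleavings(a, b):
    """Yield every merge of traces a and b that preserves each trace's own order."""
    n = len(a) + len(b)
    for a_positions in combinations(range(n), len(a)):
        chosen = set(a_positions)
        ia = ib = 0
        merged = []
        for pos in range(n):
            if pos in chosen:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def logged(schedule):
    """Project a schedule onto the log messages it would emit, in order."""
    return [act for _, act in schedule if act.startswith("log:")]


def fails(schedule):
    """Replay the schedule on a shared counter where each process does
    counter = counter + 1 via a separate read and write. A lost update
    (final value != 2) is the hypothetical 'failure' in this toy model."""
    counter = 0
    local = {}
    for proc, act in schedule:
        if act == "read":
            local[proc] = counter
        elif act == "write":
            counter = local[proc] + 1
    return counter != 2


def failing_schedules(field_log):
    """Return schedules whose log output matches field_log and that fail."""
    return [s for s in interleavings(PARENT, CHILD)
            if logged(s) == field_log and fails(s)]


if __name__ == "__main__":
    for s in failing_schedules(["log:parent start", "log:child update"]):
        print(s)
```

Exhaustive enumeration grows combinatorially with trace length, so a practical tool presumably has to prune the search space, for instance with the kinds of static and dynamic analyses the abstract mentions.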

Original language: English
Title of host publication: ESEC/FSE 2017 - Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
Editors: Andrea Zisman, Eric Bodden, Wilhelm Schäfer, Arie van Deursen
Pages: 694-704
Number of pages: 11
ISBN (Electronic): 9781450351058
State: Published - Aug 21 2017
Event: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017 - Paderborn, Germany
Duration: Sep 4 2017 - Sep 8 2017

Publication series

Name: Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering
Volume: Part F130154

Conference

Conference: 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2017
Country/Territory: Germany
City: Paderborn
Period: 9/4/17 - 9/8/17

Bibliographical note

Funding Information:
This work was supported in part by NSF grants CCF-1464032, CNS-1405697, and CCF-1722710.

Publisher Copyright:
© 2017 Association for Computing Machinery.

Keywords

  • Concurrency failures
  • Debugging
  • Failure reproduction
  • Multi-process applications

ASJC Scopus subject areas

  • Software
