CRII: SHF: SimDB: An Automated Framework to Debug System-level Concurrency Faults

  • Yu, Tingting (PI)

Grants and Contracts Details


Modern computer systems, ranging from cyber-physical systems to smart mobile devices to large server systems, are highly concurrent, memory intensive, and sensor intensive. These systems utilize multiple CPUs, connect to a large array of peripheral devices, and sense their surroundings through various sensors and actuators. The increasing complexity of these systems exposes the software running on them to various forms of concurrency faults that can lead to production-run failures. Unfortunately, the non-deterministic nature of concurrency across the entire system makes the software debugging process extremely difficult. A survey from Microsoft showed that 70% of developers considered debugging concurrent software to be very hard. Over the past decade, researchers have developed approaches for debugging multi-threaded programs. These approaches tend to reproduce concurrency faults based on passing and failing runs using existing failure-triggering inputs. However, several challenges make these techniques unlikely to handle various classes of system-level concurrency faults. First, existing approaches tend to focus only on multiple threads within a single application execution and have rarely been adapted to debug concurrency faults that arise from shared hardware resources or from interactions across different applications in modern software systems. Second, production-run failures due to system-level concurrency faults are difficult to trigger during in-house debugging. As such, existing techniques that analyze both passing runs and failing runs can be ineffective. Third, failure-inducing inputs may not be available for in-house debugging due to users' privacy concerns; this makes most existing approaches to reproducing faults infeasible. Our two-year research goal is to develop automated approaches to effectively debug system-level concurrency faults.
Our research activities will focus on: (i) developing lightweight static analysis techniques that can identify potential failure-inducing program paths; (ii) developing dynamic analysis techniques that use virtualization to effectively reproduce production-run failures caused by system-level concurrency faults; (iii) implementing an automated debugging framework that provides rich user interfaces; and (iv) evaluating the effectiveness of our solutions on real-world modern computer software systems.
Effective start/end date: 3/1/15 to 2/28/18

