Causality-Guided Adaptive Interventional Debugging

03/21/2020
by Anna Fariha, et al.

Runtime nondeterminism is a fact of life in modern database applications. Previous research has shown that nondeterminism can cause applications to intermittently crash, become unresponsive, or experience data corruption. We propose Adaptive Interventional Debugging (AID) for debugging such intermittent failures. AID combines existing statistical debugging, causal analysis, fault injection, and group testing techniques in a novel way to (1) pinpoint the root cause of an application's intermittent failure and (2) generate an explanation of how the root cause triggers the failure. AID works by first identifying a set of runtime behaviors (called predicates) that are strongly correlated with the failure. It then uses temporal properties of the predicates to over-approximate their causal relationships. Finally, it uses fault injection to execute a sequence of interventions on the predicates and discover their true causal relationships. This enables AID to identify the true root cause and its causal relationship to the failure. We theoretically analyze how fast AID converges to the root cause. We evaluate AID with six real-world applications that intermittently fail under specific inputs. In each case, AID identified the root cause and explained how it triggered the failure, much faster than group testing and more precisely than statistical debugging. We also evaluate AID with many synthetically generated applications with known root causes and confirm that the benefits hold for them as well.
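The intervention step described above can be illustrated with a minimal sketch of adaptive group testing. This is not AID's actual algorithm: the sketch ignores the temporal ordering that AID uses to prune candidates, and it assumes that blocking any single causal predicate makes the failure disappear. All names here (`find_causal_predicates`, `toy_oracle`, the predicate labels `p0` to `p15`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# An oracle stands in for "re-run the application with fault injection":
# it receives the set of predicates being blocked and returns True if the
# failure no longer occurs. (Hypothetical interface, not AID's API.)
Oracle = Callable[[FrozenSet[str]], bool]


def find_causal_predicates(
    candidates: Sequence[str], failure_disappears: Oracle
) -> Tuple[List[str], int]:
    """Find causal predicates by recursive halving (adaptive group testing).

    Returns the causal predicates found and the number of interventions used.
    """
    interventions = 0

    def search(group: List[str]) -> List[str]:
        nonlocal interventions
        if not group:
            return []
        interventions += 1
        # Block the whole group at once. If the failure still occurs,
        # no predicate in this group is causal and the group is discarded.
        if not failure_disappears(frozenset(group)):
            return []
        if len(group) == 1:
            return group
        mid = len(group) // 2
        return search(group[:mid]) + search(group[mid:])

    return search(list(candidates)), interventions


# Toy stand-in for a real application: 16 correlated predicates, of which
# two (assumed for this example) actually cause the failure.
candidates = [f"p{i}" for i in range(16)]
true_causes = frozenset({"p3", "p4"})


def toy_oracle(blocked: FrozenSet[str]) -> bool:
    # The failure disappears if at least one true cause is blocked.
    return bool(blocked & true_causes)


found, interventions = find_causal_predicates(candidates, toy_oracle)
print(found, interventions)
```

In a real setting the oracle would re-execute the application under fault injection, possibly several times per intervention because the failure is intermittent. Halving pays off only when causal predicates are rare among the candidates; when many candidates are causal, testing them one at a time can need fewer runs.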

