The Importance of Discerning Flaky from Fault-triggering Test Failures: A Case Study on the Chromium CI

02/21/2023
by Guillaume Haben, et al.

Flaky tests are tests that pass and fail on different executions of the same version of a program under test. They waste valuable developer time by making developers investigate false alerts (flaky test failures). To deal with this problem, many prediction methods that identify flaky tests have been proposed. While promising, the actual utility of these methods remains unclear because they have not been evaluated within a continuous integration (CI) process. In particular, it remains unclear what the impact of missed faults is, i.e., of treating fault-triggering test failures as flaky, at different CI cycles. To fill this gap, we apply state-of-the-art flakiness prediction methods at the Chromium CI and check their performance. Perhaps surprisingly, we find that, despite the high precision (99.2%) of the methods, their application leads to numerous missed faults, approximately 76.2% of all regression faults. To explain this result, we analyse the fault-triggering failures and show that flaky tests have a strong fault-revealing capability, i.e., they reveal more than 1/3 of all regression faults, indicating an inherent limitation of all methods that focus on identifying flaky tests instead of flaky test failures. Going a step further, we build failure-focused prediction methods and optimize them by considering new features. Interestingly, we find that these methods perform better than the test-focused ones, with the Matthews correlation coefficient (MCC) increasing from 0.20 to 0.42. Overall, our findings imply, on the one hand, that future research should focus on predicting flaky test failures instead of flaky tests and, on the other, that more thorough experimental methodologies are needed when evaluating flakiness prediction methods.

research
01/03/2022

Exception-Driven Fault Localization for Automated Program Repair

Automated Program Repair (APR) techniques typically exploit spectrum-bas...
research
11/05/2021

Discerning Legitimate Failures From False Alerts: A Study of Chromium's Continuous Integration

Flakiness is a major concern in Software testing. Flaky tests pass and f...
research
08/07/2017

VART: A Tool for the Automatic Detection of Regression Faults

In this paper we present VART, a tool for automatically revealing regres...
research
07/02/2019

Sample Adaptive Multiple Kernel Learning for Failure Prediction of Railway Points

Railway points are among the key components of railway infrastructure. A...
research
08/10/2021

Searching for Multi-Fault Programs in Defects4J

Defects4J has enabled numerous software testing and debugging research w...
research
03/15/2019

BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes

Fault-detection, localization, and repair methods are vital to software ...
research
12/12/2017

OpenSEA: Semi-Formal Methods for Soft Error Analysis

Alpha-particles and cosmic rays cause bit flips in chips. Protection cir...
