Flakify: A Black-Box, Language Model-based Predictor for Flaky Tests

12/23/2021
by Sakina Fatima, et al.

Software testing assures that code changes do not adversely affect existing functionality. However, a test case can be flaky, i.e., it can pass and fail across executions of the same version of the source code. Flaky tests add overhead to software development because they can trigger unnecessary attempts to debug production or testing code. Besides rerunning test cases multiple times, which is time-consuming and computationally expensive, flaky tests can be predicted using machine learning (ML) models. However, state-of-the-art ML-based flaky test predictors rely on pre-defined sets of features that are either project-specific, i.e., inapplicable to other projects, or require access to production code, which is not always available to software test engineers. Moreover, given the non-deterministic behavior of flaky tests, it is challenging to determine a complete set of features that could potentially be associated with test flakiness. Therefore, in this paper, we propose Flakify, a black-box, language model-based predictor for flaky tests. Flakify does not require (a) rerunning test cases, (b) pre-defining features, or (c) access to production code. To this end, we employ CodeBERT, a pre-trained language model, and fine-tune it to predict flaky tests relying exclusively on the source code of test cases. We evaluated Flakify on a publicly available dataset and compared our results with FlakeFlagger, the best state-of-the-art ML-based, white-box predictor for flaky tests. Flakify surpasses FlakeFlagger by 10 and 18 percentage points (pp) in precision and recall, respectively, thus reducing, by the same margins, the effort otherwise wasted on unnecessarily debugging test cases and production code. Our results further show that a black-box version of FlakeFlagger is not a viable option for predicting flaky tests.
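The core idea in the abstract, fine-tuning a pre-trained CodeBERT model as a binary classifier over the raw source code of test cases, can be sketched as follows. This is a minimal illustration assuming the Hugging Face transformers library and the public microsoft/codebert-base checkpoint; the dataset wrapper, hyperparameters (batch size, learning rate, epoch count), and 512-token truncation are illustrative assumptions, not necessarily the authors' exact configuration.

import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # public CodeBERT checkpoint

class TestCaseDataset(Dataset):
    """Pairs of test-case source code and labels (0 = stable, 1 = flaky)."""
    def __init__(self, sources, labels, tokenizer, max_len=512):
        self.enc = tokenizer(sources, truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        return {"input_ids": self.enc["input_ids"][i],
                "attention_mask": self.enc["attention_mask"][i],
                "labels": self.labels[i]}

def fine_tune(train_sources, train_labels, epochs=3, lr=2e-5):
    # Binary classification head on top of CodeBERT; no production-code
    # features, only the test-case source text (assumed hyperparameters).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2)
    loader = DataLoader(TestCaseDataset(train_sources, train_labels, tokenizer),
                        batch_size=8, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optim.zero_grad()
            loss = model(**batch).loss  # cross-entropy over flaky/stable
            loss.backward()
            optim.step()
    return model, tokenizer

def predict_flaky(model, tokenizer, test_source):
    # Black-box prediction: only the test case's source code is needed.
    model.eval()
    enc = tokenizer(test_source, truncation=True, padding="max_length",
                    max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits
    return int(logits.argmax(dim=-1))  # 1 = predicted flaky

Because the classifier consumes only tokenized test-case source, it needs neither reruns nor hand-engineered, project-specific features, which is what makes the approach black-box.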


