Test suite effectiveness metric evaluation: what do we know and what should we do?

04/19/2022
by Peng Zhang, et al.

Comparing test suite effectiveness metrics has long been a research hotspot. However, prior studies reach different, and sometimes contradictory, conclusions when comparing these metrics. The problem we find most troubling for our community is that researchers tend to oversimplify the description of the ground truth they use. For example, a common expression is that "we studied the correlation between real faults and the metric to evaluate (MTE)". However, the meaning of "real faults" is not clear-cut and needs to be scrutinized. Without this scrutiny, the conclusions can only be half understood. To tackle this challenge, we propose a framework, ASSENT (evAluating teSt Suite EffectiveNess meTrics), to guide follow-up research. In essence, ASSENT consists of three fundamental components: ground truth, benchmark test suites, and agreement indicator. First, materialize the ground truth for determining the real order in effectiveness among test suites. Second, generate a set of benchmark test suites and derive their ground truth order in effectiveness. Third, for the benchmark test suites, generate the MTE order in effectiveness using the MTE. Finally, calculate the agreement indicator between the two orders. Under ASSENT, we are able to compare the accuracy of different test suite effectiveness metrics. We apply ASSENT to evaluate representative test suite effectiveness metrics, including mutation score metrics and code coverage metrics. Our results show that, based on the real faults, mutation score and subsuming mutation score are the best metrics to quantify test suite effectiveness. Meanwhile, by using mutants instead of real faults, MTEs will be overestimated by more than 20
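
The final step, computing an agreement indicator between the ground truth order and the MTE order, can be illustrated with a minimal sketch. The abstract does not say which indicator ASSENT uses, so Kendall's tau (the tau-a variant) is an assumption here. The function name `kendall_tau` and all numbers are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def kendall_tau(ground_truth, mte_scores):
    """Kendall's tau-a between two score lists over the same test suites.

    Assumption: tau-a stands in for the agreement indicator; the source
    abstract does not specify which indicator is used.
    """
    if len(ground_truth) != len(mte_scores):
        raise ValueError("both lists must cover the same test suites")
    n = len(ground_truth)
    if n < 2:
        raise ValueError("need at least two test suites to compare orders")

    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether both scores order the pair alike.
        product = (ground_truth[i] - ground_truth[j]) * (mte_scores[i] - mte_scores[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Ties in either list count as neither concordant nor discordant.

    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: four benchmark test suites.
real_faults_detected = [3, 5, 1, 4]        # ground truth effectiveness
metric_values = [0.40, 0.90, 0.20, 0.35]   # values of the metric to evaluate

tau = kendall_tau(real_faults_detected, metric_values)
print(f"agreement (Kendall tau-a): {tau:.3f}")
```

A value near 1 means the metric ranks the benchmark test suites almost exactly as the ground truth does, and a value near -1 means it ranks them in reverse. Comparing this value across several metrics is one way to read "comparing the accuracy" of the metrics.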

research
02/12/2020

DeepMutation: A Neural Mutation Tool

Mutation testing can be used to assess the fault-detection capabilities ...
research
02/05/2021

Mutant reduction evaluation: what is there and what is missing?

Background. Many mutation reduction strategies, which aim to reduce the ...
research
03/12/2021

Does mutation testing improve testing practices?

Various proxy metrics for test quality have been defined in order to gui...
research
03/13/2019

Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?

Mutation testing is a means to assess the effectiveness of a test suite ...
research
09/02/2020

Magma: A Ground-Truth Fuzzing Benchmark

High scalability and low running costs have made fuzz testing the de fac...
research
12/12/2022

A Brief Survey on Oracle-based Test Adequacy Metrics

Even though code coverage is a widespread and popular test adequacy metr...
research
03/17/2023

Using causal inference and Bayesian statistics to explain the capability of a test suite in exposing software faults

Test effectiveness refers to the capability of a test suite in exposing ...
