Toward a consistent performance evaluation for defect prediction models

02/01/2023
by Xutong Liu, et al.

In the defect prediction community, many defect prediction models have been proposed, and new models are continuously being developed. However, there is no consensus on how to evaluate the performance of a newly proposed model. In this paper, we propose MATTER, a fraMework towArd a consisTenT pErformance compaRison, which makes model performance directly comparable across different studies. We take three actions to build a consistent evaluation framework for defect prediction models. First, we propose a simple and easy-to-use unsupervised baseline model, ONE (glObal baseliNe modEl), to provide "a single point of comparison". Second, we propose using the SQA-effort-aligned threshold setting to make comparisons fair. Third, we suggest reporting the evaluation results in a unified way and provide a set of core performance indicators for this purpose, enabling cross-study comparisons that reflect real progress. The experimental results show that MATTER can serve as an effective framework to support a consistent performance evaluation for defect prediction models. It can therefore help determine whether a newly proposed defect prediction model is useful to practitioners and indicate the real progress made in defect prediction. Furthermore, when applying MATTER to evaluate representative defect prediction models proposed in recent years, we find that most of them (if not all) are not superior to the simple baseline model ONE in terms of SQA-effort-aware prediction performance. This reveals that the real progress in defect prediction has been overestimated. We hence recommend that, in future studies, whenever a new defect prediction model is proposed, MATTER be used to evaluate its actual usefulness (on the same benchmark test data sets) to advance scientific progress in defect prediction.
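The abstract does not spell out how the SQA-effort-aligned threshold is computed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that inspection effort is measured in lines of code (LOC) and that every model is allowed to flag modules only up to the same LOC budget, so that recall is compared at equal effort. The function names, the `budget_fraction` parameter, and the toy data are all hypothetical.

```python
from typing import List


def flag_within_effort(scores: List[float], loc: List[int],
                       budget_fraction: float) -> List[bool]:
    """Flag the highest-scoring modules until the next one would exceed the LOC budget.

    Assumption: effort is approximated by LOC; this is an illustrative
    stand-in, not the threshold definition used in the paper.
    """
    budget = budget_fraction * sum(loc)
    # Rank module indices from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [False] * len(scores)
    spent = 0
    for i in order:
        if spent + loc[i] > budget:
            break  # stop at the first module that does not fit the budget
        flagged[i] = True
        spent += loc[i]
    return flagged


def recall(flagged: List[bool], defective: List[bool]) -> float:
    """Fraction of truly defective modules that were flagged."""
    total = sum(defective)
    hits = sum(1 for f, d in zip(flagged, defective) if f and d)
    return hits / total if total else 0.0


# Toy data (hypothetical): four modules, three of them defective.
loc = [100, 400, 200, 300]
defective = [True, False, True, True]
scores_a = [0.9, 0.8, 0.3, 0.6]  # hypothetical model A
scores_b = [0.2, 0.1, 0.9, 0.7]  # hypothetical model B

# Both models get the same inspection budget: 50% of total LOC.
flags_a = flag_within_effort(scores_a, loc, 0.5)
flags_b = flag_within_effort(scores_b, loc, 0.5)
recall_a = recall(flags_a, defective)
recall_b = recall(flags_b, defective)
```

Because both models are cut off at the same effort budget, their recall values can be compared directly; under this sketch, model B finds more of the defective modules than model A for the same amount of inspected code.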

