The impact of using biased performance metrics on software defect prediction research

03/18/2021
by   Jingxiu Yao, et al.

Context: Software engineering researchers have undertaken many experiments investigating the potential of software defect prediction algorithms. Unfortunately, some widely used performance metrics are known to be problematic, most notably F1, but nevertheless F1 is widely used. Objective: To investigate the potential impact of using F1 on the validity of this large body of research. Method: We undertook a systematic review to locate relevant experiments and then extracted all pairwise comparisons of defect prediction performance using F1 and the unbiased Matthews correlation coefficient (MCC). Results: We found a total of 38 primary studies. These contain 12,471 pairs of results. Of these, 21.95% changed direction when MCC was used instead of the biased F1 metric. Unfortunately, we also found evidence suggesting that F1 remains widely used in software defect prediction research. Conclusions: We reiterate the concerns of statisticians that F1 is a problematic metric outside of an information retrieval context, since we are concerned about both classes (defect-prone and not defect-prone units). This inappropriate usage has led to a substantial number (more than one fifth) of erroneous (in terms of direction) results. Therefore we urge researchers to (i) use an unbiased metric and (ii) publish detailed results including confusion matrices such that alternative analyses become possible.
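To make the idea of a "change in direction" concrete, the following is a minimal sketch of how F1 and MCC can be computed from a confusion matrix and how the two metrics can rank a pair of classifiers in opposite order. The confusion-matrix counts and the names `Confusion`, `f1`, `mcc` and `direction_changes` are hypothetical and chosen only for illustration; they are not taken from the primary studies. The formulas are the standard definitions: F1 ignores true negatives, whereas MCC uses all four cells.

```python
from math import sqrt
from typing import NamedTuple


class Confusion(NamedTuple):
    """Confusion matrix for a binary defect predictor (positive = defect-prone)."""
    tp: int  # defect-prone units predicted defect-prone
    fp: int  # clean units predicted defect-prone
    fn: int  # defect-prone units predicted clean
    tn: int  # clean units predicted clean


def f1(c: Confusion) -> float:
    """F1 = 2TP / (2TP + FP + FN). True negatives play no role."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def mcc(c: Confusion) -> float:
    """Matthews correlation coefficient, using all four cells."""
    denom = sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    # By convention, return 0 when any marginal total is zero.
    return (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0


def direction_changes(a: Confusion, b: Confusion) -> bool:
    """True if F1 and MCC strictly disagree on which of a and b is better."""
    return (f1(a) - f1(b)) * (mcc(a) - mcc(b)) < 0


# Hypothetical data set: 100 units, 80 defect-prone and 20 clean.
# Classifier A labels almost everything defect-prone.
classifier_a = Confusion(tp=80, fp=19, fn=0, tn=1)
# Classifier B misses some defects but identifies most clean units.
classifier_b = Confusion(tp=60, fp=2, fn=20, tn=18)

if __name__ == "__main__":
    for name, c in (("A", classifier_a), ("B", classifier_b)):
        print(f"{name}: F1={f1(c):.3f}  MCC={mcc(c):.3f}")
    print("direction changes:", direction_changes(classifier_a, classifier_b))
```

In this made-up pair, F1 prefers classifier A while MCC prefers classifier B, because A gains almost nothing on the not defect-prone class and F1 does not register that. The example also illustrates why publishing confusion matrices matters: with the four counts available, either metric can be recomputed after the fact.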

research
03/02/2020

Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters

Context: There is considerable diversity in the range and design of comp...
research
06/22/2022

Defect Prediction Using Stylistic Metrics

Defect prediction is one of the most popular research topics due to its ...
research
06/08/2021

Does class size matter? An in-depth assessment of the effect of class size in software defect prediction

In the past 20 years, defect prediction studies have generally acknowled...
research
02/13/2023

A Systematic Literature Review of Explainable AI for Software Engineering

Context: In recent years, leveraging machine learning (ML) techniques ha...
research
10/12/2021

Fast Static Analyses of Software Product Lines – An Example With More Than 42,000 Metrics

Context: Software metrics, as one form of static analyses, is a commonly...
research
05/28/2018

An empirical study of public data quality problems in cross project defect prediction

Background: Two public defect data, including Jureczko and NASA datasets...
research
01/14/2021

Evaluating prediction systems in software project estimation

Context: Software engineering has a problem in that when we empirically ...
