Assessing Software Defection Prediction Performance: Why Using the Matthews Correlation Coefficient Matters

03/02/2020
by Jingxiu Yao, et al.

Context: There is considerable diversity in the range and design of computational experiments to assess classifiers for software defect prediction. This is particularly so regarding the choice of classifier performance metrics. Unfortunately, some widely used metrics are known to be biased, in particular F1. Objective: We want to understand the extent to which the widespread use of F1 renders empirical results in software defect prediction unreliable. Method: We searched for defect prediction studies that report both F1 and the Matthews correlation coefficient (MCC). This enabled us to determine the proportion of results that are consistent between both metrics and the proportion that change. Results: Our systematic review identifies 8 studies comprising 4017 pairwise results. Of these results, the direction of the comparison changes in 23% of the cases when the unbiased MCC metric is employed. Conclusion: We find compelling reasons why the choice of classification performance metric matters; specifically, the biased and misleading F1 metric should be deprecated.
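
To make the idea of a "direction change" concrete, the following minimal sketch computes F1 and MCC from confusion-matrix counts and checks whether the two metrics rank a pair of classifiers differently. The counts for classifiers A and B are hypothetical and chosen only for illustration; they are not taken from the reviewed studies, and the function names are our own. Treating MCC as 0 when its denominator is zero is a common convention, assumed here.

```python
import math

def f1_score(tp, fp, fn, tn):
    """F1 = 2*TP / (2*TP + FP + FN). Note that TN is never used."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; uses all four cells of the matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def direction_changes(a, b):
    """True if F1 and MCC disagree on which of two classifiers is better."""
    f1_diff = f1_score(*a) - f1_score(*b)
    mcc_diff = mcc(*a) - mcc(*b)
    return f1_diff * mcc_diff < 0

# Hypothetical results on one data set with 90 defective and 10 clean modules.
# Tuples are (TP, FP, FN, TN).
classifier_a = (88, 9, 2, 1)   # labels almost everything as defective
classifier_b = (70, 1, 20, 9)  # misses more defects but recognises clean modules

for name, counts in [("A", classifier_a), ("B", classifier_b)]:
    print(f"{name}: F1={f1_score(*counts):.3f}  MCC={mcc(*counts):.3f}")

print("Direction changes:", direction_changes(classifier_a, classifier_b))
```

In this made-up case F1 favours classifier A while MCC favours classifier B, because F1 ignores the true negatives that A almost entirely fails to identify.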

