The Untold Impact of Learning Approaches on Software Fault-Proneness Predictions

07/12/2022
by Mohammad Jamil Ahmad, et al.

Software fault-proneness prediction is an active research area, and many factors that affect prediction performance have been studied extensively. However, the impact of the learning approach (i.e., the specifics of the data used for training and the target variable being predicted) on prediction performance has not been studied, except for one initial work. This paper explores the effects of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases. The empirical results are based on data extracted from 64 releases of twelve open-source projects. The results show that the learning approach has a substantial, and typically unacknowledged, impact on classification performance. Specifically, useAllPredictAll leads to significantly better performance than usePrePredictPost, both within-release and across-releases. Furthermore, for within-release predictions, this difference in classification performance is due to the different levels of class imbalance in the two learning approaches: when class imbalance is addressed, the performance difference between the learning approaches is eliminated. These findings imply that the learning approach should always be explicitly identified and its impact on software fault-proneness prediction considered. The paper concludes with a discussion of the potential consequences of these results for both research and practice.
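
To make the class-imbalance point concrete, the following is a minimal sketch of how the choice of target variable alone can change the share of fault-prone files in a dataset. The abstract does not define the two learning approaches in detail, so the labeling rules here are an assumption inferred from their names: useAllPredictAll is taken to label a file as fault-prone if it has any fault in the release, and usePrePredictPost is taken to label it as fault-prone only if it has a post-release fault. The `FileRecord` type, its field names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FileRecord:
    """Hypothetical per-file fault counts for one release."""
    name: str
    pre_release_faults: int
    post_release_faults: int


def label_use_all_predict_all(rec: FileRecord) -> bool:
    # Assumed target: the file is fault-prone if it has any fault at all.
    return rec.pre_release_faults + rec.post_release_faults > 0


def label_use_pre_predict_post(rec: FileRecord) -> bool:
    # Assumed target: the file is fault-prone only if it has a post-release fault.
    return rec.post_release_faults > 0


def fault_prone_fraction(
    records: List[FileRecord], label_fn: Callable[[FileRecord], bool]
) -> float:
    """Share of files labeled fault-prone under the given target definition."""
    return sum(label_fn(r) for r in records) / len(records)


# Toy data (invented for illustration only).
records = [
    FileRecord("a", 2, 0), FileRecord("b", 0, 0), FileRecord("c", 1, 1),
    FileRecord("d", 0, 0), FileRecord("e", 0, 0), FileRecord("f", 3, 0),
    FileRecord("g", 0, 1), FileRecord("h", 0, 0), FileRecord("i", 0, 0),
    FileRecord("j", 1, 0),
]

share_all = fault_prone_fraction(records, label_use_all_predict_all)
share_post = fault_prone_fraction(records, label_use_pre_predict_post)
print(f"useAllPredictAll  fault-prone share: {share_all:.2f}")
print(f"usePrePredictPost fault-prone share: {share_post:.2f}")
```

On this toy data the second target definition yields a much smaller positive class, which is the kind of difference in class imbalance the abstract refers to. The actual magnitudes in the studied projects are reported in the full paper, not here.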
