Employing Partial Least Squares Regression with Discriminant Analysis for Bug Prediction

11/02/2020
by   Rudolf Ferenc, et al.

Forecasting the defect proneness of source code has long been a major research concern. Having an estimate of which parts of a software system most likely contain bugs may help focus testing efforts, reduce costs, and improve product quality. Many prediction models and approaches have been introduced during the past decades that try to forecast bugged code elements based on static source code metrics, change and history metrics, or both. However, there is still no universal best solution to this problem, as the most suitable features and models vary from dataset to dataset and depend on the context in which we use them. Therefore, novel approaches and further studies on this topic are highly necessary. In this paper, we employ a chemometric approach - Partial Least Squares with Discriminant Analysis (PLS-DA) - for predicting bug-prone classes in Java programs using static source code metrics. To the best of our knowledge, PLS-DA has never been used before as a statistical approach in the software maintenance domain for predicting software errors. In addition, we have used rigorous statistical treatments including bootstrap resampling and a randomization (permutation) test, and evaluation for representing the software engineering results. We show that our PLS-DA based prediction model achieves superior performance compared to the state-of-the-art approaches (i.e. F-measure of 0.44-0.47 at 90 applied and comparable to others when applying up-sampling on the largest open bug dataset, while training the model is significantly faster, thus finding optimal parameters is much easier. In terms of completeness, which measures the amount of bugs contained in the Java classes predicted to be defective, PLS-DA outperforms every other algorithm: it found 69.3 with no re-sampling and up-sampling, respectively.
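To make the two evaluation measures named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that completeness is the share of all known bugs that fall inside classes flagged as defective, and that a class counts as buggy when it has at least one bug; the function names and the toy data are hypothetical.

```python
def f_measure(bug_counts, flagged):
    """Harmonic mean of precision and recall over classes.

    Assumption: a class is 'buggy' when its bug count is > 0.
    """
    tp = fp = fn = 0
    for bugs, is_flagged in zip(bug_counts, flagged):
        is_buggy = bugs > 0
        if is_flagged and is_buggy:
            tp += 1
        elif is_flagged and not is_buggy:
            fp += 1
        elif not is_flagged and is_buggy:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def completeness(bug_counts, flagged):
    """Fraction of all bugs located in classes predicted defective.

    Assumption: this mirrors the abstract's informal description only.
    """
    total_bugs = sum(bug_counts)
    if total_bugs == 0:
        return 0.0
    found = sum(b for b, is_flagged in zip(bug_counts, flagged) if is_flagged)
    return found / total_bugs


# Hypothetical toy data: bugs per class and the model's predictions.
bug_counts = [3, 0, 1, 0, 2]
flagged = [True, True, False, False, True]

print(f_measure(bug_counts, flagged))
print(completeness(bug_counts, flagged))
```

The two measures can diverge: a model that flags only the few classes holding most of the bugs may score a modest F-measure over classes while still achieving high completeness.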


