Moving from Cross-Project Defect Prediction to Heterogeneous Defect Prediction: A Partial Replication Study

03/05/2021
by Hadi Jahanshahi, et al.

Software defect prediction relies heavily on the metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using a set of metrics collected either within a project or across different projects. However, the techniques applied and the conclusions drawn from those models are restricted by how similar those metrics are. Knowledge from those models cannot be extended to a target project if the source projects do not share enough overlapping metrics with it. To explore the feasibility of transferring knowledge across projects without common labeled metrics, we systematically examined Heterogeneous Defect Prediction (HDP) by replicating the original study and validating its results. Our main goal is to extend prior research, explore the feasibility of HDP, and compare its performance with that of its predecessor, Cross-Project Defect Prediction. We construct an HDP model on different publicly available datasets. Moreover, we propose a new ensemble voting approach in the HDP context to utilize the predictive power of multiple available datasets. Our experimental results are comparable to those of the original study. We also explored the feasibility of HDP in real cases, and our results show that the HDP algorithm is infeasible in many of them because of its sensitivity to parameter selection. In general, our analysis gives insight into why and how to perform transfer learning from one domain to another and, in particular, provides a set of guidelines to help researchers and practitioners disseminate knowledge to the defect prediction domain.
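The abstract does not spell out the voting scheme, so the following is only a minimal sketch of one way ensemble voting over several source datasets could work. It assumes that one model has been trained per source dataset and that each model emits a binary label (1 = defective) for every target module. The function name `majority_vote` and the tie-breaking rule are hypothetical choices made for illustration, not the authors' method.

```python
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine binary predictions (1 = defective, 0 = clean) by majority vote.

    `predictions[i][j]` is the label that the model trained on source
    dataset i assigns to target module j. Ties are resolved as defective,
    which is an assumption made for this sketch.
    """
    if not predictions:
        raise ValueError("need predictions from at least one source model")
    n_modules = len(predictions[0])
    if any(len(p) != n_modules for p in predictions):
        raise ValueError("all models must predict the same target modules")

    combined = []
    # zip(*predictions) yields, for each target module, the votes of all models.
    for votes in zip(*predictions):
        defective_votes = sum(votes)
        combined.append(1 if 2 * defective_votes >= len(votes) else 0)
    return combined


if __name__ == "__main__":
    # Three hypothetical source models voting on three target modules.
    print(majority_vote([[1, 0, 1], [0, 0, 1], [1, 1, 0]]))
```

Resolving ties as defective favors recall over precision; a different rule, or weighting each source model by a validation score, would be an equally plausible design.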

research
04/13/2021

Feature-Oriented Defect Prediction: Scenarios, Metrics, and Classifiers

Several software defect prediction techniques have been developed over t...
research
04/06/2018

Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

Software is highly contextual. While there are cross-cutting `global' le...
research
09/11/2019

Iterative versus Exhaustive Data Selection for Cross Project Defect Prediction: An Extended Replication Study

Context: The effectiveness of data selection approaches in improving the...
research
05/28/2018

An empirical study of public data quality problems in cross project defect prediction

Background: Two public defect data, including Jureczko and NASA datasets...
research
01/07/2019

Evaluating software defect prediction performance: an updated benchmarking study

Accurately predicting faulty software units helps practitioners target f...
research
02/12/2022

Revisiting the Impact of Dependency Network Metrics on Software Defect Prediction

Software dependency network metrics extracted from the dependency graph ...