Moving from Cross-Project Defect Prediction to Heterogeneous Defect Prediction: A Partial Replication Study

by Hadi Jahanshahi, et al.

Software defect prediction relies heavily on metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using metrics collected either within a single project or across different projects. However, the techniques applied and the conclusions drawn from those models are limited by how closely the metric sets match. Knowledge from those models cannot be transferred to a target project unless the source projects share enough overlapping metrics with it. To explore the feasibility of transferring knowledge across projects without common labeled metrics, we systematically replicated Heterogeneous Defect Prediction (HDP) and validated its reported results. Our main goal is to extend prior research, explore the feasibility of HDP, and compare its performance with that of its predecessor, Cross-Project Defect Prediction. We build an HDP model on several publicly available datasets. Moreover, we propose a new ensemble voting approach in the HDP context to exploit the predictive power of multiple available datasets. Our experimental results are comparable to those of the original study. However, we also examined the feasibility of HDP in real cases, and our results show that the HDP algorithm is infeasible in many of them because of its sensitivity to parameter selection. Overall, our analysis offers insight into why and how to perform transfer learning from one domain to another and, in particular, provides a set of guidelines to help researchers and practitioners transfer knowledge into the defect prediction domain.


