The Impact of Automated Parameter Optimization on Defect Prediction Models

Defect prediction models, i.e., classifiers that identify defect-prone software modules, have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28 percent of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can outperform widely used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.
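To make the idea of automated parameter optimization concrete, the following is a minimal sketch and not the procedure used in the study. It assumes a toy k-nearest-neighbour classifier, synthetic "module" data, a hypothetical default of k = 1, and a single hold-out split in place of a proper validation scheme. It runs a grid search over one parameter and compares the AUC of the default setting with the best setting found.

```python
import random

def auc(scores, labels):
    """AUC: chance that a random defective module outranks a random clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_score(train, x, k):
    """Fraction of defective modules among the k nearest training modules."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    return sum(y for _, y in nearest) / k

def make_modules(n, rng):
    """Synthetic data (an assumption): two noisy metrics, shifted upward for defective modules."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        rows.append(([rng.gauss(y, 1.5), rng.gauss(y, 1.5)], y))
    return rows

def evaluate(train, valid, k):
    """AUC on the hold-out split for a given parameter setting k."""
    scores = [knn_score(train, x, k) for x, _ in valid]
    return auc(scores, [y for _, y in valid])

rng = random.Random(0)
train, valid = make_modules(200, rng), make_modules(100, rng)

DEFAULT_K = 1                   # hypothetical "default setting"
grid = [1, 3, 5, 9, 15, 25]     # candidate settings; includes the default

# Grid search: evaluate every candidate and keep the one with the highest AUC.
results = {k: evaluate(train, valid, k) for k in grid}
best_k = max(results, key=results.get)
default_auc, best_auc = results[DEFAULT_K], results[best_k]

print(f"default k={DEFAULT_K}: AUC={default_auc:.3f}")
print(f"best    k={best_k}: AUC={best_auc:.3f}")
```

Because the default is one of the candidates, the selected setting can never score below it on the split used for selection. That same split therefore gives an optimistic estimate, and a careful evaluation would measure the chosen setting on data not used for tuning.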




The Impact of Class Rebalancing Techniques on the Performance and Interpretation of Defect Prediction Models

Defect prediction models that are trained on class imbalanced datasets (...

On the Use of Default Parameter Settings in the Empirical Evaluation of Classification Algorithms

We demonstrate that, for a range of state-of-the-art machine learning al...

Optimizing Prediction Intervals by Tuning Random Forest via Meta-Validation

Recent studies have shown that tuning prediction models increases predic...

Understanding the Automated Parameter Optimization on Transfer Learning for CPDP: An Empirical Study

Data-driven defect prediction has become increasingly important in softw...

Is rotation forest the best classifier for problems with continuous features?

Rotation forest is a tree based ensemble that performs transforms on sub...


Treeging combines the flexible mean structure of regression trees with t...

Accurate parameter estimation for Bayesian Network Classifiers using Hierarchical Dirichlet Processes

This paper introduces a novel parameter estimation method for the probab...
