The Impact of Automated Parameter Optimization on Defect Prediction Models

Defect prediction models (classifiers that identify defect-prone software modules) have configurable parameters that control their characteristics (e.g., the number of trees in a random forest). Recent studies show that these classifiers underperform when default parameter settings are used. In this paper, we study the impact of automated parameter optimization on defect prediction models. Through a case study of 18 datasets, we find that automated parameter optimization: (1) improves AUC performance by up to 40 percentage points; (2) yields classifiers that are at least as stable as those trained using default settings; (3) substantially shifts the importance ranking of variables, with as few as 28% of the top-ranked variables in optimized classifiers also being top-ranked in non-optimized classifiers; (4) yields optimized settings for 17 of the 20 most sensitive parameters that transfer among datasets without a statistically significant drop in performance; and (5) adds less than 30 minutes of additional computation to 12 of the 26 studied classification techniques. While widely used classification techniques like random forest and support vector machines are not optimization-sensitive, traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely used techniques after optimization is applied. This highlights the importance of exploring the parameter space when using parameter-sensitive classification techniques.
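To make "automated parameter optimization" concrete, here is a minimal sketch assuming the simplest strategy, a grid search that picks the parameter value with the highest AUC on a held-out validation set. Everything in it is a hypothetical illustration rather than the paper's setup: the k-nearest-neighbour scorer, the single synthetic "metric" per module, the candidate grid, and the choice of k=1 as the stand-in "default" setting are all assumptions. AUC is computed as the probability that a randomly chosen defective module outscores a randomly chosen clean one.

```python
import random


def auc(scores, labels):
    # AUC as the probability that a random defective module (label 1) scores
    # higher than a random clean module (label 0); ties count as half.
    # Quadratic in the number of modules, which is fine for a small illustration.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def knn_scores(train_x, train_y, test_x, k):
    # Hypothetical stand-in classifier: the defect score of a module is the
    # fraction of its k nearest training modules (by one metric) that are defective.
    scores = []
    for x in test_x:
        nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
        scores.append(sum(train_y[i] for i in nearest) / k)
    return scores


def grid_search(train, valid, candidates):
    # Try every candidate k and keep the one with the highest validation AUC.
    (tx, ty), (vx, vy) = train, valid
    best = None
    for k in candidates:
        a = auc(knn_scores(tx, ty, vx, k), vy)
        if best is None or a > best[1]:
            best = (k, a)
    return best


def make_data(n, seed):
    # Hypothetical synthetic data: one metric in [0, 1]; a module is defective
    # with probability equal to its metric value.
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < x else 0 for x in xs]
    return xs, ys


train = make_data(200, seed=1)
valid = make_data(100, seed=2)
candidates = [1, 5, 15, 25, 45]  # hypothetical grid; k=1 stands in for the default
default_auc = auc(knn_scores(train[0], train[1], valid[0], 1), valid[1])
best_k, best_auc = grid_search(train, valid, candidates)
print(f"default k=1: AUC={default_auc:.3f}; optimized k={best_k}: AUC={best_auc:.3f}")
```

Because the default value is part of the grid, the selected setting can never score below it on the validation set. In practice, the chosen setting should then be evaluated on data not used during the search; otherwise the reported AUC is optimistically biased.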
