AutoSpearman: Automatically Mitigating Correlated Metrics for Interpreting Defect Models

06/26/2018
by Jirayus Jiarpakdee, et al.

The interpretation of defect models heavily relies on the software metrics that are used to construct them. However, such software metrics are often correlated with one another. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce inconsistent subsets of metrics that remain correlated. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics. This suggests that feature selection techniques should not be used, and that correlation analyses must be applied, when the goal is model interpretation. Since correlation analyses often involve the manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2 percentage points. Thus, when interpreting defect models, we recommend that future studies use AutoSpearman in lieu of commonly-used feature selection techniques.
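
The abstract describes AutoSpearman only as an automated metric selection approach based on correlation analyses. As a rough illustration of what an automated correlation-based selection step can look like, the Python sketch below (NumPy only) computes Spearman's rho between every pair of metrics and repeatedly removes one metric from the most strongly correlated pair until no remaining pair exceeds a threshold. The threshold of 0.7, the rule of dropping the metric with the higher mean correlation against the others, and all function and metric names here are assumptions for illustration. This is not the authors' implementation, and the published approach may include steps not shown here.

```python
import numpy as np


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values equal to sorted_vals[i].
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_matrix(X):
    """Spearman's rho for every pair of columns: Pearson r of the column ranks."""
    ranked = np.column_stack([average_ranks(X[:, k]) for k in range(X.shape[1])])
    return np.corrcoef(ranked, rowvar=False)


def select_uncorrelated_metrics(X, names, threshold=0.7):
    """Hypothetical helper: drop metrics until no pair has |rho| > threshold.

    For the most strongly correlated remaining pair, the metric with the
    higher mean |rho| against the other remaining metrics is removed
    (a tie drops the second metric of the pair). The 0.7 default is an
    assumption, not a value taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    if X.shape[1] < 2:
        return list(names)
    if np.any(X.max(axis=0) == X.min(axis=0)):
        raise ValueError("constant metrics have no defined correlation")
    rho = np.abs(spearman_matrix(X))
    np.fill_diagonal(rho, 0.0)  # ignore each metric's correlation with itself
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        sub = rho[np.ix_(keep, keep)]
        if sub.max() <= threshold:
            break
        a, b = np.unravel_index(np.argmax(sub), sub.shape)
        # Both means include rho(a, b) once, so comparing them is fair.
        drop = a if sub[a].mean() > sub[b].mean() else b
        del keep[drop]
    return [names[k] for k in keep]


# Usage with synthetic data: "loc_squared" is a monotone transform of "loc",
# so their Spearman rho is 1 and one of the two is removed.
rng = np.random.default_rng(0)
loc = np.arange(1, 201, dtype=float)
X = np.column_stack([loc, loc ** 2, rng.normal(size=200)])
print(select_uncorrelated_metrics(X, ["loc", "loc_squared", "noise"]))
```

Ranking first and then taking Pearson's r is the standard definition of Spearman's rho, which is why monotone but non-linear relationships (such as a metric and its square) are flagged as fully correlated.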

Related research

01/31/2018
The Impact of Correlated Metrics on Defect Models
Defect models are analytical models that are used to build empirical the...

07/12/2018
The Impact of Feature Selection on Predicting the Number of Bugs
Bug prediction is the process of training a machine learning model on so...

11/05/2021
Automated Supervised Feature Selection for Differentiated Patterns of Care
An automated feature selection pipeline was developed using several stat...

08/02/2019
FeatureExplorer: Interactive Feature Selection and Exploration of Regression Models for Hyperspectral Images
Feature selection is used in machine learning to improve predictions, de...

12/06/2017
Sparsity Regularization for classification of large dimensional data
Feature selection has evolved to be a very important step in several mac...

06/09/2021
Quantum Annealing for Automated Feature Selection in Stress Detection
We present a novel methodology for automated feature subset selection fr...

06/09/2020
Adversarial Infidelity Learning for Model Interpretation
Model interpretation is essential in data mining and knowledge discovery...
