The best way to select features?

05/26/2020
by Xin Man, et al.

Feature selection in machine learning is subject to the intrinsic randomness of the feature selection algorithms (for example, random permutations during MDA). Stability of the selected features with respect to such randomness is essential to the human interpretability of a machine learning algorithm. We propose a rank-based stability metric, called the instability index, to compare the stabilities of three feature selection algorithms, MDA, LIME, and SHAP, as applied to random forests. Typically, features are selected by averaging many random iterations of a selection algorithm. We find that the variability of the selected features does decrease as the number of iterations increases, but it does not go to zero, and the features selected by the three algorithms do not necessarily converge to the same set. We find LIME and SHAP to be more stable than MDA, and LIME to be at least as stable as SHAP for the top-ranked features. Hence, overall, LIME is best suited for human interpretability. However, the sets of features selected by all three algorithms significantly improve various predictive metrics out of sample, and their predictive performances do not differ significantly from one another. Experiments were conducted on synthetic datasets, two public benchmark datasets, and proprietary data from an active investment strategy.
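
To make the idea of rank-based stability concrete, here is a minimal Python sketch. The abstract does not give the formula for the instability index, so the `instability` function below is a hypothetical stand-in (the mean, over features, of the standard deviation of each feature's rank across repeated runs), not the authors' definition. The helper names `ranks`, `noisy_importance`, and `averaged_run`, and the Gaussian noise model standing in for the randomness of MDA, LIME, or SHAP, are likewise assumptions made for illustration.

```python
import random
import statistics


def ranks(scores):
    """Rank features for one run (1 = most important).

    Ties are broken by feature index, because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        result[feature] = position
    return result


def instability(runs):
    """Hypothetical rank-based instability measure (not the paper's formula).

    For each feature, take the population standard deviation of its rank
    across runs, then average over features. 0.0 means every run produced
    the same ranking.
    """
    rank_rows = [ranks(run) for run in runs]
    return statistics.mean(statistics.pstdev(col) for col in zip(*rank_rows))


def noisy_importance(true_scores, noise, rng):
    """One random iteration: assumed true importances plus Gaussian noise."""
    return [score + rng.gauss(0.0, noise) for score in true_scores]


def averaged_run(true_scores, noise, n_iter, rng):
    """Average n_iter random iterations into one importance vector."""
    draws = [noisy_importance(true_scores, noise, rng) for _ in range(n_iter)]
    return [statistics.mean(col) for col in zip(*draws)]


def demo():
    rng = random.Random(0)
    true_scores = [0.50, 0.45, 0.30, 0.28, 0.10]  # made-up importances
    for n_iter in (1, 10, 100):
        runs = [averaged_run(true_scores, 0.1, n_iter, rng) for _ in range(20)]
        print(n_iter, round(instability(runs), 3))


if __name__ == "__main__":
    demo()
```

Running `demo()` prints the measure for 1, 10, and 100 averaged iterations, which gives a rough feel for how averaging affects rank variability under this toy noise model. It says nothing about the relative stability of MDA, LIME, and SHAP, which the paper evaluates on real and synthetic data.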


