Comparing interpretability and explainability for feature selection

05/11/2021
by Jack Dunn, et al.

A common approach to feature selection is to examine a machine learning model's variable importance scores in order to understand which features are most relevant for making predictions. Given the significance of feature selection, it is crucial that the calculated importance scores reflect reality. Falsely overestimating the importance of irrelevant features can lead to false discoveries, while underestimating the importance of relevant features may cause us to discard important features, resulting in poor model performance. Additionally, black-box models such as XGBoost provide state-of-the-art predictive performance but cannot easily be understood by humans, so we rely on variable importance scores or explainability methods such as SHAP to offer insight into their behavior. In this paper, we investigate the performance of variable importance as a feature selection method across various black-box and interpretable machine learning methods. We compare the ability of CART, Optimal Trees, XGBoost, and SHAP to correctly identify the relevant subset of variables across a number of experiments. The results show that, regardless of whether we use the native variable importance method or SHAP, XGBoost fails to clearly distinguish between relevant and irrelevant features. The interpretable methods, on the other hand, correctly and efficiently identify irrelevant features, and thus offer significantly better performance for feature selection.
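
The comparison described in the abstract can be sketched in a few lines of Python. The snippet below is an illustrative sketch, not the authors' code: it builds a synthetic regression problem in which only the first k of p features are relevant, fits CART and XGBoost, and compares their native importance scores against the mean absolute SHAP values for the XGBoost model. Optimal Trees is omitted because it requires the proprietary Interpretable AI package; the data-generating process and all hyperparameters here are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
import shap

rng = np.random.default_rng(0)
n, p, k = 2000, 10, 3  # only the first k of p features are relevant (assumed setup)
X = rng.normal(size=(n, p))
y = X[:, :k] @ rng.uniform(1.0, 2.0, size=k) + 0.1 * rng.normal(size=n)

# CART: impurity-based importances from a single, interpretable tree.
cart = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)

# XGBoost: native importances from a boosted ensemble of trees.
xgb = XGBRegressor(n_estimators=200, max_depth=4).fit(X, y)

# SHAP: mean absolute SHAP value per feature, computed for the XGBoost model.
shap_values = shap.TreeExplainer(xgb).shap_values(X)
shap_importance = np.abs(shap_values).mean(axis=0)

for name, imp in [("CART", cart.feature_importances_),
                  ("XGBoost", xgb.feature_importances_),
                  ("SHAP/XGBoost", shap_importance)]:
    imp = np.asarray(imp, dtype=float)
    ranked = np.argsort(imp)[::-1]
    # A good feature selector ranks features 0..k-1 first and puts
    # (near-)zero importance mass on the p-k irrelevant features.
    print(f"{name:>12}: top-{k} = {ranked[:k].tolist()}, "
          f"mass on irrelevant features = {imp[k:].sum() / imp.sum():.3f}")
```

In this toy setting, the fraction of importance mass assigned to the irrelevant features gives a quick proxy for the paper's question: an importance method suitable for feature selection should drive that fraction toward zero.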

Related research

- Deep Feature Selection Using a Novel Complementary Feature Mask (09/25/2022): Feature selection has drawn much attention over the last decades in mach...
- Inherent Inconsistencies of Feature Importance (06/16/2022): The black-box nature of modern machine learning techniques invokes a pra...
- Computationally Efficient Feature Significance and Importance for Machine Learning Models (05/23/2019): We develop a simple and computationally efficient significance test for ...
- Sobolev Independence Criterion (10/31/2019): We propose the Sobolev Independence Criterion (SIC), an interpretable de...
- Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests (06/08/2015): It is becoming increasingly important for machine learning methods to ma...
- Nonparametric Feature Impact and Importance (06/08/2020): Practitioners use feature importance to rank and eliminate weak predicto...
- A novel evaluation methodology for supervised Feature Ranking algorithms (07/09/2022): Both in the domains of Feature Selection and Interpretable AI, there exi...
