Inferring feature importance with uncertainties in high-dimensional data

09/02/2021
by Pål Vegard Johnsen, et al.

Estimating feature importance is a central aspect of explaining data-based models. Besides explaining the model itself, an equally relevant question is which features are important in the underlying data-generating process. We present a Shapley-value-based framework for inferring the importance of individual features, including the uncertainty in the estimator. We build upon the recently published feature importance measure SAGE (Shapley additive global importance) and introduce sub-SAGE, which can be estimated without resampling for tree-based models. We argue that the uncertainties can be estimated from bootstrapping and demonstrate the approach for tree ensemble methods. The framework is exemplified on synthetic data as well as on high-dimensional genomics data.
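
The abstract's key computational idea, bootstrapping a tree-ensemble importance estimator to obtain uncertainties, can be illustrated with a short sketch. This is a hypothetical Python example, not the paper's sub-SAGE estimator: scikit-learn's permutation_importance stands in for the Shapley-value-based measure, and the data, ensemble size, and number of bootstrap replicates are arbitrary choices.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import permutation_importance

    # Synthetic data: 20 features, of which 5 are informative.
    X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

    rng = np.random.default_rng(0)
    n_boot = 50
    importances = np.empty((n_boot, X.shape[1]))

    for b in range(n_boot):
        # Resample rows with replacement and refit the tree ensemble.
        idx = rng.integers(0, len(X), size=len(X))
        model = RandomForestRegressor(n_estimators=100, random_state=b)
        model.fit(X[idx], y[idx])
        # Stand-in importance measure; the paper's sub-SAGE values
        # would be computed for the refitted ensemble here instead.
        result = permutation_importance(model, X, y, n_repeats=5,
                                        random_state=b)
        importances[b] = result.importances_mean

    # 95% bootstrap percentile intervals per feature.
    lower, upper = np.percentile(importances, [2.5, 97.5], axis=0)
    mean = importances.mean(axis=0)
    for j in range(X.shape[1]):
        print(f"feature {j:2d}: {mean[j]:.3f} [{lower[j]:.3f}, {upper[j]:.3f}]")

Features whose intervals exclude zero would be flagged as important; for the actual sub-SAGE estimator and its resampling-free computation for tree models, see the full text.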
