Statistically Valid Variable Importance Assessment through Conditional Permutations

09/14/2023
by Ahmad Chamma, et al.

Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion, and it is often implemented with variable permutation schemes. However, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that CPI overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, CPI consistently showed top accuracy across benchmarks. An empirical benchmark on a large-scale real-world medical dataset showed that CPI provides a more parsimonious selection of statistically significant variables. Our results suggest that CPI can be readily used as a drop-in replacement for permutation-based methods.
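Conceptually, standard permutation importance shuffles a variable outright, which can inflate the apparent importance of variables that are merely correlated with truly important ones. CPI instead permutes only the part of a variable that cannot be predicted from the remaining covariates. The sketch below illustrates that conditional-permutation idea with scikit-learn-style estimators; the function name, the ridge model used for the conditional step, and the squared-error loss are illustrative assumptions, not the authors' implementation (which pairs the scheme with a deep neural network learner and a test statistic for type-I error control).

```python
# Minimal sketch of conditional permutation importance (CPI), assuming
# scikit-learn-style estimators. Names and modeling choices here are
# illustrative, not taken from the paper's implementation.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error


def conditional_permutation_importance(model, X, y, j, n_perm=50,
                                        conditional_model=None, rng=None):
    """Importance of column j for a pre-fitted `model`, evaluated on
    held-out (X, y): the increase in loss when X[:, j] is replaced by
    its prediction from the other columns plus shuffled residuals, so
    only the part of X_j unexplained by the other covariates is permuted."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    X_minus_j = np.delete(X, j, axis=1)

    # Model covariate j from the remaining covariates (conditional step).
    cond = clone(conditional_model) if conditional_model is not None else RidgeCV()
    cond.fit(X_minus_j, X[:, j])
    x_j_hat = cond.predict(X_minus_j)
    residuals = X[:, j] - x_j_hat

    baseline_loss = mean_squared_error(y, model.predict(X))

    losses = []
    for _ in range(n_perm):
        X_perm = X.copy()
        # Permute only the residual part: this preserves the dependence of
        # X_j on the other covariates and breaks only its direct link to y.
        X_perm[:, j] = x_j_hat + rng.permutation(residuals)
        losses.append(mean_squared_error(y, model.predict(X_perm)))

    # Positive values indicate that X_j carries information about y
    # beyond what the other covariates already provide.
    return np.mean(losses) - baseline_loss
```

In practice, the per-permutation loss differences can be aggregated into a test statistic (for example via their mean and standard error) to obtain the p-values that underpin the statistically controlled variable selection described in the abstract.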

Related research:
- Generalized Permutation Framework for Testing Model Variable Significance (05/28/2021)
- Conditional Feature Importance for Mixed Data (10/06/2022)
- Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models (12/05/2019)
- Improving Accuracy of Permutation DAG Search using Best Order Score Search (08/17/2021)
- A unified approach for inference on algorithm-agnostic variable importance (04/07/2020)
- Lazy Estimation of Variable Importance for Large Neural Networks (07/19/2022)
- Exclusion and Inclusion – A model agnostic approach to feature importance in DNNs (07/13/2020)
