A unified approach for inference on algorithm-agnostic variable importance

by Brian D. Williamson, et al.

In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response – in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features and that of all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
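The definition of variable importance as a contrast in oracle predictiveness can be made concrete with a small simulation. The sketch below is an illustration only, not the authors' estimation procedure: it assumes a hypothetical data-generating model Y = X1 + 0.5·X2 + noise with independent standard normal terms, assumes R² as the predictiveness measure, and plugs in the known oracle prediction functions rather than estimating them. The function names `r_squared` and `simulate_importance` are invented for this example.

```python
import random


def r_squared(y, preds):
    """R^2 predictiveness: 1 - (mean squared error) / (variance of y)."""
    n = len(y)
    mean_y = sum(y) / n
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    return 1.0 - mse / var_y


def simulate_importance(n=20000, seed=0):
    """Importance of X1 as the contrast V(all features) - V(all but X1).

    Assumed (hypothetical) model: Y = X1 + 0.5 * X2 + eps, with X1, X2 and
    eps independent standard normal. Under this model the oracle predictions
    are known in closed form, so nothing is fitted here.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    # Oracle prediction using all features: E[Y | X1, X2] = X1 + 0.5 * X2.
    full_preds = [a + 0.5 * b for a, b in zip(x1, x2)]
    # Oracle prediction without X1: E[Y | X2] = 0.5 * X2.
    reduced_preds = [0.5 * b for b in x2]

    v_full = r_squared(y, full_preds)
    v_reduced = r_squared(y, reduced_preds)
    return v_full, v_reduced, v_full - v_reduced
```

Under the assumed model, Var(Y) = 2.25, so the population values are 5/9 for the full oracle R², 1/9 without X1, and 4/9 for the importance of X1. Calling `simulate_importance()` returns sample analogues of these three quantities. In practice the oracle functions are unknown and must be estimated, which is where the inferential procedure described in the abstract comes in.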




Efficient nonparametric statistical inference on population feature importance using Shapley values

The true population-level importance of a variable in a prediction task ...

Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments

We propose strategies to estimate and make inference on key features of ...

Practical Valid Inferences for the Two-Sample Binomial Problem

Consider comparing two independent binomial responses. Our interest is w...

A Targeted Approach to Confounder Selection for High-Dimensional Data

We consider the problem of selecting confounders for adjustment from a p...

Decorrelated Variable Importance

Because of the widespread use of black box prediction methods such as ra...

Valid Inference Corrected for Outlier Removal

Ordinary least square (OLS) estimation of a linear regression model is w...

Variable Importance Clouds: A Way to Explore Variable Importance for the Set of Good Models

Variable importance is central to scientific studies, including the soci...