A unified approach for inference on algorithm-agnostic variable importance

04/07/2020
by Brian D. Williamson, et al.
In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response – in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
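
To make the definition concrete, the following is a minimal sketch of a plug-in estimate of the contrast described above: the predictiveness of a model fit on all features minus that of a model fit without the features under consideration. Several choices here are assumptions for illustration and are not taken from the paper: R-squared stands in as the predictiveness measure, ordinary least squares stands in for a flexible learner, a single train/test split is used, and the names `fit_ols`, `r_squared`, and `variable_importance` are hypothetical. It does not implement the efficient estimator, confidence intervals, or hypothesis test that the abstract refers to.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with intercept, solved by Gauss-Jordan elimination.
    OLS is only a stand-in for whatever learner one would actually use."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return lambda x: beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def r_squared(predict, X, y):
    """Predictiveness measure (assumed here): 1 - MSE / Var(y)."""
    mean_y = sum(y) / len(y)
    mse = sum((yi - predict(x)) ** 2 for x, yi in zip(X, y)) / len(y)
    var = sum((yi - mean_y) ** 2 for yi in y) / len(y)
    return 1.0 - mse / var

def variable_importance(X, y, drop, n_train):
    """Held-out R^2 of the full fit minus that of the fit without `drop` columns."""
    keep = [j for j in range(len(X[0])) if j not in drop]
    X_red = [[x[j] for j in keep] for x in X]
    full = fit_ols(X[:n_train], y[:n_train])
    reduced = fit_ols(X_red[:n_train], y[:n_train])
    return (r_squared(full, X[n_train:], y[n_train:])
            - r_squared(reduced, X_red[n_train:], y[n_train:]))

# Simulated data (hypothetical): only the first feature drives the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(2000)]
y = [2.0 * x[0] + rng.gauss(0, 1) for x in X]

vim_x0 = variable_importance(X, y, drop={0}, n_train=1000)
vim_x1 = variable_importance(X, y, drop={1}, n_train=1000)
print(f"importance of feature 0: {vim_x0:.3f}")
print(f"importance of feature 1: {vim_x1:.3f}")
```

In this simulation the population R-squared is 0.8 with both features and 0 without the first one, so the estimate for feature 0 should land near 0.8 and the estimate for feature 1 near 0. The second case is the null-importance setting that the abstract singles out as requiring a separate testing strategy.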

