Total Variation Floodgate for Variable Importance Inference in Classification

09/07/2023
by Wenshuo Wang, et al.

Inferring variable importance is a key problem in many scientific studies, where researchers seek to learn the effect of a feature X on the outcome Y in the presence of confounding variables Z. Focusing on classification problems, we define the expected total variation (ETV), an intuitive and deterministic measure of variable importance that does not rely on any model context. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. These algorithms build on the floodgate notion for regression problems (Zhang and Janson 2020). They can leverage any user-specified regression function and produce asymptotic lower confidence bounds for the ETV. We demonstrate the effectiveness of our algorithms with simulations and a case study in conjoint analysis on the US general election.
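To make the quantity concrete, here is a minimal sketch for binary Y, X and Z. It rests on an assumption: we take the ETV to be the expectation, over (X, Z), of the total variation distance between P(Y | X, Z) and P(Y | Z). This is our reading, not a quote from the paper. For binary Y that distance is |P(Y=1 | X, Z) - P(Y=1 | Z)|. The toy distribution, the functions `pi_x`, `p_y`, `exact_etv` and `lower_confidence_bound`, and the working function `s` are all hypothetical.

The lower bound is a simplified floodgate-style illustration, not the authors' algorithm. It relies on the following identity for any `s` with values in [-1, 1]:

E[Y (s(X,Z) - E[s(X̃,Z) | Z])] = E[s(X,Z) (p(X,Z) - q(Z))] ≤ ETV.

Here p(x, z) = P(Y=1 | X=x, Z=z) and q(z) = P(Y=1 | Z=z). The inner conditional expectation is computable because P(X | Z) is known under the model-X assumption.

```python
import numpy as np
from statistics import NormalDist

# Hypothetical toy model (not from the paper): Z ~ Bernoulli(0.5),
# X | Z ~ Bernoulli(pi_x(Z)) (known, i.e. model-X), Y | X, Z ~ Bernoulli(p_y(X, Z)).
def pi_x(z):
    """Known conditional P(X = 1 | Z = z)."""
    return 0.3 + 0.4 * z

def p_y(x, z):
    """True conditional P(Y = 1 | X = x, Z = z)."""
    return 0.2 + 0.5 * x + 0.2 * z

def exact_etv():
    """ETV under the assumed definition E[ |P(Y=1|X,Z) - P(Y=1|Z)| ] for binary Y."""
    etv = 0.0
    for z in (0, 1):
        pz = 0.5  # P(Z = z)
        q = (1 - pi_x(z)) * p_y(0, z) + pi_x(z) * p_y(1, z)  # P(Y = 1 | Z = z)
        for x in (0, 1):
            px = pi_x(z) if x == 1 else 1 - pi_x(z)  # P(X = x | Z = z)
            etv += pz * px * abs(p_y(x, z) - q)
    return etv

def lower_confidence_bound(s, n=200_000, alpha=0.05, seed=0):
    """Point estimate and one-sided asymptotic (1 - alpha) lower bound for
    E[Y (s(X,Z) - E[s(X~,Z) | Z])], which never exceeds the ETV when |s| <= 1."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, size=n)
    x = rng.binomial(1, pi_x(z))
    y = rng.binomial(1, p_y(x, z))
    # E[s(X~, Z) | Z] in closed form, using the known P(X | Z).
    m = (1 - pi_x(z)) * s(0, z) + pi_x(z) * s(1, z)
    terms = y * (s(x, z) - m)
    point = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(n)
    # CLT-based one-sided bound: subtract the (1 - alpha) normal quantile times the SE.
    return point, point - NormalDist().inv_cdf(1 - alpha) * se

# Usage: s(x, z) = 2x - 1 is a hypothetical working function standing in for a
# user-specified regression fit.
print(exact_etv())  # approximately 0.21
print(lower_confidence_bound(lambda x, z: 2 * x - 1))
```

In this toy model, s(x, z) = 2x - 1 coincides with the sign of p - q, so the population value of the bound equals the ETV (0.21). A weaker working function, such as half of it, still gives a valid but looser bound (about 0.105). This mirrors how the quality of the user-specified regression function affects the sharpness of the bound.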

research
06/05/2013

Multiclass Total Variation Clustering

Ideas from the image processing literature have recently motivated a new...
research
07/02/2020

Floodgate: inference for model-free variable importance

Many modern applications seek to understand the relationship between an ...
research
02/08/2018

Statistical Learnability of Generalized Additive Models based on Total Variation Regularization

A generalized additive model (GAM, Hastie and Tibshirani (1987)) is a no...
research
06/16/2020

Efficient nonparametric statistical inference on population feature importance using Shapley values

The true population-level importance of a variable in a prediction task ...
research
12/14/2022

Total variation distance between a jump-equation and its Gaussian approximation

We deal with stochastic differential equations with jumps. In order to o...
research
05/12/2020

High Probability Lower Bounds for the Total Variation Distance

The statistics and machine learning communities have recently seen a gro...
research
07/31/2023

Lossless Transformations and Excess Risk Bounds in Statistical Inference

We study the excess minimum risk in statistical inference, defined as th...
