Bayesian subset selection and variable importance for interpretable prediction and classification

04/20/2021
by Daniel R. Kowal, et al.

Subset selection is a valuable tool for interpretable learning, scientific discovery, and data compression. However, classical subset selection is often eschewed due to selection instability, computational bottlenecks, and lack of post-selection inference. We address these challenges from a Bayesian perspective. Given any Bayesian predictive model ℳ, we elicit predictively competitive subsets using linear decision analysis. The approach is customizable for (local) prediction or classification and provides interpretable summaries of ℳ. A key quantity is the acceptable family of subsets, which leverages the predictive distribution from ℳ to identify subsets that offer nearly optimal prediction. The acceptable family spawns new (co-)variable importance metrics based on whether variables (co-)appear in all, some, or no acceptable subsets. Crucially, the linear coefficients for any subset inherit regularization and predictive uncertainty quantification via ℳ. The proposed approach exhibits excellent prediction, interval estimation, and variable selection for simulated data, including p = 400 > n. These tools are applied to a large education dataset with highly correlated covariates, where the acceptable family is especially useful. Our analysis provides unique insights into the combination of environmental, socioeconomic, and demographic factors that predict educational outcomes, and features highly competitive prediction with remarkable stability.
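
The (co-)variable importance idea above can be illustrated with a short Python sketch. It assumes the acceptable family has already been computed and is supplied as a list of sets of variable indices; the function name `variable_importance` and the proportion-based scores are illustrative choices for this sketch, not the authors' implementation. Each variable is scored by the fraction of acceptable subsets that contain it and labeled "all", "some", or "none"; each pair is scored by the fraction of subsets in which both variables co-appear.

```python
def variable_importance(acceptable, p):
    """Illustrative inclusion-based (co-)variable importance.

    Hypothetical helper (an assumption, not the paper's code).
    acceptable: non-empty list of subsets, each an iterable of indices in range(p).
    Returns (vi, co_vi, status) where
      vi[j]       = fraction of acceptable subsets containing variable j,
      co_vi[j][k] = fraction of acceptable subsets containing both j and k,
      status[j]   = "all", "some", or "none" depending on vi[j].
    """
    family = [frozenset(s) for s in acceptable]
    n_sub = len(family)
    if n_sub == 0:
        raise ValueError("acceptable family must be non-empty")

    # Marginal inclusion: how often each variable appears across acceptable subsets.
    vi = [sum(j in s for s in family) / n_sub for j in range(p)]

    # Pairwise co-inclusion: how often variables j and k appear together.
    co_vi = [
        [sum((j in s) and (k in s) for s in family) / n_sub for k in range(p)]
        for j in range(p)
    ]

    # Classify each variable by whether it appears in all, some, or no subsets.
    status = ["all" if v == 1.0 else "none" if v == 0.0 else "some" for v in vi]
    return vi, co_vi, status


# Toy acceptable family over p = 5 variables (made-up input for illustration).
acceptable = [{0, 1}, {0, 2}, {0, 1, 3}]
vi, co_vi, status = variable_importance(acceptable, p=5)
print(status)  # ['all', 'some', 'some', 'some', 'none']
```

In this toy family, variable 0 appears in every acceptable subset, variable 4 in none, and variables 1 and 2 never co-appear, which is the kind of contrast the co-importance summary is meant to expose among highly correlated covariates.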

research
07/27/2021

Subset selection for linear mixed models

Linear mixed models (LMMs) are instrumental for regression analysis with...
research
06/23/2020

Fast, Optimal, and Targeted Predictions using Parametrized Decision Analysis

Prediction is critical for decision-making under uncertainty and lends v...
research
07/11/2021

Rank-based Bayesian variable selection for genome-wide transcriptomic analyses

Variable selection is crucial in high-dimensional omics-based analyses, ...
research
04/11/2019

FATSO: A family of operators for variable selection in linear models

In linear models it is common to have situations where several regressio...
research
02/21/2014

Important Molecular Descriptors Selection Using Self Tuned Reweighted Sampling Method for Prediction of Antituberculosis Activity

In this paper, a new descriptor selection method for selecting an optima...
research
10/08/2019

On the feasibility of parsimonious variable selection for Hotelling's T^2-test

Hotelling's T^2-test for the mean of a multivariate normal distribution ...