Variable selection using pseudo-variables

04/04/2018
by Wenhao Hu, et al.

Penalized regression has become a standard tool for model building across a wide range of application domains. Common practice is to tune the amount of penalization to trade off bias and variance, or to optimize some other measure of performance of the estimated model. An advantage of such automated model-building procedures is that their operating characteristics are well defined, i.e., completely data-driven, and they can therefore be studied systematically. However, in many applications it is desirable to incorporate domain knowledge into the model-building process. One way to do this is to characterize each model along the solution path of a penalized regression estimator in terms of an operating characteristic that is meaningful within a domain context, and then to allow domain experts to choose among these models using these operating characteristics as well as other factors not available to the estimation algorithm. We derive an estimator of the false selection rate for each model along the solution path using a novel variable addition method. The proposed estimator applies to both fixed and random designs and allows for p ≫ n. It can be used to estimate a model with a pre-specified false selection rate, or it can be overlaid on the solution path to facilitate interactive model exploration. We characterize the asymptotic behavior of the proposed estimator for a linear model under a fixed design; beyond that setting, simulation experiments show that the proposed estimator provides consistently more accurate estimates of the false selection rate than competing methods across a wide range of models.
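The abstract does not give the estimator itself, so the sketch below only illustrates the general pseudo-variable idea under our own assumptions. It is not the authors' method. It augments the design with pseudo-variables that are known to be unrelated to the response, fits a lasso at each point of a penalty grid, and uses the rate at which pseudo-variables enter the model to estimate how many real predictors are false selections. Everything here is an assumption: the helper names `lasso_cd` and `pseudo_variable_fsr` are hypothetical, as are the row-permutation construction of the pseudo-variables, the estimate p * U / k of the number of false selections (where U of the k pseudo-variables are selected), the penalty grid, and the 1e-8 selection tolerance. Only NumPy is used.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/(2n))||y - Xb||^2 + lam * ||b||_1.

    Hypothetical helper. Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (b_j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def pseudo_variable_fsr(X, y, n_lambdas=10, seed=0):
    """Estimate the false selection rate along a lasso path with pseudo-variables.

    Hypothetical helper, not the paper's estimator. The pseudo-variables are
    row-permuted copies of X. They share the marginal distribution of the real
    predictors but are independent of y. The estimate assumes that a null real
    predictor enters the model at about the same rate as a pseudo-variable.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X_pseudo = X[rng.permutation(n), :]
    k = X_pseudo.shape[1]                       # number of pseudo-variables
    Z = np.hstack([X, X_pseudo])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)    # standardize every column
    yc = y - y.mean()

    lam_max = np.max(np.abs(Z.T @ yc)) / n      # smallest penalty giving the empty model
    lambdas = lam_max * np.geomspace(1.0, 0.01, n_lambdas)

    n_real, fsr = [], []
    for lam in lambdas:
        beta = lasso_cd(Z, yc, lam)
        selected = np.abs(beta) > 1e-8          # tolerance guards against round-off
        s_real = int(selected[:p].sum())        # real predictors selected
        u_pseudo = int(selected[p:].sum())      # pseudo-variables selected
        # Pseudo-variable entry rate times p: a conservative count of
        # null real predictors expected to be selected at this penalty.
        est_false = p * u_pseudo / k
        n_real.append(s_real)
        fsr.append(min(1.0, est_false / max(s_real, 1)))
    return lambdas, np.array(n_real), np.array(fsr)


# Usage on simulated data: three true signals among 20 predictors.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + rng.standard_normal(100)
for lam, s, f in zip(*pseudo_variable_fsr(X, y)):
    print(f"lambda={lam:.3f}  selected={s:2d}  FSR_hat={f:.2f}")
```

Overlaying the estimated false selection rate on the path in this way lets a user compare models by an interpretable quantity rather than by the penalty value alone. The abstract describes this kind of use. The conservative choice of p as the count of null predictors tends to overstate the rate when many predictors are truly active.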

research
04/09/2018

Efficient Predictor Ranking and False Discovery Proportion Control in High-Dimensional Regression

We propose a ranking and selection procedure to prioritize relevant pred...
research
04/20/2018

Variable Selection via Adaptive False Negative Control in High-Dimensional Regression

In high-dimensional regression, variable selection methods have been dev...
research
06/02/2018

Variable Selection for Nonparametric Learning with Power Series Kernels

In this paper, we propose a variable selection method for general nonpar...
research
02/25/2022

Flexible variable selection in the presence of missing data

In many applications, it is of interest to identify a parsimonious set o...
research
02/03/2023

Trade-off between prediction and FDR for high-dimensional Gaussian model selection

In the context of the high-dimensional Gaussian linear regression for or...
research
09/14/2019

Adaptive Bayesian SLOPE – High-dimensional Model Selection with Missing Values

The selection of variables with high-dimensional and missing data is a m...
research
02/01/2020

Higher Criticism Tuned Regression For Weak And Sparse Signals

Here we propose a novel searching scheme for a tuning parameter in high-...