High-dimensional regression with potential prior information on variable importance

09/23/2021
by Benjamin G. Stokell, et al.

There are a variety of settings where vague prior information may be available on the importance of predictors in high-dimensional regression. Examples include an ordering of the variables given by their empirical variances (information that is typically discarded through standardisation), the lag of predictors when fitting autoregressive models in time series settings, or the level of missingness of the variables. Whilst such orderings may not match the true importance of the variables, we argue that there is little to be lost, and potentially much to be gained, by using them. We propose a simple scheme involving fitting a sequence of models indicated by the ordering. We show that the computational cost of fitting all models when ridge regression is used is no more than that of a single fit of ridge regression, and describe a strategy for Lasso regression that makes use of previous fits to greatly speed up fitting the entire sequence of models. We propose to select a final estimator by cross-validation and provide a general result on the quality of the best-performing estimator on a test set selected from among a number M of competing estimators in a high-dimensional linear regression setting. Our result requires no sparsity assumptions and shows that only a log M price is incurred compared to the unknown best estimator. We demonstrate the effectiveness of our approach when applied to missing or corrupted data, and to time series settings. An R package is available on GitHub.
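The core scheme described above can be illustrated with a minimal sketch: given a prior ordering of the predictors, fit a model on each nested prefix of the ordering and choose the prefix size by validation error. This is not the paper's implementation (the authors provide an R package and a more efficient fitting strategy); the function names, the closed-form ridge solver, the fixed penalty `lam=1.0`, and the single train/validation split standing in for cross-validation are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's R package): fit ridge regression on
# each nested prefix of a prior variable ordering and pick the prefix size
# with the smallest validation error.

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def sequential_ridge(X_tr, y_tr, X_val, y_val, order, lam=1.0):
    """For k = 1..p, fit ridge on the first k variables of `order`;
    return the prefix size and coefficients minimising validation MSE."""
    best_k, best_err, best_beta = None, np.inf, None
    for k in range(1, len(order) + 1):
        idx = order[:k]
        beta = ridge_fit(X_tr[:, idx], y_tr, lam)
        err = np.mean((y_val - X_val[:, idx] @ beta) ** 2)
        if err < best_err:
            best_k, best_err, best_beta = k, err, beta
    return best_k, best_beta

# Toy example where the ordering is informative: only the first three
# variables in the ordering carry signal.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)
order = list(range(p))  # assumed prior ordering on variable importance
k, beta = sequential_ridge(X[:100], y[:100], X[100:], y[100:], order)
print("selected prefix size:", k)
```

Because each candidate model is a prefix of the next, this naive loop refits from scratch at every step; the paper's contribution includes showing that, for ridge, the whole sequence can be obtained at the cost of a single fit.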


