Linear Regression, Covariate Selection and the Failure of Modelling

12/16/2021
by Laurie Davies, et al.

It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on four data sets and two simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used, and their use in this situation is quite straightforward. As soon as the model specifies only some unknown subset of the covariates, the problem being to identify this subset, the situation changes radically: there are many P-values, they are dependent, and most of them are invalid. The Bayesian paradigm also assumes a correct model. Although it has no conceptual problems with a large number of covariates, there is a considerable overhead that causes computational and allocation problems even for moderately sized data sets. The Gaussian covariate procedure is based on P-values defined as the probability that a random Gaussian covariate is better than the covariate being considered. These P-values are exact and valid whatever the situation. The allocation requirements and the algorithmic complexity are both linear in the size of the data, making the procedure capable of handling large data sets. It outperforms all the other procedures in every respect.
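
The definition of the Gaussian covariate P-value given in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes that "better" means "gives a smaller residual sum of squares when it replaces the covariate in the least-squares fit", and it estimates the probability by Monte Carlo, whereas the abstract states that the procedure's P-values are exact. The function names `rss` and `gaussian_covariate_pvalue_mc`, the simulated data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def gaussian_covariate_pvalue_mc(X, y, j, n_sim=2000, seed=0):
    """Monte Carlo estimate of the probability that a random Gaussian
    covariate, put in place of column j of X, gives a smaller residual
    sum of squares than column j itself (assumed meaning of "better")."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rss_with_xj = rss(X, y)
    others = np.delete(X, j, axis=1)  # all covariates except column j
    better = 0
    for _ in range(n_sim):
        z = rng.standard_normal(n)  # random Gaussian covariate
        if rss(np.column_stack([others, z]), y) < rss_with_xj:
            better += 1
    return better / n_sim


# Simulated data (an assumption for illustration): y depends on x1 only.
rng = np.random.default_rng(1)
n = 50
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 2.0 * x1 + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x1, x2])  # intercept, x1, x2

p_x1 = gaussian_covariate_pvalue_mc(X, y, j=1)
p_x2 = gaussian_covariate_pvalue_mc(X, y, j=2)
print("estimated P-value for x1:", p_x1)
print("estimated P-value for x2:", p_x2)
```

In this setup the relevant covariate `x1` should be beaten by essentially no random Gaussian covariate, so its estimated P-value is close to zero, while the value for the irrelevant covariate `x2` can fall anywhere between 0 and 1. The simulation only approximates the probability and makes no use of the model under which the data were generated.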

