Correction of overfitting bias in regression models

04/12/2022
by Emanuele Massa, et al.

Regression analysis based on many covariates is becoming increasingly common. When the number p of covariates is of the same order as the number N of observations, standard statistical inference, such as maximum likelihood estimation of the regression and nuisance parameters of the model, becomes unreliable due to overfitting. This overfitting most often leads to systematic biases in (some of) the estimators. Several methods to overcome overfitting bias or to adjust estimates have been proposed in the literature. The vast majority of these focus on the regression parameters only, either via empirical regularization methods or via expansions for small ratios p/N. Failing to also estimate the nuisance parameters correctly may lead to significant errors in outcome predictions. In this paper we study the overfitting bias of maximum likelihood estimators for the regression and nuisance parameters in parametric regression models in the overfitting regime (p/N<1). We compute the asymptotic characteristic function of the maximum likelihood estimators, and show how it can be used to estimate their overfitting biases by solving a small set of non-linear equations. These estimated biases enable us to correct the estimators and make them asymptotically unbiased. To illustrate the theory we performed simulation studies for multiple parametric regression models. In all cases we find excellent agreement between theory and simulations.
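As a rough illustration of the kind of bias the abstract describes, the sketch below simulates the simplest textbook case: a Gaussian linear model, where the maximum likelihood estimator of the noise variance (a nuisance parameter) is RSS/N and has expectation σ²(1 − p/N). This is not the characteristic-function method of the paper, only a well-known special case in which the correction is available in closed form. The function name `simulate_sigma2_mle`, the parameter values, and the Gaussian design are all assumptions made for the example.

```python
import numpy as np


def simulate_sigma2_mle(n_obs=400, n_cov=200, sigma2=1.0, n_rep=200, seed=0):
    """Average ML estimate of the noise variance over repeated simulations.

    Hypothetical helper for illustration: Gaussian covariates, Gaussian noise,
    and a linear model fitted by least squares (the ML fit for this model).
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_rep)
    for r in range(n_rep):
        # Random design and true regression vector (assumed setup).
        X = rng.standard_normal((n_obs, n_cov))
        beta = rng.standard_normal(n_cov) / np.sqrt(n_cov)
        y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n_obs)

        # Least squares solution equals the ML estimate of beta here.
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta_hat

        # ML estimate of sigma^2 divides by N, not by N - p.
        estimates[r] = resid @ resid / n_obs
    return estimates.mean()


n_obs, n_cov, sigma2 = 400, 200, 1.0
ratio = n_cov / n_obs  # p/N, not small in this regime

mean_mle = simulate_sigma2_mle(n_obs, n_cov, sigma2)
corrected = mean_mle / (1.0 - ratio)  # closed-form correction for this model

print(f"p/N                       = {ratio:.2f}")
print(f"true sigma^2              = {sigma2:.3f}")
print(f"theory, E[ML estimate]    = {sigma2 * (1.0 - ratio):.3f}")
print(f"simulated mean ML estimate= {mean_mle:.3f}")
print(f"bias-corrected estimate   = {corrected:.3f}")
```

With p/N = 0.5 the uncorrected estimator recovers only about half of the true variance, which is why neglecting nuisance parameters can distort outcome predictions. For models without such a closed form, the abstract indicates that the biases are obtained instead by solving a small set of non-linear equations.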


