Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions

03/03/2020
by Kamiar Rahnama Rad, et al.

We study the problem of out-of-sample risk estimation in the high-dimensional regime where both the sample size n and the number of features p are large, and n/p can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained open. This paper aims to fill that gap for penalized regression in the generalized linear family. Under minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite-sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as n, p → ∞, even when the dimension p of the feature vectors is comparable with or greater than the sample size n. One technical advantage of the theory is that it can be used to clarify and connect some results from the recent literature on scalable approximate LO.
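The quantities the abstract compares can be illustrated with a small numerical sketch. Ridge regression is used here as a simple member of the penalized family (the paper treats penalized GLMs generally); brute-force LO, which refits the model n times, is compared against the closed-form leverage shortcut that underlies the scalable approximate-LO literature. For ridge this shortcut is exact. The dimensions, λ, and Gaussian design below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 80, 120, 1.0  # p > n: the n/p < 1 regime discussed in the abstract
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

# Full-data ridge fit: b = (X'X + lam I)^{-1} X'y
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
b = A_inv @ X.T @ y

# Brute-force LO: refit with each observation held out in turn
lo_brute = 0.0
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.solve(
        X[keep].T @ X[keep] + lam * np.eye(p), X[keep].T @ y[keep]
    )
    lo_brute += (y[i] - X[i] @ b_i) ** 2
lo_brute /= n

# Leverage shortcut: LO residual = (y_i - yhat_i) / (1 - H_ii),
# with H_ii = x_i'(X'X + lam I)^{-1} x_i (exact for ridge via Sherman-Morrison)
h = np.einsum("ij,jk,ik->i", X, A_inv, X)
lo_short = np.mean(((y - X @ b) / (1 - h)) ** 2)

print(lo_brute, lo_short)
```

The two estimates agree to numerical precision here, but only because the ridge estimator is linear in y; for general penalized GLMs the shortcut becomes an approximation, and bounding the resulting error is where analyses of approximate LO come in.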


Related research

02/05/2019: Consistent Risk Estimation in High-Dimensional Linear Regression
Risk estimation is at the core of many learning systems. The importance ...

05/31/2019: Sparse Approximate Cross-Validation for High-Dimensional GLMs
Leave-one-out cross validation (LOOCV) can be particularly accurate amon...

01/30/2018: A scalable estimate of the extra-sample prediction error via approximate leave-one-out
We propose a scalable closed-form formula (ALO_λ) to estimate the extra-...

03/22/2021: A Link between Coding Theory and Cross-Validation with Applications
We study the combinatorics of cross-validation based AUC estimation unde...

12/27/2019: Statistical Agnostic Mapping: a Framework in Neuroimaging based on Concentration Inequalities
In the 70s a novel branch of statistics emerged focusing its effort in s...

02/27/2023: Extrapolated cross-validation for randomized ensembles
Ensemble methods such as bagging and random forests are ubiquitous in fi...

10/18/2016: Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression
We study model evaluation and model selection from the perspective of ge...
