V-fold cross-validation improved: V-fold penalization

02/05/2008
by Sylvain Arlot et al.

We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes", all the more so when V is small. Hence, asymptotic optimality requires V to tend to infinity. However, when the signal-to-noise ratio is low, overpenalizing turns out to be necessary, so that the optimal V is not always the largest one, despite the variability issue. This is confirmed by simulated data.

In order to improve the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even under highly heteroscedastic noise. Moreover, penVF makes it easy to overpenalize, independently of V. A simulation study shows that this yields a significant improvement over VFCV in non-asymptotic situations.
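To make the two procedures concrete, here is a minimal sketch on a toy regression problem: standard VFCV risk estimation, and a penVF-style criterion that adds a resampling-based penalty (empirical risk on the full sample, plus a constant C times the average gap between the full-sample risk and the block-deleted-sample risk of each block-deleted fit). The polynomial candidate models, the value of C, and all helper names are illustrative assumptions, not the paper's exact calibration.

```python
import numpy as np

def _blocks(n, V, rng):
    # Random partition of {0, ..., n-1} into V blocks of (almost) equal size.
    return np.array_split(rng.permutation(n), V)

def vfcv_risk(X, y, fit_predict, V, rng):
    """Classical V-fold cross-validation estimate of the prediction risk."""
    folds = _blocks(len(y), V, rng)
    errs = []
    for j in range(V):
        test = folds[j]
        train = np.concatenate([folds[k] for k in range(V) if k != j])
        pred = fit_predict(X[train], y[train], X[test])
        errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

def penvf_criterion(X, y, fit_predict, V, C, rng):
    """penVF-style criterion (schematic): empirical risk + C * resampling penalty.

    The penalty averages, over blocks, the gap between the full-sample risk
    and the in-sample risk of each block-deleted fit. C is a free
    overpenalization constant, tunable independently of V.
    """
    n = len(y)
    folds = _blocks(n, V, rng)
    pred_full = fit_predict(X, y, X)
    emp_risk = np.mean((y - pred_full) ** 2)
    gaps = []
    for j in range(V):
        train = np.concatenate([folds[k] for k in range(V) if k != j])
        pred = fit_predict(X[train], y[train], X)  # block-deleted fit, all points
        risk_full = np.mean((y - pred) ** 2)
        risk_train = np.mean((y[train] - pred[train]) ** 2)
        gaps.append(risk_full - risk_train)
    return float(emp_risk + C * np.mean(gaps))

# Hypothetical candidate models: polynomial regressions of increasing degree.
def make_poly(d):
    def fit_predict(Xtr, ytr, Xte):
        return np.polyval(np.polyfit(Xtr, ytr, d), Xte)
    return fit_predict

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 300)
y = np.sin(4 * X) + 0.4 * rng.standard_normal(300)
models = [make_poly(d) for d in range(8)]

vfcv_scores = [vfcv_risk(X, y, m, V=5, rng=np.random.default_rng(42))
               for m in models]
penvf_scores = [penvf_criterion(X, y, m, V=5, C=4.0, rng=np.random.default_rng(42))
                for m in models]
best_vfcv = int(np.argmin(vfcv_scores))
best_penvf = int(np.argmin(penvf_scores))
```

Both procedures refit each model V times, so they share the same computational cost; the difference is that penVF exposes the overpenalization level through C rather than tying it to V.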

Related research

- An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression (12/08/2012)
- Local asymptotics of cross-validation in least-squares density estimation (06/18/2021)
- Convex Techniques for Model Selection (11/27/2014)
- Consistent Cross Validation with stable learners (02/21/2022)
- Bootstrap Cross-validation Improves Model Selection in Pharmacometrics (09/27/2019)
- Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models (06/19/2018)
- Bias-aware model selection for machine learning of doubly robust functionals (11/05/2019)
