Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser

10/30/2010
by Matthieu Cornec, et al.

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of KR99 ("bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate"). General loss functions and classes of predictors with finite VC-dimension are considered. We closely follow the formalism introduced by DUD03 to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, k-fold cross-validation (or split sample), and leave-υ-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures. We point out the interest of each cross-validation procedure in terms of rate of convergence. An estimation curve with transition phases, depending on the cross-validation procedure and not only on the percentage of observations in the test sample, gives a simple rule for choosing the cross-validation procedure. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.
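To make the object of study concrete, the following is a minimal sketch (not the paper's code) of the cross-validation estimate of the generalization error for an empirical risk minimizer, covering the splitting schemes named in the abstract: leave-one-out, k-fold/split-sample, and leave-υ-out. The helpers `fit_erm`, `zero_one_loss`, `cv_estimate`, `leave_v_out_splits`, and `k_fold_splits` are illustrative placeholders introduced here, not notation from the paper.

```python
# Hedged sketch: cross-validation estimate of the generalization error of an ERM.
# All names below are placeholders for illustration, not from Cornec's paper.
import itertools
import numpy as np

def zero_one_loss(y_true, y_pred):
    # Fraction of misclassified test points (a bounded loss, as in the VC setting).
    return np.mean(y_true != y_pred)

def fit_erm(X, y):
    # Toy empirical risk minimizer over the constant predictors:
    # predict the majority label of the training split.
    majority = np.bincount(y).argmax()
    return lambda X_new: np.full(len(X_new), majority)

def cv_estimate(X, y, splits, loss=zero_one_loss):
    # Average held-out loss of the ERM refit on each training split:
    # this is the cross-validation estimate of the generalization error.
    errors = []
    for train_idx, test_idx in splits:
        predictor = fit_erm(X[train_idx], y[train_idx])
        errors.append(loss(y[test_idx], predictor(X[test_idx])))
    return np.mean(errors)

def leave_v_out_splits(n, v):
    # All splits whose test set has size v; v = 1 is leave-one-out.
    idx = np.arange(n)
    for test in itertools.combinations(idx, v):
        test = np.array(test)
        yield np.setdiff1d(idx, test), test

def k_fold_splits(n, k, rng=None):
    # k-fold splits; using a single fold as the test set recovers the split-sample case.
    idx = np.arange(n) if rng is None else rng.permutation(n)
    for fold in np.array_split(idx, k):
        yield np.setdiff1d(idx, fold), fold
```

For instance, with integer labels y in {0, 1}, `cv_estimate(X, y, leave_v_out_splits(len(y), 1))` returns the leave-one-out estimate and `cv_estimate(X, y, k_fold_splits(len(y), 10))` the 10-fold estimate; a genuine empirical risk minimizer over a VC class would replace the majority-vote placeholder.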

