(β, ϖ)-stability for cross-validation and the choice of the number of folds

05/20/2017
by Ning Xu, et al.

In this paper, we introduce a new concept of stability for cross-validation, called the (β, ϖ)-stability, and use it as a new perspective from which to build a general theory of cross-validation. The (β, ϖ)-stability mathematically connects the generalization ability and the stability of the cross-validated model via the Rademacher complexity. Our result mathematically reveals the effect of cross-validation from two sides: on one hand, cross-validation picks the model with the best empirical generalization ability by validating all the alternatives on test sets; on the other hand, cross-validation may compromise the stability of model selection by introducing subsampling error. Moreover, the difference between the training and test errors in the qth round, sometimes referred to as the generalization error, may be autocorrelated in q. Guided by these ideas, the (β, ϖ)-stability helps us derive a new class of Rademacher bounds, referred to as the one-round/convoluted Rademacher bounds, for the stability of cross-validation in both the i.i.d. and non-i.i.d. cases. For both light-tailed and heavy-tailed losses, the new bounds quantify the stability of the one-round/average test error of the cross-validated model in terms of its one-round/average training error, the sample size n, the number of folds K, the tail property of the loss (encoded as Orlicz-Ψ_ν norms), and the Rademacher complexity of the model class Λ. The new class of bounds not only quantitatively characterizes the stability of the generalization ability of the cross-validated model, but also indicates empirically the optimal choice of the number of folds K, at which the upper bound on the one-round/average test error is lowest, or, put another way, at which the test error is most stable.
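To make the practical question concrete, the following minimal sketch examines the trade-off described above empirically: it compares the average K-fold test error, and its spread across folds as a rough proxy for stability, over several choices of K. The lasso model, synthetic data, and squared-error loss are illustrative assumptions only, not the estimators or bounds analyzed in the paper.

# Illustrative sketch only: the model, data, and loss are placeholders,
# not the quantities bounded in the paper.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=50, noise=5.0, random_state=0)
model = Lasso(alpha=0.1)

for K in (2, 5, 10, 20):
    cv = KFold(n_splits=K, shuffle=True, random_state=0)
    # scikit-learn returns negated MSE, so negate to recover the per-fold test error
    fold_errors = -cross_val_score(model, X, y, cv=cv,
                                   scoring="neg_mean_squared_error")
    print(f"K={K:2d}  average test error={fold_errors.mean():.3f}  "
          f"spread across folds={fold_errors.std():.3f}")

Plotting the average test error (or its spread) against K in such an experiment is the empirical counterpart of reading off the K at which the one-round/average test-error bound is lowest.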


Related research

09/12/2016  Finite-sample and asymptotic analysis of generalization ability with an application to penalized regression
In this paper, we study the performance of extremum estimators from the ...

10/18/2016  Generalization error minimization: a new approach to model evaluation and selection with an application to penalized regression
We study model evaluation and model selection from the perspective of ge...

06/19/2017  An a Priori Exponential Tail Bound for k-Folds Cross-Validation
We consider a priori generalization bounds developed in terms of cross-v...

07/30/2020  Rademacher upper bounds for cross-validation errors with an application to the lasso
We establish a general upper bound for K-fold cross-validation (K-CV) er...

03/22/2021  A Link between Coding Theory and Cross-Validation with Applications
We study the combinatorics of cross-validation based AUC estimation unde...

12/11/2002  Theoretical Analyses of Cross-Validation Error and Voting in Instance-Based Learning
This paper begins with a general theory of error in cross-validation tes...

04/08/2023  Block-regularized 5×2 Cross-validated McNemar's Test for Comparing Two Classification Algorithms
In the task of comparing two classification algorithms, the widely-used ...
