Consistent Cross Validation with stable learners
This paper investigates the efficiency of different cross-validation (CV) procedures under algorithmic stability, with a specific focus on K-fold CV. We derive a generic upper bound on the risk-estimation error that applies to a wide class of CV schemes. This bound ensures the consistency of leave-one-out and leave-p-out CV but fails to control the error of the K-fold. We confirm this negative result with a lower bound on the K-fold error that does not converge to zero with the sample size. We therefore propose a debiased version of K-fold CV that is consistent for any uniformly stable learner. We apply our results to the problem of model selection and demonstrate empirically the usefulness of the proposed approach on real-world datasets.
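To make the object of study concrete, here is a minimal sketch of the standard K-fold risk estimator for a uniformly stable learner (ridge regression, whose closed-form solution makes stability easy to see). This is a generic illustration only; the function names (`ridge_fit`, `kfold_risk`) are ours, and the paper's debiased K-fold estimator is not reproduced here.

```python
# Hypothetical sketch: standard K-fold CV risk estimation with a stable
# learner (ridge regression). Not the paper's debiased estimator.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_risk(X, y, K=5, lam=1.0, seed=0):
    # Average held-out squared error over K folds of a random partition.
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    errs = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errs))

# Synthetic linear data with small noise, so the estimated risk is small.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
print(kfold_risk(X, y, K=5))
```

It is this plain average of held-out errors whose bias, per the abstract, keeps the K-fold estimate from being consistent in general, motivating the debiased variant.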