Asymptotics of Cross-Validation

01/29/2020
by Morgane Austern, et al.

Cross-validation is a central tool for evaluating the performance of machine learning and statistical models. However, despite its ubiquitous role, its theoretical properties are still not well understood. We study the asymptotic properties of the cross-validated risk for a large class of models. Under stability conditions, we establish a central limit theorem and Berry-Esseen bounds, which enable us to compute asymptotically accurate confidence intervals. Using our results, we paint a big picture of the statistical speed-up that cross-validation offers over a train-test split procedure. A corollary of our results is that parametric M-estimators (or empirical risk minimizers) benefit from the "full" speed-up when cross-validation is performed under the training loss. In other common cases, such as when training uses a surrogate loss or a regularizer, we show that the behavior of the cross-validated risk is complex, with a variance reduction that may be smaller or larger than the "full" speed-up, depending on the model and the underlying distribution. We allow the number of folds to grow with the number of observations at any rate.
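To make the objects in the abstract concrete, here is a minimal sketch, not the authors' construction, of a K-fold cross-validated risk together with a normal-approximation confidence interval. It uses the simplest case the corollary covers: a parametric M-estimator (the sample mean, which minimizes empirical squared loss) evaluated under its own training loss. The helper names (`kfold_indices`, `fit_mean`, `cv_risk_with_ci`) are hypothetical, and the interval's variance term is an assumption: it is the naive plug-in sample variance of the pooled held-out losses, treated as if they were independent. The paper's asymptotic variance may differ, and whether such intervals are valid is exactly what its stability conditions address.

```python
import math
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(ys):
    """Empirical risk minimizer of squared loss over constant predictors: the sample mean."""
    return sum(ys) / len(ys)


def squared_loss(theta, y):
    return (y - theta) ** 2


def cv_risk_with_ci(data, k, level=0.95, seed=0):
    """K-fold cross-validated risk plus a naive normal-approximation interval.

    Assumption (illustrative only): the interval uses the sample standard
    deviation of the pooled held-out losses as if they were independent;
    this is a plug-in choice, not necessarily the variance from the paper.
    """
    rng = random.Random(seed)
    n = len(data)
    held_out_losses = []
    for fold in kfold_indices(n, k, rng):
        in_fold = set(fold)
        # Train on every observation outside the current fold.
        train = [data[i] for i in range(n) if i not in in_fold]
        theta = fit_mean(train)
        # Evaluate under the same (training) loss on the held-out fold.
        held_out_losses.extend(squared_loss(theta, data[i]) for i in fold)
    risk = sum(held_out_losses) / n
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * statistics.stdev(held_out_losses) / math.sqrt(n)
    return risk, (risk - half_width, risk + half_width)


# Usage: 2000 standard normal draws, 10 folds.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
risk, (lo, hi) = cv_risk_with_ci(data, k=10)
print(f"CV risk = {risk:.3f}, approx. 95% interval = [{lo:.3f}, {hi:.3f}]")
```

The sample mean is chosen only because it is the simplest empirical risk minimizer; replacing the loss used for fitting with a surrogate, or adding a regularizer, moves the example into the regime where the abstract says the variance reduction can be smaller or larger than the "full" speed-up.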
