Fast and Informative Model Selection using Learning Curve Cross-Validation

11/27/2021
by Felix Mohr, et al.

Common cross-validation (CV) methods like k-fold cross-validation or Monte-Carlo cross-validation estimate the predictive performance of a learner by repeatedly training it on a large portion of the given data and testing on the remaining data. These techniques have two major drawbacks. First, they can be unnecessarily slow on large datasets. Second, beyond an estimate of the final performance, they give almost no insight into the learning process of the validated algorithm. In this paper, we present a new approach for validation based on learning curves (LCCV). Instead of creating train-test splits with a large portion of training data, LCCV iteratively increases the number of instances used for training. In the context of model selection, it discards models that are very unlikely to become competitive. We run a large-scale experiment on the 67 datasets from the AutoML benchmark and empirically show that in over 90% of the cases using LCCV leads to similar performance (at most 1.5% difference) as using 5/10-fold CV. However, it yields substantial runtime reductions of over 20% on average. Additionally, it provides important insights, which for example allow assessing the benefits of acquiring more data. These results are orthogonal to other advances in the field of AutoML.
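The idea of growing the training set and discarding hopeless candidates early can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that error-rate learning curves are convex and decreasing, so that a straight line through the last two observed points gives an optimistic bound on the error at the full training size. The names `optimistic_error_at`, `validate_with_learning_curve`, `error_at`, and `anchors`, as well as the toy curves in the usage part, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Tuple


def optimistic_error_at(target_size: int, sizes: List[int], errors: List[float]) -> float:
    """Extend the line through the last two curve points to target_size.

    Assumption: the learning curve is convex and decreasing, so the true
    error at target_size cannot fall below this extrapolated value.
    """
    n0, n1 = sizes[-2], sizes[-1]
    e0, e1 = errors[-2], errors[-1]
    slope = min(0.0, (e1 - e0) / (n1 - n0))  # never extrapolate upwards
    return max(0.0, e1 + slope * (target_size - n1))


def validate_with_learning_curve(
    error_at: Callable[[int], float],
    anchors: List[int],
    best_error: Optional[float],
    tolerance: float = 0.0,
) -> Tuple[Optional[float], List[Tuple[int, float]]]:
    """Evaluate a learner at increasing training sizes (anchors).

    error_at(n) is a hypothetical callback returning the validation error
    obtained after training on n instances. Returns (final_error, curve);
    final_error is None if the learner was discarded early.
    """
    sizes: List[int] = []
    errors: List[float] = []
    for n in anchors:
        sizes.append(n)
        errors.append(error_at(n))
        if best_error is not None and len(sizes) >= 2:
            bound = optimistic_error_at(anchors[-1], sizes, errors)
            if bound > best_error + tolerance:
                # Even the optimistic estimate cannot beat the incumbent.
                return None, list(zip(sizes, errors))
    return errors[-1], list(zip(sizes, errors))


# Usage with two synthetic learning curves (toy data, not real results).
anchors = [64, 128, 256, 512, 1024]
good_error, good_curve = validate_with_learning_curve(
    lambda n: 0.1 + 2.0 / n, anchors, best_error=None
)
weak_error, weak_curve = validate_with_learning_curve(
    lambda n: 0.3 + 2.0 / n, anchors, best_error=good_error
)
print(good_error, len(good_curve))
print(weak_error, len(weak_curve))
```

In this toy run the second learner is dropped before the largest anchors are evaluated, which is where the runtime saving comes from. The partial curve that is returned is also what allows judging whether more data would help.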
