
Train on Validation: Squeezing the Data Lemon

02/16/2018
by Guy Tennenholtz, et al.

Model selection on validation data is an essential step in machine learning. While mixing data between training and validation is considered taboo, practitioners often violate this rule to increase performance. Here, we offer a simple, practical method for using the validation set for training, which allows for a continuous, controlled trade-off between performance and overfitting of model selection. We define on-average-validation-stable algorithms as those for which using small portions of validation data for training does not overfit the model selection process. We then prove that stable algorithms are also validation stable. Finally, we demonstrate our method on the MNIST and CIFAR-10 datasets using stable algorithms as well as state-of-the-art neural networks. Our results show a significant increase in test performance with a minor trade-off in bias introduced into the model selection process.
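
As a rough illustration of the general idea of training on a portion of the validation set, the minimal sketch below moves a fraction of the validation examples into the training set. The function name `move_validation_fraction`, the parameter `alpha`, and the uniform random choice of which examples to move are assumptions made for this example; they are not taken from the paper and may differ from the authors' procedure.

```python
import random


def move_validation_fraction(train, val, alpha, seed=0):
    """Move a fraction `alpha` of validation examples into the training set.

    Hypothetical helper for illustration only (not the paper's algorithm).
    alpha = 0.0 leaves the split untouched; alpha = 1.0 trains on all the
    data and leaves nothing for model selection.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    # Shuffle a copy so the caller's validation set is not modified.
    rng = random.Random(seed)
    shuffled = list(val)
    rng.shuffle(shuffled)

    # The first k shuffled examples move to training; the rest stay held out.
    k = int(round(alpha * len(shuffled)))
    new_train = list(train) + shuffled[:k]
    new_val = shuffled[k:]
    return new_train, new_val
```

Under these assumptions, `alpha` acts as the knob: a larger value gives the learner more data, while a smaller value keeps more examples held out for selecting between models.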


09/12/2009

A Nonconformity Approach to Model Selection for SVMs

We investigate the issue of model selection and the use of the nonconfor...

04/30/2015

Model Selection and Overfitting in Genetic Programming: Empirical Study [Extended Version]

Genetic Programming has been very successful in solving a large area of ...

05/24/2019

Perturbed Model Validation: A New Framework to Validate Model Relevance

This paper introduces PMV (Perturbed Model Validation), a new technique ...

12/08/2020

Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels

For multi-class classification under class-conditional label noise, we p...

04/01/2020

Evaluation of Model Selection for Kernel Fragment Recognition in Corn Silage

Model selection when designing deep learning systems for specific use-ca...

07/23/2021

Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings

Reinforcement learning (RL) can be used to learn treatment policies and ...

10/20/2020

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset...