1 Introduction
A core objective in machine learning is to build models that generalize well,
i.e., that have the ability to perform well on new unseen data. A common strategy to achieve generalization is to employ regularization, which is a way to incorporate additional information about the space of suitable models. This, in principle, prevents the estimated model from overfitting the training data. However, recent work
[36] shows that current regularization methods applied to neural networks do not work according to conventional wisdom. In fact, it has been shown that neural networks can learn to map data samples to arbitrary labels despite using regularization techniques such as weight decay, dropout, and data augmentation. While the architecture of a neural network alone seems to have an implicit regularizing effect [33], experiments show that it can overfit on any dataset, given enough training time. This limits the performance of any trained neural network, especially when labels are partially noisy.
In this paper we introduce a novel learning framework that reduces overfitting by formulating training as a bilevel optimization problem [5, 6]. Although the mathematical formulation of bilevel optimization is often involved, our final algorithm is a fairly straightforward modification of current training methods. Bilevel optimization differs from conventional optimization in that one of the constraints is itself an optimization problem. The main objective function is called the upper-level optimization task, and the optimization problem in the set of constraints is called the lower-level optimization task. In our formulation, the lower-level problem is a model parameter optimization on samples from the training set, while the upper-level problem acts as a performance evaluation on samples from a separate validation set. The optimal model is thus one that is trained on one dataset, but performs well on a different one, a property that closely follows the definition of generalization.
In the optimization procedure we introduce a scalar weight for each sample minibatch. The purpose of these variables is to find the linear combination of a subset of minibatches from the training set that can best approximate the validation set error.
They can also be seen as a way to: 1) discard noisy samples and 2) adjust the parameter optimization path. Finally, these weights can also be interpreted as hyperparameters. Hence, bilevel optimization can be seen as an integrated way to continuously optimize for both the model parameters and the hyperparameters, as done in cross-validation.
In its general form, bilevel optimization is known to present computational challenges. To address these challenges, we propose to approximate the loss objectives at every iteration with quadratic functions. These approximations result in closed-form solutions that resemble the well-known stochastic gradient descent (SGD) update rules. Essentially, our bilevel optimization computes loss gradients on the training set and then prescribes adjustments to the learning rates of the SGD iteration so that the updated parameters perform well on the validation set. As we will show later, these adjustments depend on how well the gradients computed on the training set "agree" with the gradients computed on the validation set (see Fig. 1).
Our method can be easily integrated in current training procedures for neural networks and our experiments show that it yields models with better generalization on several network architectures and datasets.
2 Prior Work
We give an overview of prior work relating to three main aspects of the paper: 1) generalization properties of deep networks and how learning algorithms affect them, 2) memorization of corrupt labels as a special case of overfitting, and 3) bilevel optimization in the context of deep learning. Some of the techniques in our approach also appear in other work, but with different uses and purposes; we therefore do not discuss those cases. For instance, Lopez-Paz and Ranzato [20] also use the dot-product between training gradients, but apply it to the context of continual learning with multiple tasks.
Understanding Generalization in Deep Learning.
Although convolutional neural networks trained using stochastic gradient descent generalize well in practice, Zhang
et al. [36] experimentally demonstrate that these models are able to fit random labelings of the training data. This is true even when using common explicit regularization techniques. Several recent works provide possible explanations for the apparent paradox of good generalization despite the high capacity of the models. The work of Kawaguchi et al. [16] provides an explanation based on model selection (e.g., of the network architecture) via cross-validation. Their theoretical analysis also results in new generalization bounds and regularization strategies. Zhang et al. [37] attribute the generalization properties of convolutional neural networks (CNNs) to characteristics of stochastic gradient descent optimizers. Their results show that SGD favors flat minima, which in turn correspond to large (geometric) margin classifiers. Smith and Le
[29] provide an explanation by evaluating the Bayesian evidence in favor of each model, which penalizes sharp minima. In contrast, we argue that current training schemes for neural networks can avoid overfitting altogether by exploiting cross-validation during the optimization.
Combating Memorization of Noisy Labels. The memorization of corrupted labels is a form of overfitting that is of practical importance, since labels are often unreliable. Several works have therefore addressed the problem of learning with noisy labels. Rolnick et al. [28] show that neural networks can be robust to even high levels of noise provided good hyperparameter choices; they specifically demonstrate that larger batch sizes are beneficial in the case of label noise. Patrini et al. [25] address label noise with a loss correction approach. Natarajan et al. [22] provide a theoretical study of the binary classification problem under label noise and propose approaches to modify the loss accordingly. Jindal et al. [15]
use dropout and augment networks with a softmax layer that models the label noise and is trained jointly with the network. Sukhbaatar
et al. [31] introduce an extra noise layer into the network that adapts the network output to match the noisy label distribution. Reed et al. [27] tackle the problem by augmenting the classification objective with a notion of consistency given similar percepts. Besides approaches that explicitly model the noise distribution, several regularization techniques have proven effective in this scenario. The recent work of Jiang et al. [14] introduces a regularization technique to counter label noise: they train a network (MentorNet) to assign weights to each training example. Another recent regularization technique was introduced by Zhang et al. [38]. Their method is a form of data augmentation where two training examples are mixed (both images and labels) in a convex combination. Azadi et al. [2] propose a regularization technique based on overlapping group norms. Their regularizer demonstrates good performance, but relies on features trained on correctly labeled data. Our method differs from the above, because we avoid memorization by encouraging only model parameter updates that reduce errors on shared sample patterns, rather than example-specific details.
Bilevel Optimization. Bilevel optimization approaches have been proposed by various authors to solve for hyperparameters with respect to the performance on a validation set [4, 3]. Domke [8] introduced a truncated bilevel optimization method where the lower-level problem is approximated by running an iterative algorithm for a given number of steps and subsequently computing the gradient on the validation loss via algorithmic differentiation. Our method is the limiting case of using a single step in the lower-level problem. Ochs et al. [24] extend this technique to the case of non-smooth lower-level problems by differentiating the iterations of a primal-dual algorithm. Maclaurin et al.
[21] address the issue of expensive caching required for this kind of optimization by deriving an algorithm to exactly reverse SGD while storing only a minimal amount of information. Kunisch and Pock [19] apply bilevel optimization to learn parameters of a variational image denoising model. We do not use bilevel optimization to solve for existing hyperparameters, but rather introduce and solve for new hyperparameters by assigning weights to stochastic gradient samples at each iteration.
Meta-Learning. Our proposed algorithm bears some similarity to the meta-learning literature [10, 23, 34]. Most notably, the MAML algorithm by Finn et al. [10] also incorporates gradient information from two datasets, but does so in a different way: their method uses second-order derivatives, whereas we only use first-order derivatives. In general, the purpose and data of our approach are quite different from the meta-learning setting: we have only one task, while in meta-learning there are multiple tasks.
3 Learning to Generalize
We are given sample pairs $(x, y)$, where $x$ represents input data and $y$ represents targets/labels. We denote with $\phi(x; \theta)$ a model that depends on parameters $\theta \in \mathbb{R}^d$ for some positive integer $d$. In all our experiments this model is a neural network and $\theta$ collects all its parameters. To measure the performance of the model, we introduce a loss function $\ell(y, \phi(x; \theta))$ per sample. Since we evaluate the loss on minibatches $\mathcal{B}_i$, $i = 1, \dots, N$, where $\mathcal{B}_i \cap \mathcal{B}_j = \emptyset$ for $i \neq j$, we redefine the loss as

$$\mathcal{L}_i(\theta) \triangleq \frac{1}{|\mathcal{B}_i|} \sum_{(x, y) \in \mathcal{B}_i} \ell(y, \phi(x; \theta)). \quad (1)$$
At every iteration, we collect a subset of the minibatches $\{\mathcal{B}_i\}_{i \in \mathcal{S}}$, which we partition into two separate sets: one for training, $\mathcal{T}$, and one for validation, $\mathcal{V}$, where $\mathcal{T} \cup \mathcal{V} = \mathcal{S}$ and $\mathcal{T} \cap \mathcal{V} = \emptyset$. Thus, minibatches in the training set have $i \in \mathcal{T}$ and those in the validation set have $j \in \mathcal{V}$. In all our experiments, the validation set $\mathcal{V}$ is always a singleton (one minibatch).
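As a concrete illustration, the per-minibatch losses of eq. (1) and the split of minibatch indices into training and validation sets can be sketched in a few lines of numpy (the linear model, squared loss, and data below are toy stand-ins, not the networks or datasets used in this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a linear model phi(x; theta) = x @ theta with a squared per-sample loss.
X, y = rng.normal(size=(80, 5)), rng.normal(size=80)
theta = rng.normal(size=5)

def minibatch_loss(theta, batch):
    """Eq. (1): L_i(theta) = average per-sample loss over minibatch B_i."""
    residual = X[batch] @ theta - y[batch]
    return np.mean(residual ** 2)

# Disjoint minibatches B_1, ..., B_N and a split of their indices into T and V.
batches = np.array_split(rng.permutation(80), 8)
val = [0]                  # V: a singleton, as in the experiments
train = list(range(1, 8))  # T: all the remaining minibatch indices
losses = [minibatch_loss(theta, b) for b in batches]
```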
3.1 Bilevel Learning
At the $t$-th iteration, Stochastic Gradient Descent (SGD) uses only one minibatch $\mathcal{B}_{i_t}$ to update the parameters via

$$\theta_{t+1} = \theta_t - \eta \nabla \mathcal{L}_{i_t}(\theta_t), \quad (2)$$

where $\eta > 0$ is the SGD learning rate and $i_t$ is the index of the minibatch used at iteration $t$. Instead, we consider the subset of minibatches $\{\mathcal{B}_i\}_{i \in \mathcal{S}}$ and look for the linear combination of the losses $\mathcal{L}_i$ that best approximates the validation error. We introduce an additional coefficient $u_i$ per minibatch in $\mathcal{T}$, which we estimate during training. Our task is then to find the parameters $\theta$ of our model by using exclusively minibatches in the training set $\mathcal{T}$, and to identify the coefficients (hyperparameters) $u_i$ so that the model performs well on the validation set $\mathcal{V}$. We thus propose to optimize
$$\begin{aligned} \min_{u} \;\; & \frac{1}{|\mathcal{V}|} \sum_{j \in \mathcal{V}} \mathcal{L}_j(\theta(u)) + \frac{\lambda}{2} |u|^2 \\ \text{s.t.} \;\; & \theta(u) = \arg\min_{\theta} \sum_{i \in \mathcal{T}} u_i \mathcal{L}_i(\theta), \qquad \textstyle\sum_{i \in \mathcal{T}} u_i = 1, \end{aligned} \quad (3)$$

where $u \in \mathbb{R}^{|\mathcal{T}|}$ is the vector collecting all the coefficients $u_i$, $i \in \mathcal{T}$, and $\lambda > 0$ is a parameter that regulates the distribution of the weights (large values encourage a uniform distribution across the minibatches, while small values allow more sparsity). Notice that the solution of the lower-level problem does not change if we multiply all the coefficients $u_i$ by the same strictly positive constant. Therefore, to fix the magnitude of $u$ we introduce the normalization constraint $\sum_{i \in \mathcal{T}} u_i = 1$.
A classical method to solve the above bilevel problem is to solve a linear system in the second-order derivatives of the lower-level problem, via so-called implicit differentiation [8]. This step leads to solving a very high-dimensional linear system. To avoid these computational challenges, in the next section we introduce a proximal approximation. Notice that when we compare the bilevel formulation (3) with SGD in the experiments, we equalize computational complexity by using the same number of visits per sample.
3.2 A Proximal Formulation
To simplify the bilevel formulation (3) we propose to solve a sequence of approximated problems. The parameters estimated at the $t$-th approximated problem are denoted $\theta_{t+1}$. Both the upper-level and the lower-level problems are approximated via a first-order Taylor expansion of the loss function around the previous parameter estimate $\theta_t$, i.e., we let

$$\mathcal{L}_i(\theta) \approx \mathcal{L}_i(\theta_t) + \nabla \mathcal{L}_i(\theta_t)^\top (\theta - \theta_t). \quad (4)$$
Since the above Taylor expansion holds only in the proximity of the previous parameter estimate $\theta_t$, we also introduce proximal quadratic terms $\frac{\mu_{\mathcal{V}}}{2} |\theta - \theta_t|^2$ and $\frac{\mu_{\mathcal{T}}}{2} |\theta - \theta_t|^2$. By plugging the linear approximation (4) and the proximal terms into Problem (3) we obtain the following formulation

$$\begin{aligned} \min_{u} \;\; & \frac{1}{|\mathcal{V}|} \sum_{j \in \mathcal{V}} \Big[ \mathcal{L}_j(\theta_t) + \nabla \mathcal{L}_j(\theta_t)^\top (\theta(u) - \theta_t) \Big] + \frac{\mu_{\mathcal{V}}}{2} |\theta(u) - \theta_t|^2 + \frac{\lambda}{2} |u|^2 \\ \text{s.t.} \;\; & \theta(u) = \arg\min_{\theta} \sum_{i \in \mathcal{T}} u_i \Big[ \mathcal{L}_i(\theta_t) + \nabla \mathcal{L}_i(\theta_t)^\top (\theta - \theta_t) \Big] + \frac{\mu_{\mathcal{T}}}{2} |\theta - \theta_t|^2, \qquad \textstyle\sum_{i \in \mathcal{T}} u_i = 1, \end{aligned} \quad (5)$$
where the coefficients $\mu_{\mathcal{T}}, \mu_{\mathcal{V}} > 0$. The lower-level problem is now quadratic and can be solved in closed form. This yields an update rule identical to the SGD step (2) when we set $\eta = 1/\mu_{\mathcal{T}}$:

$$\theta(u) = \theta_t - \eta \sum_{i \in \mathcal{T}} u_i \nabla \mathcal{L}_i(\theta_t). \quad (6)$$
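The closed-form solution can be verified numerically: on synthetic gradients, the linearized-plus-proximal lower-level objective is minimized exactly by the SGD-like step above. The following is a sanity-check sketch; the dimensions, gradients, and coefficients are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 4                  # parameter dimension, number of training minibatches
theta_t = rng.normal(size=d)  # previous parameter estimate
g = rng.normal(size=(n, d))   # gradients grad L_i(theta_t), i in T
u = rng.normal(size=n)        # minibatch weights u_i
mu = 2.0                      # proximal coefficient mu_T, so eta = 1/mu

def lower_level(theta):
    """Linearized-plus-proximal lower-level objective of eq. (5), constants dropped."""
    return u @ (g @ (theta - theta_t)) + 0.5 * mu * np.sum((theta - theta_t) ** 2)

# Closed-form minimizer, eq. (6): an SGD-like step with learning rate eta = 1/mu.
theta_star = theta_t - (1.0 / mu) * (u @ g)
```

Evaluating `lower_level` at `theta_star` and at random perturbations of it confirms that no perturbation attains a lower objective value, since the objective is strictly convex.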
Now we can plug this solution into the upper-level problem. Dropping the terms $\mathcal{L}_j(\theta_t)$, which are constant in $u$, we obtain

$$\min_{u} \; -\frac{\eta}{|\mathcal{V}|} \sum_{j \in \mathcal{V}} \sum_{i \in \mathcal{T}} u_i \, \nabla \mathcal{L}_j(\theta_t)^\top \nabla \mathcal{L}_i(\theta_t) + \frac{\mu_{\mathcal{V}} \eta^2}{2} \Big| \sum_{i \in \mathcal{T}} u_i \nabla \mathcal{L}_i(\theta_t) \Big|^2 + \frac{\lambda}{2} |u|^2. \quad (7)$$
We simplify the notation by introducing $g_i \triangleq \nabla \mathcal{L}_i(\theta_t)$ for $i \in \mathcal{T}$ and $\bar{g}_{\mathcal{V}} \triangleq \frac{1}{|\mathcal{V}|} \sum_{j \in \mathcal{V}} \nabla \mathcal{L}_j(\theta_t)$. To find the optimal coefficients $u_i$ we temporarily ignore the normalization constraint and simply solve the unconstrained optimization; afterwards, we enforce the normalization on the solution. As a first step, we compute the derivative of the cost functional with respect to $u_i$ and set it to zero, i.e.,

$$-\eta \, \bar{g}_{\mathcal{V}}^\top g_i + \mu_{\mathcal{V}} \eta^2 \sum_{k \in \mathcal{T}} u_k \, g_k^\top g_i + \lambda u_i = 0 \qquad \forall i \in \mathcal{T}. \quad (8)$$
We now approximate the second sum by ignoring all terms with $k \neq i$, i.e.,

$$\sum_{k \in \mathcal{T}} u_k \, g_k^\top g_i \approx u_i |g_i|^2, \quad (9)$$
so that we can obtain the weight update rule

$$u_i = \frac{\eta \, \bar{g}_{\mathcal{V}}^\top g_i}{\lambda + \mu_{\mathcal{V}} \eta^2 |g_i|^2}. \quad (10)$$
Since eq. (8) describes a linear system, it could be solved exactly via several iterative methods, such as Gauss-Seidel or successive over-relaxation [12]. However, we found that this level of accuracy does not improve the model performance enough to justify the additional computational cost. We can then combine the update rule (10) with the update (6) of the parameters and obtain a new gradient descent step

$$\theta_{t+1} = \theta_t - \eta \sum_{i \in \mathcal{T}} u_i \, g_i = \theta_t - \sum_{i \in \mathcal{T}} \frac{\eta^2 \, \bar{g}_{\mathcal{V}}^\top g_i}{\lambda + \mu_{\mathcal{V}} \eta^2 |g_i|^2} \, g_i. \quad (11)$$
Notice that $\eta u_i$ can be seen as a learning rate specific to each minibatch. The update rule for the weights follows a very intuitive scheme: if the gradient $g_i$ of a minibatch in the training set $\mathcal{T}$ agrees with the average gradient $\bar{g}_{\mathcal{V}}$ of the minibatches in the validation set $\mathcal{V}$, then their inner product $\bar{g}_{\mathcal{V}}^\top g_i$ and the corresponding weight $u_i$ are positive and large. This means that we encourage updates of the parameters that also minimize the upper-level problem. When these two gradients disagree, that is, when they are orthogonal ($\bar{g}_{\mathcal{V}}^\top g_i = 0$) or point in opposite directions ($\bar{g}_{\mathcal{V}}^\top g_i < 0$), then the corresponding weight is set to zero or to a negative value, respectively (see Fig. 1 for a general overview of the training procedure). Moreover, these inner products are scaled by the gradient magnitude $|g_i|^2$ of the minibatches from the training set, and division by zero is avoided when $\lambda > 0$.
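This sign behavior can be checked directly. The sketch below implements the weight rule of eq. (10) followed by the normalization; the function name and the toy 2-D gradients are illustrative choices, not code from the paper:

```python
import numpy as np

def minibatch_weights(train_grads, val_grad, eta=0.1, mu_v=1.0, lam=1e-4):
    """Eq. (10): u_i = eta <g_bar_V, g_i> / (lam + mu_v eta^2 |g_i|^2), followed by
    the normalization sum_i u_i = 1 (here the unnormalized weights sum to a
    positive value, so the final division is well defined)."""
    dots = train_grads @ val_grad                      # agreement <g_bar_V, g_i>
    norms2 = np.sum(train_grads ** 2, axis=1)          # |g_i|^2
    u = eta * dots / (lam + mu_v * eta ** 2 * norms2)  # lam > 0 avoids division by zero
    return u / np.sum(u)

# Illustrative 2-D gradients: two training minibatches agree with the
# validation gradient, one is orthogonal to it, and one opposes it.
val_grad = np.array([1.0, 0.0])
train_grads = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]])
u = minibatch_weights(train_grads, val_grad)
```

With these inputs the agreeing minibatches receive positive weights, the orthogonal one gets weight zero, and the opposing one a negative weight, matching the intuition above.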
Remark 1
Attention must be paid to the sample composition in each minibatch, since we aim to approximate the validation error with a linear combination of a few minibatches. In fact, if samples in a minibatch of the training set are quite independent from samples in minibatches of the validation set (for example, they belong to very different categories in a classification problem), then their inner product will tend to be very small on average. This would not allow any progress in the estimation of the parameters $\theta$. At each iteration we therefore ensure that samples in each minibatch from the training set have overlapping labels with samples in minibatches from the validation set.
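A simple way to enforce this in practice is stratified sampling, so that every compared minibatch contains the same number of samples per class. The sketch below assumes the batch size is divisible by the number of classes; `sample_label_matched_minibatches` is an illustrative helper, not code from the paper:

```python
import numpy as np

def sample_label_matched_minibatches(labels, n_batches, batch_size, rng):
    """Sample disjoint minibatches whose label distributions are identical, so
    that training and validation gradients share patterns and their inner
    products are informative. Assumes batch_size % number-of-classes == 0."""
    classes = np.unique(labels)
    per_class = batch_size // len(classes)
    batches = [[] for _ in range(n_batches)]
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        for b in range(n_batches):  # give every minibatch the same count of class c
            batches[b].extend(idx[b * per_class:(b + 1) * per_class])
    return [np.array(b) for b in batches]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 100)  # toy dataset: 10 classes, 100 samples each
batches = sample_label_matched_minibatches(labels, n_batches=8, batch_size=40, rng=rng)
```

Every returned minibatch then contains exactly the same number of samples from each class, so the compared gradients are never trivially orthogonal for lack of shared labels.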
4 Implementation
To implement our method we modify SGD with momentum [26]. First, at each iteration we sample $|\mathcal{S}|$ minibatches in such a way that the distributions of labels across the minibatches are identical (see the experiments for the values of $|\mathcal{S}|$ we consider). Next, we compute the gradients of the loss function on each minibatch $\mathcal{B}_i$, $i \in \mathcal{S}$. The validation set $\mathcal{V}$ contains only the index of one minibatch and the training set $\mathcal{T}$ all the remaining indices. We then use $\nabla \mathcal{L}_j$, $j \in \mathcal{V}$, as the single validation gradient and compute the weights $u_i$ of $\nabla \mathcal{L}_i$, $i \in \mathcal{T}$, using eq. (10). The reweighted gradient $\sum_{i \in \mathcal{T}} u_i \nabla \mathcal{L}_i$ is then fed to the neural network optimizer.
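Putting the pieces together, one iteration of the procedure might look as follows on a toy least-squares problem. This is a hedged sketch: the hyperparameter values, the noiseless targets, and the way momentum is combined with the reweighted gradient are our illustrative assumptions, not the exact training configuration of the paper:

```python
import numpy as np

def bilevel_sgd_step(theta, grads, velocity, eta=0.1, momentum=0.9, mu_v=1.0, lam=1e-4):
    """One iteration: the last gradient acts as the single validation gradient,
    the others are reweighted by eq. (10), and the reweighted sum is fed to a
    standard SGD-with-momentum update."""
    g_train, g_val = grads[:-1], grads[-1]
    dots = g_train @ g_val                              # agreement with validation
    norms2 = np.sum(g_train ** 2, axis=1)
    u = eta * dots / (lam + mu_v * eta ** 2 * norms2)   # eq. (10)
    u = u / np.sum(u)                                   # normalization (all u_i > 0 here)
    g = u @ g_train                                     # reweighted training gradient
    velocity = momentum * velocity - eta * g
    return theta + velocity, velocity

# Toy demo: 8 label-free minibatches of a noiseless least-squares problem,
# so all minibatch gradients stay aligned throughout training.
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X = rng.normal(size=(256, 5))
y = X @ w_true
theta, velocity = np.zeros(5), np.zeros(5)
for t in range(300):
    parts = np.array_split(rng.permutation(256), 8)
    grads = np.stack([2 * X[p].T @ (X[p] @ theta - y[p]) / len(p) for p in parts])
    theta, velocity = bilevel_sgd_step(theta, grads, velocity)
```

On this toy problem the reweighted update recovers the true parameters; with real data and label noise, the weights additionally suppress minibatches whose gradients disagree with the validation gradient.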
5 Experiments
We perform extensive experiments on several common datasets used for training image classifiers. Section 5.1 shows ablations to verify several design choices. In Sections 5.2 and 5.3 we follow the experimental setup of Zhang et al. [36] to demonstrate that our method reduces sample memorization and improves performance on noisy labels at test time. In Section 5.4 we show improvements on small datasets. The datasets considered in this section are the following:
 CIFAR-10 [17]: It contains 50K training and 10K test images of size 32 × 32 pixels, equally distributed among 10 classes.
 CIFAR-100 [17]: It contains 50K training and 10K test images of size 32 × 32 pixels, equally distributed among 100 classes.
 Pascal VOC 2007 [9]: It contains 5,011 training and 4,952 test images (the trainval set) of 20 object classes.
 ImageNet [7]: It is a large dataset containing 1.28M training images of objects from 1K classes. We test on the validation set, which has 50K images.
We evaluate our method on several network architectures. On Pascal VOC and ImageNet we use AlexNet [18]. Following Zhang et al. [36], we use CifarNet (an AlexNet-style network) and a small Inception architecture adapted to the smaller image sizes of CIFAR-10 and CIFAR-100. We refer the reader to [36] for a detailed description of these architectures. We also train variants of the ResNet architecture [13] to compare to other methods.
5.1 Ablations
We perform extensive ablation experiments on CIFAR-10 using the CifarNet and Inception networks. The networks are trained on both clean labels and labels with 50% random noise. We report classification accuracy on the training labels (clean or noisy) and the accuracy on the clean test labels. The baseline in all the ablation experiments compares 8 minibatches, with fixed choices of the remaining hyperparameters. Both networks have a single dropout layer, and the baseline configuration uses the same dropping in all the compared minibatches. The networks are trained for 200 epochs. We do not use data augmentation for CifarNet, but we use standard augmentations for the Inception network (i.e., random cropping and perturbation of brightness and contrast). The case of the Inception network is therefore closer to the common setup for training neural networks, and the absence of augmentation in the case of CifarNet makes overfitting more likely. We use SGD with momentum; the initial learning rate (chosen separately for CifarNet and Inception) is reduced by a constant factor after every epoch. Although in our formulation the validation and training sets split the selected minibatches into two separate sets, after one epoch, minibatches used in the validation set could be used in the training set and vice versa. We test the case where we manually enforce that no examples (in minibatches) used in the validation set are ever used for training, and find no benefit. We explore different sizes of the separate validation and training sets. We define as validation ratio the fraction of samples from the dataset used for validation only. Fig. 2 demonstrates the influence of the validation ratio (top row), the number of compared minibatches (second row), the size of the compared minibatches (third row) and the hyperparameter $\lambda$ (bottom row). We can observe that the validation ratio has only a small influence on the performance. We see an overall negative trend in the test accuracy with increasing size of the validation set, probably due to the corresponding reduction of the training set size.
The number of minibatches has a much more pronounced influence on the network's performance, especially in the case of CifarNet, where overfitting is more likely. Note that we keep the number of training steps constant in this experiment; hence, the case with more minibatches corresponds to smaller batch sizes. While the performance in the case of noisy labels increases with the number of compared minibatches, we observe a decrease in performance on clean data. Note that the case of 2 minibatches is rather interesting, since it amounts to flipping (or not) the sign of the single training gradient based on its dot product with the single validation gradient. To test whether the performance in the case of a growing number of batches is due to the batch sizes, we perform experiments where we vary the batch size while keeping the number of compared batches fixed at 8. Since this modification leads to more iterations, we adjust the learning rate schedule accordingly. Notice that all comparisons use the same overall number of times each sample is used. We observe a behavior similar to the case of the varying number of minibatches. This suggests that small minibatch sizes lead to better generalization in the presence of label noise. Notice also the special case where the batch size is 1, which corresponds to per-example weights. Besides its inferior performance, we found this choice to be computationally inefficient and to interfere with batch normalization. Interestingly, the parameter $\lambda$ does not seem to have a significant influence on the performance of either network. Overall, the performance on clean labels is quite robust to hyperparameter choices, except for the size of the minibatches.
In Table 1, we also summarize the following set of ablation experiments:
 a) No Constraint on $u$: We show that using the constraint $\sum_{i \in \mathcal{T}} u_i = 1$ is beneficial for both clean and noisy labels. We set $\eta$ and $\lambda$ for this experiment so that the magnitude of the weights resembles the case with the constraint. While tuning of $\eta$ and $\lambda$ might lead to an improvement, the use of the constraint allows plugging in our optimization method without adjusting the learning rate schedule of existing models;
 b) Weights per Layer: In this experiment we compute a separate weight for the gradients corresponding to each layer of the network. We then also apply the normalization to the weights per layer. While the results on noisy data with CifarNet improve in this case, the performance of CifarNet on clean data and of the Inception network on both datasets clearly degrades;
 c) Mini-Batch Sampling: Here we do not force the distribution of (noisy) labels in the compared minibatches to be identical. The poor performance in this case highlights the importance of identically distributed labels in the compared minibatches;
 d) Dropout: We remove the restriction of equal dropping in all the compared minibatches. Somewhat surprisingly, this improves performance in most cases. Note that unequal dropping lowers the influence of gradients in the deep fully-connected layers, therefore giving more weight to gradients of early convolutional layers in the dot-product. Also, dropout essentially amounts to having a different classifier at each iteration. Our method could encourage gradient updates that work well for different classifiers, possibly leading to a more universal representation.
Experiment | CifarNet Clean (Train / Test / Gap) | CifarNet 50% Random (Train / Test) | Inception Clean (Train / Test / Gap) | Inception 50% Random (Train / Test)
SGD | 99.99 / 75.68 / 24.31 | 96.75 / 45.15 | 99.91 / 88.13 / 11.78 | 65.06 / 47.64
Baseline | 97.60 / 75.52 / 22.08 | 89.28 / 47.62 | 96.13 / 87.78 / 8.35 | 45.43 / 73.08
a) No Constraint | 96.44 / 74.32 / 22.12 | 95.50 / 45.79 | 79.46 / 77.07 / 2.39 | 33.86 / 62.16
b) per Layer | 97.43 / 74.36 / 23.07 | 81.60 / 49.62 | 90.38 / 85.25 / 5.13 | 81.60 / 49.62
c) Sampling | 72.69 / 68.19 / 4.50 | 16.13 / 23.93 | 79.78 / 78.25 / 1.53 | 17.71 / 27.20
d) Dropout | 95.92 / 74.76 / 21.16 | 82.22 / 49.23 | 95.58 / 87.86 / 7.72 | 44.61 / 75.71
5.2 Fitting Random Pixel Permutations
Zhang et al. [36] demonstrated that CNNs are able to fit the training data even when images undergo random permutations of the pixels. Since object patterns are destroyed under such manipulations, learning should be very limited (restricted to simple statistics of pixel colors). We test our method with the Inception network trained for 200 epochs on images undergoing fixed random permutations of the pixels and report a comparison to standard SGD in Table 2. While the test accuracy of both variants is similar, the network trained using our optimization shows a very small generalization gap.
Model | Train | Test | Gap
SGD | 50.0 | 33.2 | 16.8
Bilevel | 34.8 | 33.6 | 1.2
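The fixed-permutation setup can be reproduced with a few lines of numpy (the random toy images below stand in for CIFAR data):

```python
import numpy as np

def apply_fixed_permutation(images, perm):
    """Apply the same pixel permutation to every image: object patterns are
    destroyed while per-image pixel-color statistics are preserved."""
    n, h, w, c = images.shape
    flat = images.reshape(n, h * w, c)
    return flat[:, perm, :].reshape(n, h, w, c)

rng = np.random.default_rng(0)
images = rng.random(size=(4, 32, 32, 3))  # toy stand-ins for CIFAR images
perm = rng.permutation(32 * 32)           # one fixed permutation for the whole dataset
scrambled = apply_fixed_permutation(images, perm)
```

Because the permutation is shared across the dataset, a network can still exploit the (scrambled but consistent) pixel statistics, which is exactly what this experiment probes.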
5.3 Memorization of Partially Corrupted Labels
The problem of label noise is of practical importance, since the labelling process is in general unreliable and incorrect labels are often introduced. Providing methods that are robust to noise in the training labels is therefore of interest. In this section we perform experiments on several datasets (CIFAR-10, CIFAR-100, ImageNet) with different forms and levels of label corruption and using different network architectures. We compare to other state-of-the-art regularization and label-noise methods on CIFAR-10 and CIFAR-100.
Random Label Corruptions on CIFAR-10 and CIFAR-100.
We test our method under different levels of synthetic label noise. For a noise level $\epsilon \in [0, 1]$ and a dataset with $c$ classes, we randomly choose a fraction $\epsilon$ of the examples per class and uniformly assign to them labels of the other $c - 1$ classes. Note that this leads to a completely random labelling in the case of 90% label noise on CIFAR-10.
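The synthetic corruption just described can be sketched as follows (the helper `corrupt_labels` is an illustrative implementation, not code from the paper):

```python
import numpy as np

def corrupt_labels(labels, noise_level, num_classes, rng):
    """Replace a fraction `noise_level` of the labels in each class with labels
    drawn uniformly from the *other* classes."""
    noisy = labels.copy()
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        pick = rng.choice(idx, size=int(noise_level * len(idx)), replace=False)
        # drawing an offset in [1, num_classes) guarantees the new label differs
        offsets = rng.integers(1, num_classes, size=len(pick))
        noisy[pick] = (c + offsets) % num_classes
    return noisy

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 100)  # toy dataset: 10 classes, 100 samples each
noisy = corrupt_labels(labels, noise_level=0.5, num_classes=10, rng=rng)
```

With 10 classes and `noise_level=0.9`, each label would become (almost) uniformly distributed, matching the completely-random-labelling remark above.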
Networks are trained on datasets with varying amounts of label noise. We train the networks with our bilevel optimizer using 8 minibatches and using the training set for validation (i.e., the validation minibatch is drawn from the training data). The networks are trained for 100 epochs on minibatches of size 64. Learning schedules, initial learning rates and data augmentation are identical to those in Sec. 5.1. The results using CifarNet are summarized in Fig. 3 and the results for Inception in Fig. 4. We observe a consistent improvement over standard SGD with CifarNet, and significant gains for Inception on CIFAR-10 up to 70% noise. On CIFAR-100 our method leads to better results up to a noise level of 50%.
Method | Ref. | Network | CIFAR-10 | CIFAR-100
Reed et al. [27] | [14] | ResNet | 62.3% | 46.5%
Goldberger et al. [11] | [14] | ResNet | 69.9% | 45.8%
Azadi et al. [2] | [2] | AlexNet | 75.0% | -
Jiang et al. [14] | [14] | ResNet | 76.6% | 56.9%
Zhang et al. [38] | - | PreAct ResNet-18 | 88.3% | 56.4%
Standard SGD | - | PreAct ResNet-18 | 69.6% | 44.9%
Dropout (low keep-probability) [30] | - | PreAct ResNet-18 | 84.5% | 50.1%
Label Smoothing (0.1) [32] | - | PreAct ResNet-18 | 69.3% | 46.1%
Bilevel | - | PreAct ResNet-18 | 87.0% | 59.8%
Bilevel + [38] | - | PreAct ResNet-18 | 89.0% | 61.6%
We compare to state-of-the-art regularization methods as well as methods for dealing with label noise in Table 3. The networks used in the comparison are variants of the ResNet architecture [13], as specified in [14] and [38]. An exception is [2], which uses AlexNet but relies on a separate large dataset with clean labels for its model. We use the same architecture as the state-of-the-art method by Zhang et al. [38] for our results. We also explored the combination of our bilevel optimization with the data augmentation introduced by [38] (last row), which results in the best performance on both CIFAR-10 and CIFAR-100. We also include results using Dropout [30] with a low keep-probability, as suggested by Arpit et al. [1], and results with label smoothing, as suggested by Szegedy et al. [32].
Modelling Realistic Label Noise on ImageNet.
In order to test the method on more realistic label noise we perform the following experiment:
We use the predicted labels of a pretrained AlexNet to model realistic label noise. Our rationale is that a neural network makes mistakes similar to those a human annotator would make. To obtain a high noise level we leave dropout active when making predictions on the training set, which results in approximately 44% label noise. We then retrain an AlexNet from scratch on those labels using standard SGD and our bilevel optimizer. The results of this experiment and a comparison on clean data are given in Table 4. The bilevel optimization leads to better performance in both cases, improving over standard SGD by nearly 2% in the case of noisy labels.
Method | 44% Noise | Clean
SGD | 50.75% | 57.4%
Bilevel | 52.69% | 58.2%
Experiments on Real-World Data with Noisy Labels. We test our method on the Clothing1M dataset introduced by Xiao et al. [35]. The dataset consists of fashion images belonging to 14 classes. It contains 1M images with noisy labels and additional smaller sets with clean labels for training (50K), validation (14K) and testing (10K). We follow the same setup as the state-of-the-art method by Patrini et al. [25], using an ImageNet-pretrained 50-layer ResNet. We achieve 69.9% accuracy after training only on the noisy data and 79.9% after fine-tuning on the clean training data. These results are comparable to those of [25], at 69.8% and 80.4% respectively.
5.4 Generalization on Small Datasets
Small datasets pose a challenge, since deep networks easily overfit in this case. We test our method in this scenario by training an AlexNet on the multi-label classification task of Pascal VOC 2007. Training images are randomly cropped to an area between 30% and 100% of the original image and then resized to the AlexNet input resolution. We linearly decay the learning rate and train for 1K epochs on minibatches of size 64. We use the bilevel optimization method with 4 minibatches and without a separate validation set. In Fig. 5 we report the mAP obtained from the average prediction over 10 random crops on varying fractions of the original dataset. We observe a small, but consistent, improvement over the baseline in all cases.
6 Conclusions
Neural networks seem to benefit from additional regularization during training compared to alternative machine learning models. However, neural networks still suffer from overfitting, and current regularization methods have limited impact. We introduce a novel regularization approach that implements the principles of cross-validation as a bilevel optimization problem. This formulation is computationally efficient, can be combined with other regularizers, and is shown to consistently improve the generalization of several neural network architectures on challenging datasets such as CIFAR-10/100, Pascal VOC 2007, and ImageNet. In particular, we show that the proposed method is effective in avoiding overfitting with noisy labels.
Acknowledgements. This work was supported by the Swiss National Science Foundation (SNSF) grant number 200021_169622.
References
 [1] Arpit, D., Jastrzebski, S., Ballas, N., Krueger, D., Bengio, E., Kanwal, M.S., Maharaj, T., Fischer, A., Courville, A., Bengio, Y., et al.: A closer look at memorization in deep networks. arXiv preprint arXiv:1706.05394 (2017)
 [2] Azadi, S., Feng, J., Jegelka, S., Darrell, T.: Auxiliary image regularization for deep cnns with noisy labels. In: International Conference on Learning Representations (2016)
 [3] Baydin, A.G., Pearlmutter, B.A.: Automatic differentiation of algorithms for machine learning. arXiv preprint arXiv:1404.7456 (2014)

 [4] Bengio, Y.: Gradient-based optimization of hyperparameters. Neural Computation 12(8), 1889–1900 (2000)
 [5] Bracken, J., McGill, J.T.: Mathematical programs with optimization problems in the constraints. Operations Research 21(1), 37–44 (1973)
 [6] Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Annals of Operations Research 153(1), 235–256 (2007)

 [7] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: Computer Vision and Pattern Recognition. pp. 248–255. IEEE (2009)
 [8] Domke, J.: Generic methods for optimization-based modeling. In: Artificial Intelligence and Statistics. pp. 318–326 (2012)
 [9] Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (voc) challenge. International journal of computer vision 88(2), 303–338 (2010)
 [10] Finn, C., Abbeel, P., Levine, S.: Modelagnostic metalearning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400 (2017)
 [11] Goldberger, J., BenReuven, E.: Training deep neuralnetworks using a noise adaptation layer. In: International Conference on Learning Representations (2016)
 [12] Hadjidimos, A.: Successive overrelaxation (SOR) and related methods. J. Comput. Appl. Math. 123(1-2), 177–199 (2000)
 [13] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Computer Vision and Pattern Recognition. pp. 770–778 (2016)
 [14] Jiang, L., Zhou, Z., Leung, T., Li, L.J., FeiFei, L.: Mentornet: Regularizing very deep neural networks on corrupted labels. arXiv preprint arXiv:1712.05055 (2017)
 [15] Jindal, I., Nokleby, M., Chen, X.: Learning deep networks from noisy labels with dropout regularization. arXiv preprint arXiv:1705.03419 (2017)
 [16] Kawaguchi, K., Kaelbling, L.P., Bengio, Y.: Generalization in deep learning. arXiv preprint arXiv:1710.05468 (2017)
 [17] Krizhevsky, A.: Learning multiple layers of features from tiny images (2009)
[18] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097–1105 (2012)
 [19] Kunisch, K., Pock, T.: A bilevel optimization approach for parameter learning in variational models. SIAM Journal on Imaging Sciences 6(2), 938–983 (2013)
[20] Lopez-Paz, D., et al.: Gradient episodic memory for continual learning. In: Advances in Neural Information Processing Systems. pp. 6470–6479 (2017)
[21] Maclaurin, D., Duvenaud, D., Adams, R.: Gradient-based hyperparameter optimization through reversible learning. In: International Conference on Machine Learning. pp. 2113–2122 (2015)
 [22] Natarajan, N., Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. In: Advances in neural information processing systems. pp. 1196–1204 (2013)
[23] Nichol, A., Schulman, J.: Reptile: a scalable meta-learning algorithm. arXiv preprint arXiv:1803.02999 (2018)
[24] Ochs, P., Ranftl, R., Brox, T., Pock, T.: Bilevel optimization with non-smooth lower level problems. In: International Conference on Scale Space and Variational Methods in Computer Vision. pp. 654–665. Springer (2015)
 [25] Patrini, G., Rozza, A., Menon, A., Nock, R., Qu, L.: Making neural networks robust to label noise: a loss correction approach. In: Computer Vision and Pattern Recognition (2017)
 [26] Qian, N.: On the momentum term in gradient descent learning algorithms. Neural networks 12(1), 145–151 (1999)
 [27] Reed, S., Lee, H., Anguelov, D., Szegedy, C., Erhan, D., Rabinovich, A.: Training deep neural networks on noisy labels with bootstrapping. arXiv preprint arXiv:1412.6596 (2014)
 [28] Rolnick, D., Veit, A., Belongie, S., Shavit, N.: Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694 (2017)
[29] Smith, S.L., Le, Q.V.: A Bayesian perspective on generalization and stochastic gradient descent. In: International Conference on Learning Representations (2018)
 [30] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), 1929–1958 (2014)
 [31] Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., Fergus, R.: Training convolutional networks with noisy labels. arXiv preprint arXiv:1406.2080 (2014)
 [32] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2818–2826 (2016)
 [33] Ulyanov, D., Vedaldi, A., Lempitsky, V.: Deep image prior. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018)
 [34] Vinyals, O., Blundell, C., Lillicrap, T., Wierstra, D., et al.: Matching networks for one shot learning. In: Advances in Neural Information Processing Systems. pp. 3630–3638 (2016)
 [35] Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2691–2699 (2015)
 [36] Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization. In: International Conference on Learning Representations (2017)
[37] Zhang, C., Liao, Q., Rakhlin, A., Sridharan, K., Miranda, B., Golowich, N., Poggio, T.: Theory of deep learning III: Generalization properties of SGD. Tech. rep., Center for Brains, Minds and Machines (CBMM) (2017)
[38] Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. In: International Conference on Learning Representations (2018)