Disparity Between Batches as a Signal for Early Stopping

07/14/2021
by Mahsa Forouzesh et al.

We propose a metric for evaluating the generalization ability of deep neural networks trained with mini-batch gradient descent. Our metric, called gradient disparity, is the ℓ_2 norm distance between the gradient vectors of two mini-batches drawn from the training set. It is derived from a probabilistic upper bound on the difference between the classification errors over a given mini-batch when the network is trained on this mini-batch and when it is trained on another mini-batch of points sampled from the same dataset. We empirically show that gradient disparity is a very promising early-stopping criterion (i) when data is limited, as it uses all the samples for training, and (ii) when the available data has noisy labels, as it signals overfitting better than a held-out validation set. Furthermore, we show in a wide range of experimental settings that gradient disparity is strongly related to the generalization gap between the training and test sets, and that it is also very informative about the level of label noise.
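To make the metric concrete, below is a minimal sketch of how gradient disparity could be computed between two mini-batches in PyTorch. The model, loss function, and stand-in batches are illustrative assumptions, not the authors' reference implementation; the sketch simply evaluates the gradient of each mini-batch loss at the current parameters, flattens the gradients, and takes the ℓ_2 norm of their difference.

import torch
import torch.nn as nn

def batch_gradient(model, loss_fn, inputs, targets):
    # Flattened gradient of the mini-batch loss w.r.t. all trainable parameters.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def gradient_disparity(model, loss_fn, batch_a, batch_b):
    # l2 distance between the gradient vectors of two mini-batches,
    # both evaluated at the same (current) parameter values.
    g_a = batch_gradient(model, loss_fn, *batch_a)
    g_b = batch_gradient(model, loss_fn, *batch_b)
    return torch.norm(g_a - g_b, p=2).item()

# Hypothetical usage as an early-stopping signal (random data as stand-ins
# for two mini-batches drawn from the training set):
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()
x_a, y_a = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
x_b, y_b = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
gd = gradient_disparity(model, loss_fn, (x_a, y_a), (x_b, y_b))
print(f"gradient disparity: {gd:.4f}")  # stop training when this stops decreasing

In a training loop, the metric would be tracked over epochs on pairs of training mini-batches only, so no samples need to be set aside for validation.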


