Augment your batch: better training with larger batches

01/27/2019
by Elad Hoffer, et al.

Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalization of the model may be hampered. We propose to use batch augmentation: replicating instances of samples within the same batch with different data augmentations. Batch augmentation acts as a regularizer and an accelerator, increasing both generalization and performance scaling. We analyze the effect of batch augmentation on gradient variance and show that it empirically improves convergence for a wide variety of deep neural networks and datasets. Our results show that batch augmentation reduces the number of necessary SGD updates to achieve the same accuracy as the state-of-the-art. Overall, this simple yet effective method enables faster training and better generalization by allowing more computational resources to be used concurrently.
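To make the idea concrete, below is a minimal sketch of batch augmentation in PyTorch: each sample in a mini-batch is replicated M times, with an independent random augmentation applied to each copy, so a batch of B items becomes an effective batch of B * M samples. This is an illustrative reading of the abstract, not the authors' released code; the dataset choice (CIFAR-10), the transforms, M = 4, and the helper names `BatchAugmentedDataset` and `collate_batch_augmented` are all assumptions.

```python
import torch
from torchvision import datasets, transforms

# Illustrative augmentation pipeline; the specific transforms are assumptions,
# not the authors' exact setup.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])


class BatchAugmentedDataset(torch.utils.data.Dataset):
    """Yields M independently augmented copies of each underlying sample.

    Collating a mini-batch of B such items gives an effective batch of
    B * M samples, where every original image appears M times under
    different random augmentations (the scheme described in the abstract).
    """

    def __init__(self, base, transform, m=4):
        self.base = base            # underlying dataset returning PIL images
        self.transform = transform  # random augmentation applied per copy
        self.m = m                  # number of augmented copies per sample

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]
        copies = torch.stack([self.transform(img) for _ in range(self.m)])
        labels = torch.full((self.m,), label, dtype=torch.long)
        return copies, labels


def collate_batch_augmented(items):
    # Flatten (B, M, C, H, W) -> (B * M, C, H, W), and likewise for labels.
    images = torch.cat([x for x, _ in items], dim=0)
    labels = torch.cat([y for _, y in items], dim=0)
    return images, labels


base = datasets.CIFAR10(root="./data", train=True, download=True)
train_set = BatchAugmentedDataset(base, augment, m=4)
loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True,
    collate_fn=collate_batch_augmented)
# Each iteration now yields 64 * 4 = 256 samples: a larger batch built from
# multiple augmentations of the same 64 images, trained with unmodified SGD.
```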


