Improving the convergence of SGD through adaptive batch sizes

10/18/2019
by Scott Sievert, et al.

Mini-batch stochastic gradient descent (SGD) approximates the gradient of an objective function with the average gradient of a batch of constant size. While small batch sizes yield high-variance gradient estimates that can prevent learning a good model, large batches may require more data and computational effort. This work presents a method to change the batch size adaptively with model quality. We show that our method requires the same number of model updates as full-batch gradient descent while requiring the same total number of gradient computations as SGD. While this method requires evaluating the objective function, we present a passive approximation that eliminates this constraint and improves computational efficiency. We provide extensive experiments illustrating that our methods require far fewer model updates without increasing the total amount of computation.
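
The abstract does not give the exact batch-size schedule, but the core idea can be sketched: take many cheap, noisy steps while the loss is large, and average over more samples as the loss shrinks and lower-variance gradients are needed. The following minimal Python sketch illustrates this on least squares; the function name `adaptive_batch_sgd`, the inverse-loss growth rule, and all parameters are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def adaptive_batch_sgd(w, X, y, lr=0.01, batch0=8, max_batch=512, epochs=20):
    """Sketch of SGD whose batch size grows as the loss shrinks.

    Least-squares loss; the batch-size rule (inversely proportional to
    the current loss) is an illustrative heuristic, not the paper's
    exact schedule.
    """
    n = len(y)

    def loss(w):
        return np.mean((X @ w - y) ** 2)

    loss0 = loss(w)
    batch = batch0
    for _ in range(epochs):
        perm = np.random.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            resid = X[idx] @ w - y[idx]
            grad = 2 * X[idx].T @ resid / len(idx)  # mini-batch gradient
            w -= lr * grad
        # Grow the batch as the loss falls: near the optimum, lower-variance
        # gradient estimates are needed, so average over more samples.
        batch = int(min(max_batch, batch0 * loss0 / max(loss(w), 1e-12)))
        batch = max(batch, batch0)
    return w

# Tiny demo: recover the weights of a noisy linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)
w_hat = adaptive_batch_sgd(np.zeros(5), X, y)
print(np.linalg.norm(w_hat - w_true))
```

Under this kind of schedule, early epochs behave like small-batch SGD (many updates, each cheap) and late epochs behave like full-batch gradient descent (few updates, each averaged over many samples), which is consistent with the abstract's claim of GD-like update counts at SGD-like total gradient cost.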
