Optimal Mini-Batch Size Selection for Fast Gradient Descent

11/15/2019
by Michael P. Perrone et al.

This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single- and multiple-learner problems. By decoupling algorithmic analysis issues from hardware and software implementation details, we reveal a robust empirical inverse law between mini-batch size and the average number of SGD updates required to converge to a specified error threshold. Combining this empirical inverse law with measured system performance, we create an accurate, closed-form model of average training time and show how this model can be used to identify quantifiable implications for both algorithmic and hardware aspects of machine learning. We demonstrate the inverse law empirically on both image recognition (MNIST, CIFAR10 and CIFAR100) and machine translation (Europarl) tasks, and provide a theoretical justification by proving a novel bound on mini-batch SGD training.
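
As a sketch of how such a closed-form training-time model can be used: if the empirical inverse law is written as E[updates to converge] ≈ α/B + β for mini-batch size B, and t(B) is the measured wall-clock time per SGD update on a given system, then the expected training time is roughly (α/B + β) · t(B), and the optimal B minimizes this product. The Python snippet below is an illustrative sketch under that assumed functional form; the coefficients, candidate batch sizes, and timings are hypothetical placeholders, not values from the paper.

import numpy as np

# Illustrative sketch (not the paper's code): combine an assumed inverse law
# for updates-to-convergence with measured per-update times to pick the
# mini-batch size that minimizes estimated total training time.

def updates_to_converge(batch_size, alpha, beta):
    """Assumed inverse-law form: average SGD updates needed to reach a fixed
    error threshold, E[updates] ~ alpha / batch_size + beta. The coefficients
    alpha and beta would be fit from convergence experiments."""
    return alpha / batch_size + beta

def expected_training_time(batch_sizes, seconds_per_update, alpha, beta):
    """Closed-form estimate of average training time for each candidate
    batch size: (updates to converge) x (measured time per update)."""
    batch_sizes = np.asarray(batch_sizes, dtype=float)
    seconds_per_update = np.asarray(seconds_per_update, dtype=float)
    return updates_to_converge(batch_sizes, alpha, beta) * seconds_per_update

# Hypothetical measurements: per-update wall-clock time grows with batch size
# once the hardware saturates.
candidate_b = np.array([8, 16, 32, 64, 128, 256, 512])
measured_t = np.array([0.011, 0.012, 0.014, 0.019, 0.030, 0.055, 0.105])

# Hypothetical fitted coefficients for the inverse law.
alpha_fit, beta_fit = 2.0e5, 1.5e3

times = expected_training_time(candidate_b, measured_t, alpha_fit, beta_fit)
best = candidate_b[np.argmin(times)]
print(f"Estimated optimal mini-batch size: {best}")

The same calculation can be repeated per system (CPU, single GPU, multi-GPU) by swapping in that system's measured per-update times, which is what makes the decoupling of algorithmic behavior from hardware performance useful in practice.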

Related research

06/14/2022  MBGDT: Robust Mini-Batch Gradient Descent
In high dimensions, most machine learning methods perform fragile even th...

08/27/2018  Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
In order to extract the best possible performance from asynchronous stoc...

08/13/2020  Deep Networks with Fast Retraining
Recent work [1] has utilized Moore-Penrose (MP) inverse in deep convoluti...

11/17/2017  A Resizable Mini-batch Gradient Descent based on a Randomized Weighted Majority
Determining the appropriate batch size for mini-batch gradient descent i...

08/09/2021  On the Power of Differentiable Learning versus PAC and SQ Learning
We study the power of learning via mini-batch stochastic gradient descen...

04/24/2017  Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
Self-paced learning and hard example mining re-weight training instances...

02/23/2020  Improve SGD Training via Aligning Mini-batches
Deep neural networks (DNNs) for supervised learning can be viewed as a p...
