The Number of Steps Needed for Nonconvex Optimization of a Deep Learning Optimizer is a Rational Function of Batch Size

08/26/2021
by Hideaki Iiduka, et al.

Recently, convergence and convergence-rate analyses of deep learning optimizers for nonconvex optimization have been widely studied. Meanwhile, numerical evaluations of these optimizers have clarified the relationship between batch size and the number of steps needed to train deep neural networks. The main contribution of this paper is to show theoretically that, for each of these optimizers, the number of steps needed for nonconvex optimization can be expressed as a rational function of the batch size. These rational functions lead to two particularly important facts, both validated numerically in previous studies. The first is that there exists an optimal batch size that minimizes the number of steps needed for nonconvex optimization; using batch sizes larger than this optimum does not further decrease the number of steps. The second is that the optimal batch size depends on the optimizer. In particular, it is shown theoretically that momentum and Adam-type optimizers can exploit larger optimal batch sizes than stochastic gradient descent, and can thereby achieve a smaller minimum number of steps needed for nonconvex optimization.

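The two facts above can be illustrated with a small numerical sketch. The snippet below is not the paper's derivation: it assumes a hypothetical rational form K(b) = (c3*b^2 + c2*b + c1) / b for the number of steps as a function of batch size b, with made-up coefficients c1, c2, c3, and simply searches a grid of batch sizes for the minimizer. In the paper, the form and coefficients of the rational function depend on the optimizer (SGD, momentum, Adam-type) and on the target accuracy.

```python
# Illustrative only: a hypothetical rational model of "number of steps vs. batch size".
# The coefficients c1, c2, c3 are placeholders, not values from the paper.
import numpy as np

def steps_needed(b, c1=4.0e4, c2=100.0, c3=0.01):
    """Toy rational function K(b) = (c3*b**2 + c2*b + c1) / b."""
    return (c3 * b**2 + c2 * b + c1) / b

batch_sizes = np.arange(32, 4097)      # candidate batch sizes
k = steps_needed(batch_sizes)

b_star = batch_sizes[np.argmin(k)]     # batch size that minimizes the step count
print(f"optimal batch size under this toy model: {b_star}")
print(f"minimum number of steps: {k.min():.1f}")
```

Under this toy model the step count first decreases and then increases with batch size, so batches larger than the minimizer no longer pay off, which is the qualitative behavior described in the abstract; the paper additionally shows that the location of this minimizer shifts with the choice of optimizer.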

Related research

12/14/2021 | Minimization of Stochastic First-order Oracle Complexity of Adaptive Methods for Nonconvex Optimization
Numerical evaluations have definitively shown that, for deep learning op...

05/21/2018 | Stochastic Gradient Descent for Stochastic Doubly-Nonconvex Composite Optimization
The stochastic gradient descent has been widely used for solving composi...

05/13/2019 | A Stochastic Gradient Method with Biased Estimation for Faster Nonconvex Optimization
A number of optimization approaches have been proposed for optimizing no...

03/18/2020 | Block Layer Decomposition schemes for training Deep Neural Networks
Deep Feedforward Neural Networks' (DFNNs) weights estimation relies on t...

12/08/2017 | Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Progress in deep learning is slowed by the days or weeks it takes to tra...

02/12/2021 | A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
Recently the LARS and LAMB optimizers have been proposed for training ne...
