Adaptive Learning of the Optimal Mini-Batch Size of SGD

05/03/2020 · by Motasem Alfarra, et al.

Recent advances in the theoretical understanding of SGD (Qian et al., 2019) led to a formula for the optimal mini-batch size that minimizes the number of effective data passes, i.e., the number of iterations times the mini-batch size. However, this formula is of no practical value, as it depends on knowledge of the variance of the stochastic gradients evaluated at the optimum. In this paper we design a practical SGD method capable of learning the optimal mini-batch size adaptively throughout its iterations. Our method does this provably, and in our experiments with synthetic and real data it robustly exhibits nearly optimal behaviour; that is, it works as if the optimal mini-batch size were known a priori. Furthermore, we generalize our method to several new mini-batch strategies not considered in the literature before, including a sampling suitable for distributed implementations.
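Since the theoretical formula for the optimal mini-batch size depends on the gradient variance at the (unknown) optimum, any practical variant has to estimate a noise quantity on the fly. The sketch below is a hypothetical illustration of that general idea, not the authors' algorithm: once per epoch it estimates the gradient-noise variance at the current iterate and feeds it into a made-up plug-in rule for the next batch size. All names and the update rule itself (adaptive_batch_sgd, grad_i, probe_size, the clipping heuristic) are assumptions introduced for illustration.

import numpy as np


def adaptive_batch_sgd(grad_i, n, x0, lr=0.1, epochs=30, probe_size=64, seed=0):
    """Hypothetical sketch (not the paper's algorithm): mini-batch SGD that
    re-estimates the gradient-noise variance once per epoch and uses it to
    pick the next epoch's batch size.

    grad_i(x, i) -> gradient of the i-th loss term at x (1-D numpy array).
    n            -> number of data points.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    tau = 1  # start with plain single-sample SGD

    for _ in range(epochs):
        # Estimate the variance of the stochastic gradients at the current
        # iterate; this acts as a crude proxy for the (unknown) variance at
        # the optimum that the theoretical formula requires.
        probe = rng.choice(n, size=min(n, probe_size), replace=False)
        grads = np.stack([grad_i(x, i) for i in probe])
        g_bar = grads.mean(axis=0)
        sigma2 = float(np.mean(np.sum((grads - g_bar) ** 2, axis=1)))

        # Made-up plug-in rule, for illustration only: use a larger batch
        # when the estimated noise is large relative to the full-gradient
        # signal, clipped to [1, n]. The paper derives a different formula.
        signal = float(np.sum(g_bar ** 2)) + 1e-12
        tau = int(np.clip(round(n * sigma2 / (sigma2 + n * signal)), 1, n))

        # One pass of mini-batch SGD with the current batch size tau.
        for _ in range(max(1, n // tau)):
            batch = rng.choice(n, size=tau, replace=False)
            g = np.mean([grad_i(x, i) for i in batch], axis=0)
            x = x - lr * g

    return x, tau


# Toy usage: least-squares on synthetic targets a_i, so grad_i(x, i) = x - a[i].
a = np.random.default_rng(1).normal(size=(200, 5))
x_final, tau_final = adaptive_batch_sgd(lambda x, i: x - a[i], n=200, x0=np.zeros(5))

The design point this sketch tries to convey is only that the batch size is treated as a state of the optimizer, updated from cheap statistics gathered during training, rather than fixed a priori; the paper's actual update rule and its guarantees are given in the full text.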


Related research

01/27/2019 · SGD: General Analysis and Improved Rates
We propose a general yet simple theorem describing the convergence of SG...

05/19/2017 · EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD
We present a generic framework for trading off fidelity and cost in comp...

03/12/2018 · Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Stochastic neural net weights are used in a variety of contexts, includi...

12/18/2017 · The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Stochastic Gradient Descent (SGD) with small mini-batch is a key compone...

05/01/2017 · Determinantal Point Processes for Mini-Batch Diversification
We study a mini-batch diversification scheme for stochastic gradient des...

06/22/2022 · A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
Mini-batch SGD with momentum is a fundamental algorithm for learning lar...

05/05/2020 · Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models....
