Faster Stochastic Algorithms via History-Gradient Aided Batch Size Adaptation

10/21/2019
by Kaiyi Ji, et al.

Various schemes for adapting the batch size have recently been proposed to accelerate stochastic algorithms. However, existing schemes either apply a prescribed batch size adaptation or require additional backtracking and condition-verification steps to exploit the information along the optimization path. In this paper, we propose an easy-to-implement scheme for adapting the batch size by exploiting history stochastic gradients, based on which we propose the Adaptive-batch-size SGD (AbaSGD), AbaSVRG, and AbaSPIDER algorithms. To handle the dependence of the batch size on history stochastic gradients, we develop a new convergence analysis technique and show that these algorithms achieve improved overall complexity over their vanilla counterparts. Moreover, their convergence rates adapt to the optimization landscape that the iterate experiences. Extensive experiments demonstrate that our algorithms substantially outperform existing competitive algorithms.
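Since the abstract only sketches the idea, below is a minimal, hypothetical Python sketch of an SGD loop whose batch size is adapted from the norms of recent (history) stochastic gradients. The specific rule used here (batch size inversely proportional to a running average of squared gradient norms, with the constant `c`, the window length, and the batch-size bounds all made up for illustration) is an assumption, not the AbaSGD rule defined in the paper.

```python
import numpy as np

def history_adaptive_sgd(grad_fn, x0, n_samples, lr=0.1, n_iters=100,
                         min_batch=8, max_batch=256, window=10, c=1.0):
    """Illustrative SGD loop with a history-gradient-aided batch size.

    grad_fn(x, idx) should return the mini-batch stochastic gradient at x
    over the sample indices idx.  The adaptation rule below is a
    hypothetical stand-in: it is NOT the scheme analyzed in the paper.
    """
    x = np.asarray(x0, dtype=float)
    history = []            # squared norms of recent stochastic gradients
    batch = min_batch
    for _ in range(n_iters):
        idx = np.random.choice(n_samples, size=batch, replace=False)
        g = grad_fn(x, idx)                 # mini-batch stochastic gradient
        x = x - lr * g                      # plain SGD step
        # Track a sliding window of history gradient magnitudes.
        history.append(float(np.linalg.norm(g) ** 2))
        if len(history) > window:
            history.pop(0)
        avg_sq_norm = np.mean(history)
        # Small recent gradients (e.g. near a stationary point) call for a
        # larger batch to reduce variance; large gradients let a small
        # batch suffice.  Clip to keep the batch size valid.
        batch = int(np.clip(c / (avg_sq_norm + 1e-12), min_batch,
                            min(max_batch, n_samples)))
    return x
```

As a usage example under the same assumptions, for a least-squares objective one could pass `grad_fn = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)` with data matrix `A` and targets `b`.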


research
10/18/2016

Big Batch SGD: Automated Inference using Adaptive Batch Sizes

Classical stochastic gradient methods for optimization rely on noisy gra...
research
08/29/2023

ABS-SGD: A Delayed Synchronous Stochastic Gradient Descent Algorithm with Adaptive Batch Size for Heterogeneous GPU Clusters

As the size of models and datasets grows, it has become increasingly com...
research
05/30/2023

BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization

The popularity of bi-level optimization (BO) in deep learning has spurre...
research
11/06/2017

AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

We study a new aggregation operator for gradients coming from a mini-bat...
research
11/04/2022

How Does Adaptive Optimization Impact Local Neural Network Geometry?

Adaptive optimization methods are well known to achieve superior converg...
research
05/17/2022

Hyper-Learning for Gradient-Based Batch Size Adaptation

Scheduling the batch size to increase is an effective strategy to contro...
research
10/04/2020

Feature Whitening via Gradient Transformation for Improved Convergence

Feature whitening is a known technique for speeding up training of DNN. ...
