AdaBatch: Efficient Gradient Aggregation Rules for Sequential and Parallel Stochastic Gradient Methods

11/06/2017
by Alexandre Défossez, et al.

We study a new aggregation operator for gradients coming from a mini-batch in stochastic gradient (SG) methods that allows a significant speed-up on sparse optimization problems. We call this method AdaBatch; it requires only a few lines of code changed compared to regular mini-batch SGD algorithms. We provide theoretical insight into how this new class of algorithms performs and show that it is equivalent to an implicit per-coordinate rescaling of the gradients, similar to what Adagrad methods do. In theory and in practice, this new aggregation lets us keep the sample efficiency of SG methods while increasing the batch size. Experimentally, we also show that in the case of smooth convex optimization, our procedure can even obtain a better loss when increasing the batch size for a fixed number of samples. We then apply this new algorithm to obtain a parallelizable stochastic gradient method that is synchronous but achieves speed-ups on par with Hogwild! methods, since convergence does not deteriorate as the batch size increases. The same approach can be used to make mini-batching provably efficient for variance-reduced SG methods such as SVRG.
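The core idea described in the abstract, rescaling each coordinate by how often it is active in the mini-batch rather than by the batch size, can be sketched as follows. This is a minimal illustration only: the function name adabatch_aggregate, the dense-array representation of sparse gradients, and the details of the update are assumptions for exposition, not the authors' reference implementation.

```python
import numpy as np

def adabatch_aggregate(gradients):
    """Aggregate per-sample gradients with per-coordinate normalization.

    gradients: list of np.ndarray, one gradient per sample in the mini-batch
               (sparse gradients represented here as dense arrays with zeros).
    Returns an aggregated gradient where coordinate k is the batch sum divided
    by the number of samples whose gradient is non-zero on coordinate k,
    instead of dividing every coordinate by the full batch size.
    """
    G = np.stack(gradients)                # shape (batch_size, dim)
    summed = G.sum(axis=0)                 # plain sum over the mini-batch
    counts = (G != 0).sum(axis=0)          # per-coordinate support counts
    # Coordinates touched by no sample keep a zero gradient (divide by 1).
    return summed / np.maximum(counts, 1)

# Usage sketch: one SGD step with the aggregated gradient.
# w -= learning_rate * adabatch_aggregate(per_sample_grads)
```

On sparse problems, rarely active coordinates are no longer averaged down by the batch size, which is what the abstract refers to as an implicit Adagrad-like per-coordinate rescaling.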


Related research

11/25/2018
Inexact SARAH Algorithm for Stochastic Optimization
We develop and analyze a variant of variance reducing stochastic gradien...

05/05/2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models....

12/02/2019
Risk Bounds for Low Cost Bipartite Ranking
Bipartite ranking is an important supervised learning problem; however, ...

05/19/2017
EE-Grad: Exploration and Exploitation for Cost-Efficient Mini-Batch SGD
We present a generic framework for trading off fidelity and cost in comp...

01/26/2023
Learning Large Scale Sparse Models
In this work, we consider learning sparse models in large scale settings...

10/22/2020
Sample Efficient Reinforcement Learning with REINFORCE
Policy gradient methods are among the most effective methods for large-s...

10/21/2019
Faster Stochastic Algorithms via History-Gradient Aided Batch Size Adaptation
Various schemes for adapting batch size have been recently proposed to a...
