Efficient Distributed SGD with Variance Reduction

12/09/2015
by Soham De, et al.

Stochastic Gradient Descent (SGD) has become one of the most popular optimization methods for training machine learning models on massive datasets. However, SGD suffers from two main drawbacks: (i) The noisy gradient updates have high variance, which slows down convergence as the iterates approach the optimum, and (ii) SGD scales poorly in distributed settings, typically experiencing rapidly decreasing marginal benefits as the number of workers increases. In this paper, we propose a highly parallel method, CentralVR, that uses error corrections to reduce the variance of SGD gradient updates, and scales linearly with the number of worker nodes. CentralVR enjoys low iteration complexity, provably linear convergence rates, and exhibits linear performance gains up to hundreds of cores for massive datasets. We compare CentralVR to state-of-the-art parallel stochastic optimization methods on a variety of models and datasets, and find that our proposed methods exhibit stronger scaling than other SGD variants.
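The abstract describes the core idea only in words: each stochastic gradient is combined with an error-correction term so that its variance shrinks as the iterates approach the optimum. As a rough illustration of that family of updates (not the paper's CentralVR algorithm, whose distributed update and averaging scheme are given in the full text), the sketch below implements an SVRG-style variance-reduced SGD step; the function names and the least-squares example are assumptions made for this sketch.

# Minimal sketch of an SVRG-style variance-reduced SGD step (illustrative only;
# CentralVR's exact correction and its distributed scheme differ -- see the paper).
import numpy as np

def svrg(grad_fn, x0, data, lr=0.05, epochs=20, rng=None):
    """grad_fn(x, sample) returns the stochastic gradient at x for one sample."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    n = len(data)
    for _ in range(epochs):
        snapshot = x.copy()
        # Exact full gradient at the snapshot point: the "error correction" anchor.
        full_grad = np.mean([grad_fn(snapshot, s) for s in data], axis=0)
        for _ in range(n):
            i = rng.integers(n)
            # Variance-reduced gradient: noisy term, minus its value at the
            # snapshot, plus the exact full gradient at the snapshot.
            g = grad_fn(x, data[i]) - grad_fn(snapshot, data[i]) + full_grad
            x -= lr * g
    return x

# Usage example: least-squares regression on synthetic data (hypothetical setup).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(200, 5))
    b = A @ rng.normal(size=5)
    data = list(zip(A, b))
    grad = lambda x, s: (s[0] @ x - s[1]) * s[0]
    x_hat = svrg(grad, np.zeros(5), data)
    print(np.linalg.norm(A @ x_hat - b))

The corrected gradient is unbiased (its expectation equals the true gradient at x), while its variance vanishes as both x and the snapshot approach the optimum; this is the mechanism that permits the provably linear convergence rates mentioned in the abstract.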
