Communication-Censored Distributed Stochastic Gradient Descent

09/09/2019
by Weiyu Li et al.

This paper develops a communication-efficient algorithm for solving stochastic optimization problems over a distributed network, with the aim of reducing the burdensome communication that arises in applications such as distributed machine learning. Unlike existing works based on quantization and sparsification, we introduce a communication-censoring technique that reduces the transmission of variables, leading to our communication-Censored distributed Stochastic Gradient Descent (CSGD) algorithm. Specifically, in CSGD, the latest mini-batch stochastic gradient computed at a worker is transmitted to the server only if it is sufficiently informative; otherwise, the server reuses the stale gradient. To make this censoring strategy effective, the batch size increases over iterations so as to alleviate the effect of gradient noise. Theoretically, CSGD enjoys the same order of convergence rate as SGD while effectively reducing communication. Numerical experiments further demonstrate the sizable communication savings achieved by CSGD.
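For illustration, the following is a minimal Python sketch of the censoring idea described in the abstract. The specific censoring test (a threshold on how much the local gradient has changed since it was last transmitted), the threshold and batch-size schedules, and all names (csgd_sketch, sample_grad, tau0, b0) are assumptions made for this sketch, not details taken from the paper.

import numpy as np

def csgd_sketch(workers, x0, lr=0.1, rounds=100, tau0=1.0, b0=8):
    """Server-side loop: each worker sends its mini-batch gradient only if
    it differs enough from the last gradient it transmitted (assumed rule)."""
    x = x0.copy()
    # Server keeps a stale copy of the last gradient transmitted by each worker.
    last_sent = {k: np.zeros_like(x0) for k in range(len(workers))}
    for t in range(rounds):
        batch_size = b0 * (t + 1)   # increasing batch size to damp gradient noise (assumed schedule)
        tau = tau0 / (t + 1)        # censoring threshold (assumed schedule)
        grads = []
        for k, sample_grad in enumerate(workers):
            g = sample_grad(x, batch_size)  # local mini-batch stochastic gradient
            # Censoring test: transmit only if the new gradient is
            # sufficiently informative relative to the stale copy.
            if np.linalg.norm(g - last_sent[k]) >= tau:
                last_sent[k] = g            # worker transmits; server updates its copy
            grads.append(last_sent[k])      # otherwise the stale gradient is reused
        x -= lr * np.mean(grads, axis=0)    # standard SGD step with (possibly stale) gradients
    return x

The point of the sketch is that a worker's gradient reaches the server only when it differs sufficiently from the last transmitted one; otherwise the server falls back on its stale copy, which is where the communication saving comes from.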


Related research

12/31/2020  CADA: Communication-Adaptive Distributed Adam
Stochastic gradient descent (SGD) has taken the stage as the primary wor...

03/11/2019  Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
Machine learning, especially deep neural networks, has been rapidly deve...

04/17/2017  Sparse Communication for Distributed Gradient Descent
We make distributed stochastic gradient descent faster by exchanging spa...

01/19/2019  Fitting ReLUs via SGD and Quantized SGD
In this paper we focus on the problem of finding the optimal weights of ...

10/08/2018  Toward Understanding the Impact of Staleness in Distributed Machine Learning
Many distributed machine learning (ML) systems adopt the non-synchronous...

04/21/2016  Stabilized Sparse Online Learning for Sparse Data
Stochastic gradient descent (SGD) is commonly used for optimization in l...

09/20/2018  Sparsified SGD with Memory
Huge scale machine learning problems are nowadays tackled by distributed...
