Decentralized Stochastic Gradient Tracking for Empirical Risk Minimization

09/06/2019
by Jiaqi Zhang, et al.

Recent work has shown the advantages of decentralized SGD over its centralized counterpart in large-scale machine learning, but the theoretical gap between the two is still not fully understood. In this paper, we propose a decentralized stochastic gradient tracking (DSGT) algorithm over peer-to-peer networks for empirical risk minimization problems, and explicitly evaluate its convergence rate in terms of key problem parameters, e.g., the algebraic connectivity of the communication network, the mini-batch size, and the gradient variance. Importantly, this is the first theoretical result that exactly recovers the rate of centralized SGD and achieves optimal dependence on the algebraic connectivity of the network when stochastic gradients are used. Moreover, we explicitly quantify how the network affects the speedup and the rate improvement over existing works. Interestingly, we also point out for the first time that both linear and sublinear speedup are possible. We empirically validate DSGT on neural network and logistic regression problems, and show its advantage over state-of-the-art algorithms.
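To make the gradient-tracking idea concrete: each node keeps a local model copy and an auxiliary tracker that estimates the network-wide average gradient, mixing both with its neighbors at every step. Below is a minimal NumPy sketch of this standard recursion, assuming a doubly stochastic mixing matrix W; the names (dsgt, grad_fns) are illustrative and details such as mini-batch sizes and step-size schedules in the paper may differ.

```python
import numpy as np

def dsgt(grad_fns, W, x0, alpha, num_iters):
    """Minimal sketch of decentralized stochastic gradient tracking.

    grad_fns : list of n callables; grad_fns[i](x) returns a stochastic
               gradient of agent i's local empirical risk at x (assumed helper).
    W        : (n, n) doubly stochastic mixing matrix; W[i, j] > 0 only if
               agents i and j are neighbors in the communication graph.
    x0       : (n, d) array of initial iterates, one row per agent.
    alpha    : constant step size.
    """
    n, d = x0.shape
    x = x0.copy()
    g = np.stack([grad_fns[i](x[i]) for i in range(n)])  # local stochastic gradients
    y = g.copy()                                         # trackers, initialized to gradients
    for _ in range(num_iters):
        # Consensus step combined with descent along the tracked direction.
        x = W @ (x - alpha * y)
        # Mix trackers with neighbors, then correct by the difference of
        # successive local stochastic gradients so y tracks the average gradient.
        g_new = np.stack([grad_fns[i](x[i]) for i in range(n)])
        y = W @ y + g_new - g
        g = g_new
    return x.mean(axis=0)  # average of the local models
```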


Related research

09/06/2019 · Decentralized Stochastic Gradient Tracking for Non-convex Empirical Risk Minimization
This paper studies a decentralized stochastic gradient tracking (DSGT) a...

07/09/2019 · A Stochastic First-Order Method for Ordered Empirical Risk Minimization
We propose a new stochastic first-order method for empirical risk minimi...

10/09/2021 · An Empirical Study on Compressed Decentralized Stochastic Gradient Algorithms with Overparameterized Models
This paper considers decentralized optimization with application to mach...

10/08/2019 · Variance-Reduced Decentralized Stochastic Optimization with Gradient Tracking – Part II: GT-SVRG
Decentralized stochastic optimization has recently benefited from gradie...

09/07/2023 · Convergence Analysis of Decentralized ASGD
Over the last decades, Stochastic Gradient Descent (SGD) has been intens...

05/25/2017 · Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Most distributed machine learning systems nowadays, including TensorFlow...

04/23/2023 · Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization
Nowadays, algorithms with fast convergence, small memory footprints, and...
