A Non-Asymptotic Analysis of Network Independence for Distributed Stochastic Gradient Descent

06/06/2019
by Alex Olshevsky, et al.

This paper is concerned with minimizing the average of n cost functions over a network in which agents may communicate and exchange information with their peers. Specifically, we consider the setting where only noisy gradient information is available. To solve the problem, we study the standard distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, we show that DSGD asymptotically achieves the optimal, network-independent convergence rate of centralized stochastic gradient descent (SGD), and we explicitly characterize its non-asymptotic convergence rate as a function of properties of the objective functions and the network. Furthermore, we derive the time needed for DSGD to approach this asymptotic rate, which behaves as K_T = O(n^{16/15}/(1-ρ_w)^{31/15}), where (1-ρ_w) denotes the spectral gap of the mixing matrix of the communicating agents.
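For context, the DSGD method referred to here follows the standard recursion x_i(k+1) = Σ_j w_ij x_j(k) - α_k g_i(x_i(k), ξ_i(k)): each agent first averages its neighbors' iterates through the mixing matrix W and then takes a step along its own noisy gradient. Below is a minimal NumPy sketch of this recursion; the ring-graph mixing weights, toy quadratic objectives, and O(1/k) step size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dsgd_step(x, W, noisy_grads, step_size):
    """One DSGD iteration: consensus mixing followed by a local noisy-gradient step.

    x           : (n, d) array, row i is agent i's current iterate
    W           : (n, n) doubly stochastic mixing matrix
    noisy_grads : (n, d) array, row i is a stochastic gradient of f_i at x[i]
    step_size   : the step size alpha_k (typically decaying, e.g. O(1/k))
    """
    return W @ x - step_size * noisy_grads


# Illustrative usage on a toy quadratic problem with additive gradient noise.
rng = np.random.default_rng(0)
n, d = 5, 3

# Simple mixing weights for a ring network (hypothetical example topology);
# rows and columns sum to one, so W is doubly stochastic.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    W[i, i] = 1 / 3

targets = rng.normal(size=(n, d))   # each f_i(x) = 0.5 * ||x - targets[i]||^2
x = rng.normal(size=(n, d))         # initial iterates, one row per agent
for k in range(1, 2001):
    grads = (x - targets) + 0.1 * rng.normal(size=(n, d))  # noisy local gradients
    x = dsgd_step(x, W, grads, step_size=1.0 / k)

# Each agent should approach the minimizer of the average cost (the mean target).
print(np.linalg.norm(x - targets.mean(axis=0), axis=1))
```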


Related research

05/11/2021 · Improving the Transient Times for Distributed Stochastic Gradient Methods
We consider the distributed optimization problem where n agents each pos...

10/11/2019 · Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
We investigate the theoretical limits of pipeline parallel learning of d...

02/06/2020 · Achieving the fundamental convergence-communication tradeoff with Differentially Quantized Gradient Descent
The problem of reducing the communication cost in distributed training t...

06/11/2018 · Swarming for Faster Convergence in Stochastic Optimization
We study a distributed framework for stochastic optimization which is in...

08/30/2016 · Data Dependent Convergence for Distributed Stochastic Optimization
In this dissertation we propose alternative analysis of distributed stoc...

01/02/2020 · Stochastic Gradient Langevin Dynamics on a Distributed Network
Langevin MCMC gradient optimization is a class of increasingly popular m...

08/28/2022 · Asynchronous Training Schemes in Distributed Learning with Time Delay
In the context of distributed deep learning, the issue of stale weights ...
