
Removing Data Heterogeneity Influence Enhances Network Topology Dependence of Decentralized SGD

by Kun Yuan, et al.

We consider decentralized stochastic optimization problems in which a network of agents, each owning a local cost function, cooperates to find a minimizer of the globally averaged cost. A widely studied decentralized algorithm for this problem is D-SGD, in which each node applies a stochastic gradient descent step and then averages its estimate with those of its neighbors. D-SGD is attractive because each iteration requires only one round of communication, and it can achieve linear speedup in convergence (in terms of the network size). However, D-SGD is very sensitive to the network topology. For smooth objective functions, the transient stage of D-SGD (which measures how fast the algorithm reaches the linear speedup stage) is on the order of O(n/(1-β)^2) and O(n^3/(1-β)^4) for strongly convex and generally convex cost functions, respectively, where 1-β ∈ (0,1) is a topology-dependent quantity that approaches 0 for large and sparse networks. Hence, D-SGD converges slowly on large and sparse networks. In this work, we study the non-asymptotic convergence of the D^2/Exact-diffusion algorithm. By eliminating the influence of data heterogeneity between nodes, D^2/Exact-diffusion is shown to have an enhanced transient stage on the order of O(n/(1-β)) and O(n^3/(1-β)^2) for strongly convex and generally convex cost functions, respectively. Moreover, we provide a lower bound on the transient stage of D-SGD under homogeneous data distributions, which coincides with the transient stage of D^2/Exact-diffusion in the strongly convex setting. These results show that removing the influence of data heterogeneity can ameliorate the network topology dependence of D-SGD. Compared with the bounds of existing decentralized algorithms, D^2/Exact-diffusion is the least sensitive to network topology.
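The effect of data heterogeneity on D-SGD can be illustrated with a minimal, noise-free Python sketch. The setup is assumed, not taken from the paper: scalar quadratic local costs f_i(x) = ½(x − b_i)², a ring whose nodes average themselves and their two neighbors with weight 1/3, a constant step size of 0.1, and β taken as the second-largest eigenvalue magnitude of the mixing matrix W (a common choice). Exact-diffusion is written in its adapt-correct-combine form with W̄ = (I + W)/2. All helper names (`ring_weights`, `ring_beta`, `d_sgd`, `exact_diffusion`) are hypothetical. With a constant step, D-SGD settles at a point where nodes disagree, because each local gradient pulls toward its own b_i. The exact-diffusion correction term cancels that pull.

```python
import math
import random


def ring_weights(n):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumes n >= 3):
    each node averages itself and its two neighbours with weight 1/3."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def ring_beta(n):
    """Second-largest eigenvalue magnitude of ring_weights(n). The matrix is
    circulant, so its eigenvalues are 1/3 + (2/3) cos(2*pi*k/n), k = 0..n-1."""
    return max(abs(1 / 3 + 2 / 3 * math.cos(2 * math.pi * k / n)) for k in range(1, n))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def d_sgd(W, b, alpha, iters, sigma=0.0, seed=0):
    """D-SGD on f_i(x) = 0.5 * (x - b_i)^2: every node takes a (noisy)
    gradient step, then averages with its neighbours through W."""
    rng = random.Random(seed)
    x = [0.0] * len(b)
    for _ in range(iters):
        half = [xi - alpha * (xi - bi + sigma * rng.gauss(0.0, 1.0)) for xi, bi in zip(x, b)]
        x = matvec(W, half)
    return x


def exact_diffusion(W, b, alpha, iters, sigma=0.0, seed=0):
    """Exact-diffusion / D^2 on the same costs, with W_bar = (I + W) / 2:
        psi^{k+1} = x^k - alpha * g^k             (adapt)
        phi^{k+1} = psi^{k+1} + x^k - psi^k       (correct)
        x^{k+1}   = W_bar phi^{k+1}               (combine)"""
    n = len(b)
    W_bar = [[(W[i][j] + (1.0 if i == j else 0.0)) / 2.0 for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    x = [0.0] * n
    psi_prev = x[:]  # psi^0 = x^0, so the first step has no correction
    for _ in range(iters):
        psi = [xi - alpha * (xi - bi + sigma * rng.gauss(0.0, 1.0)) for xi, bi in zip(x, b)]
        phi = [p + xi - pp for p, xi, pp in zip(psi, x, psi_prev)]
        x, psi_prev = matvec(W_bar, phi), psi
    return x


def max_error(x, target):
    return max(abs(xi - target) for xi in x)


n = 8
W = ring_weights(n)
b = [float(i) for i in range(n)]  # hypothetical heterogeneous local minimizers
x_star = sum(b) / n               # minimizer of the averaged cost
err_dsgd = max_error(d_sgd(W, b, alpha=0.1, iters=2000), x_star)
err_ed = max_error(exact_diffusion(W, b, alpha=0.1, iters=2000), x_star)
print(f"1 - beta = {1 - ring_beta(n):.4f}")
print(f"D-SGD max error:           {err_dsgd:.2e}")
print(f"Exact-diffusion max error: {err_ed:.2e}")
```

If every b_i is set to the same value, the D-SGD bias disappears, which matches the homogeneous-data setting of the lower bound stated above. Calling `ring_beta` for larger n shows 1 − β shrinking toward 0 as the ring grows. Passing `sigma > 0` adds Gaussian gradient noise, which exposes the stochastic (transient-stage) behavior rather than only the deterministic bias.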



