
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates

by Anastasia Koloskova, et al.

Decentralized stochastic optimization methods have recently attracted significant attention, owing to their low per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, serve different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on a changing network topology. We derive universal convergence rates for smooth (convex and non-convex) problems; the rates interpolate between the heterogeneous (non-identically distributed data) and the i.i.d.-data setting, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
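To make the framework concrete, here is a minimal sketch (not the paper's algorithm verbatim; the function name, toy quadratic objectives, and schedule below are illustrative assumptions) of decentralized SGD where each node takes a local gradient step and then gossip-averages its iterate with a doubly stochastic mixing matrix W_t that may change every round. Taking W_t = I between communication rounds recovers local SGD / federated averaging as a special case of the same template:

```python
import numpy as np

def decentralized_sgd(grads, x0, mixing_matrices, lr=0.1, steps=200):
    """Sketch of decentralized SGD with changing topology.

    Each round: every node i takes a local gradient step on its own
    objective, then the iterates are mixed as X <- W_t X with a doubly
    stochastic matrix W_t cycled from `mixing_matrices`.
    """
    n = len(grads)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one parameter row per node
    for t in range(steps):
        # Local update: x_i <- x_i - lr * g_i(x_i)
        X = X - lr * np.stack([grads[i](X[i]) for i in range(n)])
        # Gossip/communication step: x_i <- sum_j (W_t)_ij x_j
        W = mixing_matrices[t % len(mixing_matrices)]
        X = W @ X
    return X

# Toy heterogeneous problem: node i minimizes ||x - c_i||^2 / 2, so the
# consensus optimum is the mean of the c_i. Exact gradients for clarity.
rng = np.random.default_rng(0)
n, d = 4, 3
centers = rng.normal(size=(n, d))
grads = [lambda x, c=c: x - c for c in centers]

# Local-SGD-style schedule: three "no communication" rounds (W = I),
# then one full-averaging round -- a changing topology over time.
schedule = [np.eye(n)] * 3 + [np.full((n, n), 1.0 / n)]

X = decentralized_sgd(grads, np.zeros(d), schedule, lr=0.2, steps=200)
print(np.allclose(X, centers.mean(axis=0), atol=1e-3))
```

Because every W_t here is doubly stochastic, the node average evolves like a centralized gradient step on the mean objective, which is why the rows converge to the consensus optimum in this toy run; swapping in a ring or pairwise-gossip matrix changes only how fast the nodes reach consensus.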



