DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization

02/02/2022
by Alexander Tyurin, et al.

We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems. When the local functions at the nodes have a finite-sum or an expectation form, our new methods, DASHA-PAGE and DASHA-SYNC-MVR, improve the theoretical oracle and communication complexity of the previous state-of-the-art method MARINA (Gorbunov et al., 2021). In particular, to achieve an ε-stationary point, and considering the random sparsifier RandK as an example, our methods compute the optimal number of stochastic gradients, 𝒪(√m/(ε√n)) and 𝒪(σ/(ε^(3/2) n)) in the finite-sum and expectation cases, respectively, while maintaining the SOTA communication complexity 𝒪(d/(ε√n)). Furthermore, unlike MARINA, the new methods DASHA, DASHA-PAGE, and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning. We extend our results to the case when the functions satisfy the Polyak-Łojasiewicz condition. Finally, our theory is corroborated in practice: we see a significant improvement in experiments with nonconvex classification and training of deep learning models.
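To make the compression primitive concrete, here is a minimal NumPy sketch of the RandK sparsifier together with a schematic compressed-difference worker step. The names rand_k and worker_message are illustrative assumptions for exposition, and the worker step conveys only the general mechanism (nodes transmit compressed difference vectors); it is not DASHA's exact gradient estimator.

import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Unbiased RandK compressor: keep k coordinates chosen uniformly at
    # random, scale them by d/k so that E[rand_k(x)] = x, zero the rest.
    d = x.shape[0]
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def worker_message(grad_new: np.ndarray, g_local: np.ndarray, k: int,
                   rng: np.random.Generator):
    # Schematic (hypothetical) step: the worker sends only the compressed
    # change of its gradient estimate, so each round communicates O(k)
    # nonzero entries instead of the full dimension d.
    msg = rand_k(grad_new - g_local, k, rng)
    return g_local + msg, msg  # updated local state, sparse message to server

RandK is unbiased with variance parameter ω = d/k − 1, the standard quantity such compression analyses track; sending only the k surviving entries (plus their indices) is what keeps the per-round communication small.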


Related research

MARINA: Faster Non-Convex Distributed Learning with Compression (02/15/2021)
We develop and analyze MARINA: a new communication efficient method for ...

A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting (05/31/2022)
We present a new method that includes three key components of distribute...

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation (03/02/2021)
We propose ZeroSARAH – a novel variant of the variance-reduced method SA...

Stochastic Gauss-Newton Algorithms for Nonconvex Compositional Optimization (02/17/2020)
We develop two new stochastic Gauss-Newton algorithms for solving a clas...

DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization (10/04/2021)
Emerging applications in multi-agent environments such as internet-of-th...

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization (08/25/2020)
In this paper, we propose a novel stochastic gradient estimator—ProbAbil...

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation (02/02/2022)
We propose and study a new class of gradient communication mechanisms fo...
