3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation

02/02/2022
by Peter Richtárik, et al.

We propose and study a new class of gradient communication mechanisms for communication-efficient training – three point compressors (3PC) – as well as efficient distributed nonconvex optimization algorithms that can take advantage of them. Unlike most established approaches, which rely on a static compressor choice (e.g., Top-K), our class allows the compressors to evolve throughout the training process, with the aim of improving the theoretical communication complexity and practical efficiency of the underlying methods. We show that our general approach can recover the recently proposed state-of-the-art error feedback mechanism EF21 (Richtárik et al., 2021) and its theoretical properties as a special case, but also leads to a number of new efficient methods. Notably, our approach allows us to improve upon the state of the art in the algorithmic and theoretical foundations of the lazy aggregation literature (Chen et al., 2018). As a by-product that may be of independent interest, we provide a new and fundamental link between the lazy aggregation and error feedback literature. A special feature of our work is that we do not require the compressors to be unbiased.
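
To make the mechanism concrete, below is a minimal, self-contained sketch (Python, not the authors' code) of the EF21 special case mentioned in the abstract: each worker keeps a gradient estimate g_i and communicates only a compressed difference, which the server aggregates. The function names (top_k, ef21_3pc_map, run_ef21), the toy quadratic objective, and the step-size choices are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: EF21 viewed as one instance of a three point compressor (3PC).
# Each worker updates its state as  g_i <- g_i + C(grad_i(x_new) - g_i),
# i.e. a 3PC map of the form C_{h,y}(x) = h + C(x - h) with h = g_i (previous
# communicated estimate) and x = the fresh local gradient; this particular
# instance does not use the third point y. All names here are hypothetical.

import numpy as np

def top_k(v, k):
    """Top-K compressor: keep the k largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_3pc_map(h, x, k):
    """EF21-style 3PC map: C_{h,y}(x) = h + TopK(x - h)."""
    return h + top_k(x - h, k)

def run_ef21(grads, x0, gamma=0.2, k=1, iters=500):
    """Distributed gradient descent where worker i sends only TopK(grad_i(x) - g_i)
    and the server aggregates the resulting estimates g_i.

    grads: list of callables, grads[i](x) returns worker i's local gradient.
    """
    n = len(grads)
    x = x0.copy()
    g = np.stack([grads[i](x) for i in range(n)])      # g_i^0, sent uncompressed once
    for _ in range(iters):
        x = x - gamma * g.mean(axis=0)                 # server step with aggregated estimate
        for i in range(n):
            g[i] = ef21_3pc_map(g[i], grads[i](x), k)  # worker i's compressed state update
    return x

if __name__ == "__main__":
    # Toy problem: worker i holds f_i(x) = 0.5 * ||x - a_i||^2, so grad_i(x) = x - a_i,
    # and the minimizer of the average objective is mean(a_i).
    rng = np.random.default_rng(0)
    a = rng.normal(size=(4, 5))
    grads = [lambda x, ai=ai: x - ai for ai in a]
    x_hat = run_ef21(grads, x0=np.zeros(5), gamma=0.2, k=2, iters=500)
    print("distance to optimum:", np.linalg.norm(x_hat - a.mean(axis=0)))
```

In the same framework, lazy aggregation would correspond (as the abstract's link to the lazy aggregation literature suggests) to a different choice of the 3PC map, one that skips communication when the fresh gradient is sufficiently close to the previously communicated estimate.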

Related research

- EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback (06/09/2021)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization (10/07/2021)
- Adaptive Compression for Communication-Efficient Distributed Training (10/31/2022)
- Distributed Newton-Type Methods with Communication Compression and Bernoulli Aggregation (06/07/2022)
- Can 5th Generation Local Training Methods Support Client Sampling? Yes! (12/29/2022)
- Momentum Provably Improves Error Feedback! (05/24/2023)
- DASHA: Distributed Nonconvex Optimization with Communication Compression, Optimal Oracle Complexity, and No Client Synchronization (02/02/2022)
