BEER: Fast O(1/T) Rate for Decentralized Nonconvex Optimization with Communication Compression

01/31/2022
by Haoyu Zhao, et al.

Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning in multi-agent or federated environments. To tackle this bottleneck, many communication-compressed algorithms have been designed for decentralized nonconvex optimization, where clients are only allowed to exchange a small amount of quantized information (i.e., bits) with their neighbors over a predefined graph topology. Despite significant efforts, the state-of-the-art algorithm in the nonconvex setting still suffers from a slower convergence rate, O((G/T)^(2/3)), than its uncompressed counterpart, where G measures the data heterogeneity across clients and T is the number of communication rounds. This paper proposes BEER, which combines communication compression with gradient tracking, and shows that it converges at the faster rate of O(1/T). This significantly improves over the state-of-the-art rate, matching the rate without compression even under arbitrary data heterogeneity. Numerical experiments corroborate the theory and confirm the practical superiority of BEER in the data-heterogeneous regime.
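The abstract names two ingredients: a contractive compression operator applied to the messages exchanged over the graph, and gradient tracking to cope with data heterogeneity. The Python/NumPy sketch below shows one way these pieces can fit together in a single loop. It is a minimal illustration only, not the paper's pseudocode: the top-k compressor, the ring topology, the quadratic local losses, the step sizes gamma and eta, and all variable names are assumptions made here for concreteness, and the per-agent message exchange is simulated centrally with a mixing matrix W.

```python
import numpy as np

# Sketch of a compressed gradient-tracking iteration in the spirit of BEER.
# Everything below (losses, topology, compressor, step sizes) is assumed
# for illustration and is not taken from the paper.

rng = np.random.default_rng(0)
n, d, k = 8, 20, 4                      # agents, dimension, entries kept by top-k
A = rng.normal(size=(n, d, d)) / np.sqrt(d)
b = rng.normal(size=(n, d))

def grad(i, x):
    """Gradient of the assumed local loss f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    return A[i].T @ (A[i] @ x - b[i])

def topk(v, k):
    """Contractive top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Symmetric, doubly stochastic mixing matrix W for a ring graph.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

gamma, eta = 0.05, 0.01                 # mixing and gradient step sizes (assumed)

# Per-agent state: iterates x, compressed surrogates h (for x) and g (for v),
# and gradient trackers v initialized at the local gradients.
x = np.zeros((n, d))
h = np.zeros((n, d))
v = np.array([grad(i, x[i]) for i in range(n)])
g = np.zeros((n, d))

for t in range(200):
    old_grads = np.array([grad(i, x[i]) for i in range(n)])

    # Mix the neighbors' surrogates h (built from compressed messages),
    # take a step along the gradient tracker, then send a compressed
    # correction so the surrogates stay in sync with the new iterates.
    x_new = x + gamma * (W @ h - h) - eta * v
    h = h + np.array([topk(x_new[i] - h[i], k) for i in range(n)])

    new_grads = np.array([grad(i, x_new[i]) for i in range(n)])

    # Gradient tracking over the compressed surrogates g.
    v = v + gamma * (W @ g - g) + new_grads - old_grads
    g = g + np.array([topk(v[i] - g[i], k) for i in range(n)])

    x = x_new

x_bar = x.mean(axis=0)
print("global gradient norm at the average iterate:",
      np.linalg.norm(np.mean([grad(i, x_bar) for i in range(n)], axis=0)))
```

The key design point the code tries to surface is that only compressed corrections (the topk terms) would need to cross the network: each agent mixes surrogate copies h and g rather than the raw iterates and trackers, which is what keeps the per-round communication small.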

Related research

Faster Rates for Compressed Federated Learning with Client-Variance Reduction (12/24/2021)
Due to the communication bottleneck in distributed and federated learnin...

Convergence and Privacy of Decentralized Nonconvex Optimization with Gradient Clipping and Communication Compression (05/17/2023)
Achieving communication efficiency in decentralized machine learning has...

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication (02/01/2019)
We consider decentralized stochastic optimization with the objective fun...

Linear Convergent Decentralized Optimization with Compression (07/01/2020)
Communication compression has been extensively adopted to speed up large...

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! (11/03/2020)
Decentralized optimization methods enable on-device training of machine ...

PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning (08/04/2020)
Lossy gradient compression has become a practical tool to overcome the c...

Personalized Decentralized Multi-Task Learning Over Dynamic Communication Graphs (12/21/2022)
Decentralized and federated learning algorithms face data heterogeneity ...