Fundamental Limits of Communication Efficiency for Model Aggregation in Distributed Learning: A Rate-Distortion Approach

06/28/2022
by Naifu Zhang, et al.

A main focus in distributed learning is communication efficiency, since model aggregation at each round of training can involve millions to billions of parameters. Several model compression methods, such as gradient quantization and sparsification, have been proposed to improve the communication efficiency of model aggregation. However, the information-theoretic minimum communication cost for a given distortion of gradient estimators is still unknown. In this paper, we study the fundamental limit of the communication cost of model aggregation in distributed learning from a rate-distortion perspective. By formulating model aggregation as a vector Gaussian CEO problem, we derive the rate region bound and the sum-rate-distortion function for the model aggregation problem, which reveal the minimum communication rate required to achieve a given upper bound on gradient distortion. Using the sum-rate-distortion function together with gradient statistics from real-world datasets, we also analyze the per-iteration and total communication costs. We find that the communication gain from exploiting the correlation between worker nodes is significant for SignSGD, and that tolerating a high distortion of the gradient estimator can achieve a low total communication cost in gradient compression.
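For orientation, here is a minimal worked sketch of the rate-distortion quantities the abstract refers to, written for the classical symmetric scalar Gaussian CEO model rather than the paper's vector Gaussian formulation; the symbols X, Y_k, sigma_X^2, sigma_N^2, q, K, and D below are illustrative assumptions, not the paper's notation.

\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumption: K worker nodes observe a common source (e.g., one gradient
% component) X ~ N(0, \sigma_X^2) through independent Gaussian noises:
\[
  Y_k = X + N_k, \qquad N_k \sim \mathcal{N}(0, \sigma_N^2),
  \qquad k = 1, \dots, K.
\]
% Berger--Tung coding with Gaussian test channels U_k = Y_k + V_k,
% V_k ~ N(0, q), lets the CEO form an MMSE estimate of X with distortion
\[
  \frac{1}{D} = \frac{1}{\sigma_X^2} + \frac{K}{\sigma_N^2 + q},
\]
% at the achievable sum rate (using that the covariance of (U_1,...,U_K)
% has eigenvalue \sigma_N^2 + q with multiplicity K-1 and eigenvalue
% K\sigma_X^2 + \sigma_N^2 + q with multiplicity 1):
\[
  R_{\mathrm{sum}} = \frac{1}{2}\log_2
  \frac{(\sigma_N^2 + q)^{K-1}\,(K\sigma_X^2 + \sigma_N^2 + q)}{q^{K}}.
\]
% Eliminating q between the two displays yields a sum-rate-distortion
% curve R_sum(D), which is known to be tight for this quadratic Gaussian
% CEO setting; the paper derives the analogue for the vector case, with
% gradient statistics playing the role of the source model above.
\end{document}

As q approaches 0 the distortion approaches its minimum (1/sigma_X^2 + K/sigma_N^2)^{-1} while the q^{-K} factor makes the sum rate grow roughly like (K/2) log_2(1/q), which is one way to see why tolerating a higher gradient distortion can sharply reduce the total communication cost.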


Related research:

- Rate distortion comparison of a few gradient quantizers (08/23/2021). This article is in the context of gradient compression. Gradient compres...
- Rate Distortion For Model Compression: From Theory To Practice (10/09/2018). As the size of neural network models increases dramatically today, study...
- Optimal Compression of Unit Norm Vectors in the High Distortion Regime (07/16/2023). Motivated by the need for communication-efficient distributed learning, ...
- Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory (01/07/2022). A significant bottleneck in federated learning is the network communicat...
- Rate Region for Indirect Multiterminal Source Coding in Federated Learning (01/21/2021). One of the main focus in federated learning (FL) is the communication ef...
- M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion (01/23/2023). In federated learning (FL), the communication constraint between the rem...
- Towards Empirical Sandwich Bounds on the Rate-Distortion Function (11/23/2021). Rate-distortion (R-D) function, a key quantity in information theory, ch...
