M22: A Communication-Efficient Algorithm for Federated Learning Inspired by Rate-Distortion

01/23/2023
by Yangyi Liu, et al.

In federated learning (FL), the communication constraint between the remote learners and the Parameter Server (PS) is a crucial bottleneck. For this reason, model updates must be compressed so as to minimize the loss in accuracy resulting from the communication constraint. This paper proposes the “M-magnitude weighted L_2 distortion + 2 degrees of freedom” (M22) algorithm, a rate-distortion-inspired approach to gradient compression for federated training of deep neural networks (DNNs). In particular, we propose a family of distortion measures between the original gradient and its reconstruction, which we refer to as the “M-magnitude weighted L_2” distortion, and we assume that the gradient updates follow an i.i.d. distribution, either generalized normal or Weibull, both of which have two degrees of freedom. The distortion measure and the gradient distribution thus each have one free parameter, which can be fitted as a function of the iteration number. Given a choice of gradient distribution and distortion measure, we design the quantizer that minimizes the expected distortion in the gradient reconstruction. To measure gradient compression performance under a communication constraint, we define the per-bit accuracy as the optimal improvement in accuracy that one bit of communication brings to the centralized model over the training period. Using this performance measure, we systematically benchmark the choices of gradient distribution and distortion measure. We provide substantial insights into the role of these choices and argue that significant performance improvements can be attained using such a rate-distortion-inspired compressor.
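The abstract does not reproduce the exact form of the M-magnitude weighted L_2 distortion or of the quantizer construction, so the sketch below is only a rough illustration of the pipeline it describes, under the assumption that the distortion weights the squared reconstruction error by the gradient magnitude raised to a power M. It fits a generalized normal (GenNorm) model to a gradient, then runs a Lloyd-style alternation to pick the reconstruction levels that minimize the empirical expected distortion. All function names and the specific weighting are illustrative assumptions, not the paper's exact construction.

```python
# Illustrative sketch (not the paper's implementation): fit a GenNorm model to a
# gradient and design a b-bit quantizer minimizing an assumed distortion
#   d(g, q) = |g|^M * (g - q)^2
# via a Lloyd-style alternation over samples drawn from the fitted model.
import numpy as np
from scipy.stats import gennorm


def fit_gennorm(gradient: np.ndarray):
    """Fit a zero-mean generalized normal distribution to a flattened gradient."""
    beta, _, scale = gennorm.fit(gradient.ravel(), floc=0.0)
    return beta, scale


def design_quantizer(samples: np.ndarray, bits: int, M: float, iters: int = 50):
    """Return the K = 2^bits reconstruction levels for the assumed weighted distortion."""
    K = 2 ** bits
    # Initialize levels at empirical quantiles of the samples.
    levels = np.quantile(samples, (np.arange(K) + 0.5) / K)
    w = np.abs(samples) ** M  # per-sample magnitude weights
    for _ in range(iters):
        # Assignment: a positive per-sample weight does not change the
        # nearest-level rule, so each sample maps to its closest level.
        idx = np.argmin((samples[:, None] - levels[None, :]) ** 2, axis=1)
        # Update: the weighted centroid of each cell minimizes that cell's distortion.
        for k in range(K):
            mask = idx == k
            if mask.any():
                levels[k] = np.average(samples[mask], weights=w[mask] + 1e-12)
        levels.sort()
    return levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a DNN gradient; in practice this comes from the local learner.
    grad = rng.standard_normal(100_000) * 0.01
    M = 1.0

    beta, scale = fit_gennorm(grad)
    model_samples = gennorm.rvs(beta, loc=0.0, scale=scale, size=50_000, random_state=rng)
    levels = design_quantizer(model_samples, bits=3, M=M)

    # Quantize the gradient with the designed levels before uplink transmission.
    q_idx = np.argmin((grad[:, None] - levels[None, :]) ** 2, axis=1)
    grad_hat = levels[q_idx]
    distortion = np.mean(np.abs(grad) ** M * (grad - grad_hat) ** 2)
    print(f"fitted beta={beta:.3f}, scale={scale:.4g}, empirical distortion={distortion:.3e}")
```

In an FL round, each learner would quantize its local gradient with such designed levels and send only the level indices (plus the handful of fitted parameters) to the PS, which is where the communication saving under the bit constraint would come from.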
