FetchSGD: Communication-Efficient Federated Learning with Sketching

07/15/2020
by Daniel Rothchild, et al.

Existing approaches to federated learning suffer from a communication bottleneck as well as convergence issues due to sparse client participation. In this paper we introduce a novel algorithm, called FetchSGD, to overcome these challenges. FetchSGD compresses model updates using a Count Sketch, and then takes advantage of the mergeability of sketches to combine model updates from many workers. A key insight in the design of FetchSGD is that, because the Count Sketch is linear, momentum and error accumulation can both be carried out within the sketch. This allows the algorithm to move momentum and error accumulation from clients to the central aggregator, overcoming the challenges of sparse client participation while still achieving high compression rates and good convergence. We prove that FetchSGD has favorable convergence guarantees, and we demonstrate its empirical effectiveness by training two residual networks and a transformer model.
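To make the role of linearity concrete, below is a minimal, illustrative Python sketch of the mechanism the abstract describes. It is not the authors' released implementation; the `CountSketch` class, the `server_round` helper, and all parameter names (`rows`, `cols`, `rho`, `k`) are assumptions made here for exposition. What it demonstrates is the key property the abstract relies on: a Count Sketch is linear, so merging client updates, applying momentum, and accumulating compression error can all be performed directly on the fixed-size sketch at the server.

```python
import numpy as np

class CountSketch:
    """Illustrative linear Count Sketch for real vectors (not the paper's code).

    Linearity: sketch(a*x + b*y) == a*sketch(x) + b*sketch(y), so sketches
    from many clients can be merged by addition, and momentum / error
    accumulation can be carried out entirely in sketch space on the server.
    All participants must share the same hash seed for merging to be valid.
    """

    def __init__(self, dim, rows=5, cols=10_000, seed=0):
        rng = np.random.default_rng(seed)
        self.dim, self.rows, self.cols = dim, rows, cols
        self.buckets = rng.integers(0, cols, size=(rows, dim))  # bucket hashes
        self.signs = rng.choice([-1.0, 1.0], size=(rows, dim))  # sign hashes
        self.table = np.zeros((rows, cols))

    def accumulate(self, vec):
        # Adding a vector adds its signed mass to each row's buckets (linear).
        for r in range(self.rows):
            np.add.at(self.table[r], self.buckets[r], self.signs[r] * vec)

    def merge(self, other):
        # Mergeability: the sum of two sketches is the sketch of the sum.
        self.table += other.table

    def scale(self, c):
        self.table *= c  # also valid, again by linearity

    def estimate(self):
        # Median-across-rows estimate of each coordinate of the sketched vector.
        per_row = np.stack([self.signs[r] * self.table[r, self.buckets[r]]
                            for r in range(self.rows)])
        return np.median(per_row, axis=0)

def server_round(client_grads, momentum_sk, error_sk, lr=0.1, rho=0.9, k=100):
    """One hypothetical FetchSGD-style round, simplified for illustration."""
    dim = momentum_sk.dim
    # 1. Merge the clients' gradient sketches (each client sketches locally).
    round_sk = CountSketch(dim)
    for g in client_grads:
        sk = CountSketch(dim)
        sk.accumulate(g)
        round_sk.merge(sk)
    round_sk.scale(1.0 / len(client_grads))
    # 2. Momentum, carried inside the sketch: S_u <- rho * S_u + S_round.
    momentum_sk.scale(rho)
    momentum_sk.merge(round_sk)
    # 3. Error accumulation, also in sketch space: S_e <- S_e + lr * S_u.
    error_sk.table += lr * momentum_sk.table
    # 4. Unsketch and keep the top-k coordinates as this round's model update.
    est = error_sk.estimate()
    idx = np.argsort(np.abs(est))[-k:]
    delta = np.zeros(dim)
    delta[idx] = est[idx]
    # 5. Subtract the extracted update so the residual error stays sketched.
    extracted = CountSketch(dim)
    extracted.accumulate(-delta)
    error_sk.merge(extracted)
    return delta  # applied to the global model weights
```

Because everything the server maintains (the momentum and error-residual state) lives inside fixed-size sketches rather than per-client buffers, no client needs to hold state between rounds, which is what lets this approach tolerate clients that participate only rarely.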


Related research

12/07/2020 · Improved Convergence Rates for Non-Convex Federated Learning with Compression
Federated learning is a new distributed learning paradigm that enables e...

06/28/2023 · Momentum Benefits Non-IID Federated Learning Simply and Provably
Federated learning is a powerful paradigm for large-scale machine learni...

09/17/2021 · Comfetch: Federated Learning of Large Networks on Memory-Constrained Clients via Sketching
A popular application of federated learning is using many clients to tra...

12/24/2021 · Faster Rates for Compressed Federated Learning with Client-Variance Reduction
Due to the communication bottleneck in distributed and federated learnin...

02/08/2021 · Double Momentum SGD for Federated Learning
Communication efficiency is crucial in federated learning. Conducting ma...

01/07/2022 · Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory
A significant bottleneck in federated learning is the network communicat...

06/15/2023 · Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
In federated frequency estimation (FFE), multiple clients work together ...
