Can 5th Generation Local Training Methods Support Client Sampling? Yes!

12/29/2022
by Michał Grudzień, et al.

The celebrated FedAvg algorithm of McMahan et al. (2017) is based on three components: client sampling (CS), data sampling (DS) and local training (LT). While the first two are reasonably well understood, the third component, whose role is to reduce the number of communication rounds needed to train the model, has resisted all attempts at a satisfactory theoretical explanation. Malinovsky et al. (2022) identified four distinct generations of LT methods based on the quality of the provided theoretical communication complexity guarantees. Despite much progress in this area, none of the existing works were able to show that it is theoretically better to employ multiple local gradient-type steps (i.e., to engage in LT) than to rely on only a single local gradient-type step in the important heterogeneous data regime. In a recent breakthrough embodied in their ProxSkip method and its theoretical analysis, Mishchenko et al. (2022) showed that LT indeed leads to provable communication acceleration for arbitrarily heterogeneous data, thus jump-starting the 5th generation of LT methods. However, while these latest-generation LT methods are compatible with DS, none of them support CS. We resolve this open problem in the affirmative. In order to do so, we had to base our algorithmic development on new algorithmic and theoretical foundations.
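To make the roles of the three components concrete, below is a minimal toy sketch (Python/NumPy, synthetic least-squares data) of a generic FedAvg-style loop: each round a random cohort of clients is sampled (CS), each sampled client runs several local gradient-type steps on minibatches of its own data (DS and LT), and the server averages the returned models. All function names, hyperparameters, and the objective are illustrative assumptions for this sketch; they are not taken from the paper or from ProxSkip.

```python
import numpy as np

# Illustrative sketch of a generic FedAvg-style loop showing the three
# components named in the abstract: client sampling (CS), data sampling (DS),
# and local training (LT). Toy least-squares problem; all names and
# hyperparameters are assumptions made for this example.

rng = np.random.default_rng(0)

num_clients, n_per_client, dim = 20, 50, 10
A = [rng.normal(size=(n_per_client, dim)) for _ in range(num_clients)]
b = [Ai @ rng.normal(size=dim) + 0.1 * rng.normal(size=n_per_client) for Ai in A]


def local_grad(x, Ai, bi, batch_idx):
    # Stochastic gradient of 0.5/m * ||A_i[batch] x - b_i[batch]||^2 (DS).
    Ab, bb = Ai[batch_idx], bi[batch_idx]
    return Ab.T @ (Ab @ x - bb) / len(batch_idx)


def fedavg(rounds=100, cohort_size=5, local_steps=10, batch_size=10, lr=0.01):
    x = np.zeros(dim)  # global model held by the server
    for _ in range(rounds):
        # Client sampling (CS): only a random cohort participates this round.
        cohort = rng.choice(num_clients, size=cohort_size, replace=False)
        updates = []
        for c in cohort:
            xc = x.copy()
            # Local training (LT): several gradient-type steps before communicating.
            for _ in range(local_steps):
                batch = rng.choice(n_per_client, size=batch_size, replace=False)  # DS
                xc -= lr * local_grad(xc, A[c], b[c], batch)
            updates.append(xc)
        # Server averages the models returned by the sampled clients.
        x = np.mean(updates, axis=0)
    return x


x_hat = fedavg()
avg_obj = sum(0.5 * np.mean((Ai @ x_hat - bi) ** 2) for Ai, bi in zip(A, b)) / num_clients
print("average client objective:", avg_obj)
```

In this picture, the question raised in the abstract is whether the communication-acceleration guarantees of 5th-generation LT methods survive when the cohort is a strict, randomly chosen subset of the clients (as above) rather than the full set.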


research
06/05/2023
Improving Accelerated Federated Learning with Compression and Importance Sampling
Federated Learning is a collaborative training framework that leverages ...

research
07/08/2022
Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox
Inspired by a recent breakthrough of Mishchenko et al (2022), who for th...

research
10/27/2019
Improved Zeroth-Order Variance Reduced Algorithms and Analysis for Nonconvex Optimization
Two types of zeroth-order stochastic algorithms have recently been desig...

research
02/02/2022
3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation
We propose and study a new class of gradient communication mechanisms fo...

research
02/08/2023
Federated Minimax Optimization with Client Heterogeneity
Minimax optimization has seen a surge in interest with the advent of mod...

research
02/18/2022
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
We introduce ProxSkip – a surprisingly simple and provably efficient met...
