GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

10/28/2022
by Artavazd Maranjyan, et al.

In this work, we study distributed optimization algorithms that reduce the high communication costs of synchronization by allowing clients to perform multiple local gradient steps in each communication round. Recently, Mishchenko et al. (2022) proposed a new type of local method, called ProxSkip, that enjoys an accelerated communication complexity without any data similarity condition. However, their method requires all clients to call local gradient oracles with the same frequency. Because of statistical heterogeneity, we argue that clients with well-conditioned local problems should compute their local gradients less frequently than clients with ill-conditioned local problems. Our first contribution is the extension of the original ProxSkip method to the setup where clients are allowed to perform a different number of local gradient steps in each communication round. We prove that our modified method, GradSkip, still converges linearly, retains the same accelerated communication complexity, and requires each client to compute local gradients only at a frequency proportional to its local condition number. Next, we generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and by considering a generic proximable regularizer. This generalization, GradSkip+, recovers several related methods in the literature. Finally, we present an empirical study to confirm our theoretical claims.
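Since the abstract only sketches the algorithmic idea, here is a rough NumPy illustration of a ProxSkip-style loop in which each client additionally skips gradient computations with its own probability, in the spirit of GradSkip. The toy quadratic problems, the heuristic choice of the probabilities q_i, and the "reuse a stale gradient when skipping" shortcut are assumptions made for illustration; this is not the paper's Algorithm 1, and the exact GradSkip rules for q_i and for the skipped steps differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 20                                  # number of clients, dimension

# Toy heterogeneous quadratics: f_i(x) = 0.5 * (x - c_i)^T D_i (x - c_i),
# so grad f_i(x) = D_i (x - c_i); mu_i = 1 and L_i = kappa_i by construction.
kappas = [2.0, 10.0, 100.0, 1000.0]           # local condition numbers
D = [np.diag(np.linspace(1.0, k, d)) for k in kappas]
c = [rng.normal(size=d) for _ in range(n)]

def grad(i, x):
    return D[i] @ (x - c[i])

L_max = max(kappas)
gamma = 1.0 / L_max                           # stepsize on the order of 1/L_max
p = np.sqrt(gamma)                            # communication probability ~ sqrt(mu/L_max); mu = 1 here
# Heuristic per-client gradient probabilities: well-conditioned clients call
# their gradient oracle less often (frequency ~ kappa_i / kappa_max). This is
# a stand-in for, not a restatement of, the rule used in the GradSkip analysis.
q = [k / L_max for k in kappas]

x = [np.zeros(d) for _ in range(n)]           # local iterates
h = [np.zeros(d) for _ in range(n)]           # local control variates (initialized to sum to zero)
g = [grad(i, x[i]) for i in range(n)]         # most recently computed local gradients
calls = [1] * n                               # gradient-oracle calls per client

for t in range(20000):
    for i in range(n):
        if rng.random() < q[i]:               # maybe refresh the gradient, else reuse the stale one
            g[i] = grad(i, x[i])
            calls[i] += 1
        x[i] = x[i] - gamma * (g[i] - h[i])   # shifted local gradient step
    if rng.random() < p:                      # rare ProxSkip-style communication round
        xbar = sum(x) / n                     # prox of the consensus constraint = plain averaging
        for i in range(n):
            h[i] = h[i] + (p / gamma) * (xbar - x[i])
            x[i] = xbar

xbar = sum(x) / n
print("gradient calls per client:", calls)
print("||sum_i grad f_i(xbar)|| =", np.linalg.norm(sum(grad(i, xbar) for i in range(n))))
```

By construction, client i makes roughly 20000 * q_i oracle calls in expectation, so the per-client call counts scale with kappa_i while all clients follow the same infrequent communication schedule, which is the qualitative behavior described in the abstract.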


