A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

by   Zhize Li, et al.

In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurate modeling of the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and also gives new convergence results for many new methods which arise as special cases. In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks differing in whether direct gradient compression (DC) or compression of gradient differences (DIANA) is used. We show that all methods captured by these two frameworks also satisfy our unified assumption. Thus, our unified convergence analysis also captures a large variety of distributed methods utilizing compressed communication. Finally, we also provide a unified analysis for obtaining faster linear convergence rates in this nonconvex regime under the PL condition.


page 1

page 2

page 3

page 4


Better Theory for SGD in the Nonconvex World

Large-scale nonconvex optimization problems are ubiquitous in modern mac...

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

In this paper we introduce a unified analysis of a large family of varia...

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

We propose ZeroSARAH – a novel variant of the variance-reduced method SA...

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent a...

Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

In distributed learning, local SGD (also known as federated averaging) a...

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods

We propose a remarkably general variance-reduced method suitable for sol...

EF21 with Bells Whistles: Practical Algorithmic Extensions of Modern Error Feedback

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a ...

Please sign up or login with your details

Forgot password? Click here to reset