A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

06/12/2020
by Zhize Li, et al.

In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurately modeling the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods satisfying the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and it also yields new convergence results for many new methods that arise as special cases. In the more general distributed/federated nonconvex optimization setup, we propose two new general algorithmic frameworks that differ in whether gradients are compressed directly (DC) or gradient differences are compressed (DIANA). We show that all methods captured by these two frameworks also satisfy our unified assumption, so our unified convergence analysis also covers a large variety of distributed methods that use compressed communication. Finally, we provide a unified analysis yielding faster linear convergence rates in this nonconvex regime under the Polyak-Łojasiewicz (PL) condition.
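To make the direct-compression (DC) idea above concrete, here is a minimal sketch of distributed SGD in which each worker compresses its local stochastic gradient with an unbiased rand-k sparsifier before communication, and the server averages the compressed messages and takes a gradient step. The rand-k compressor, the toy least-squares objectives, and all constants below are illustrative assumptions, not the paper's exact DC framework.

```python
# Minimal sketch of distributed SGD with direct gradient compression (DC pattern).
# All problem data and hyperparameters are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)


def rand_k(v, k):
    """Unbiased rand-k sparsification: keep k random coordinates, rescale by d/k."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out


def local_stoch_grad(x, A_i, b_i, noise=0.01):
    """Stochastic gradient of the local objective f_i(x) = 0.5 * ||A_i x - b_i||^2."""
    return A_i.T @ (A_i @ x - b_i) + noise * rng.standard_normal(x.size)


d, n_workers, k, lr, steps = 50, 8, 5, 0.05, 200
A = [rng.standard_normal((20, d)) / np.sqrt(20) for _ in range(n_workers)]
b = [rng.standard_normal(20) for _ in range(n_workers)]

x = np.zeros(d)
for t in range(steps):
    # Each worker compresses its stochastic gradient before sending it to the server.
    compressed = [rand_k(local_stoch_grad(x, A[i], b[i]), k) for i in range(n_workers)]
    # The server averages the compressed messages and performs a gradient step.
    g = np.mean(compressed, axis=0)
    x -= lr * g

full_grad = np.mean([A[i].T @ (A[i] @ x - b[i]) for i in range(n_workers)], axis=0)
print("final ||grad||:", np.linalg.norm(full_grad))
```

The DIANA-style variant mentioned above would instead maintain a local shift at each worker and compress the difference between the stochastic gradient and that shift, which reduces the compression error as the iterates converge.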


Related research

02/09/2020 · Better Theory for SGD in the Nonconvex World
Large-scale nonconvex optimization problems are ubiquitous in modern mac...

05/27/2019 · A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent
In this paper we introduce a unified analysis of a large family of varia...

03/02/2021 · ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation
We propose ZeroSARAH – a novel variant of the variance-reduced method SA...

02/15/2022 · Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent a...

10/20/2021 · Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond
In distributed learning, local SGD (also known as federated averaging) a...

05/27/2019 · One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods
We propose a remarkably general variance-reduced method suitable for sol...

10/07/2021 · EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a ...