ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

03/02/2021
by   Zhize Li, et al.

We propose ZeroSARAH – a novel variant of the variance-reduced method SARAH (Nguyen et al., 2017) – for minimizing the average of a large number of nonconvex functions, (1/n)∑_{i=1}^{n} f_i(x). To the best of our knowledge, in this nonconvex finite-sum regime, all existing variance-reduced methods, including SARAH, SVRG, SAGA and their variants, need to compute the full gradient over all n data samples at the initial point x^0, and SVRG, SARAH and their variants must additionally recompute the full gradient periodically, once every few iterations. Moreover, SVRG, SAGA and their variants typically achieve weaker convergence results than variants of SARAH: n^{2/3}/ϵ^2 vs. n^{1/2}/ϵ^2. ZeroSARAH is the first variance-reduced method that does not require any full gradient computations, not even at the initial point. Moreover, ZeroSARAH obtains new state-of-the-art convergence results, improving on the previous best-known results (given by, e.g., SPIDER, SpiderBoost, SARAH, SSRGD and PAGE) in certain regimes. Avoiding full gradient computations, which are time-consuming, matters in many applications because the number of data samples n is usually very large. In the distributed setting in particular, periodically computing the full gradient over all data samples requires periodically synchronizing all machines/devices, which may be impossible or very hard to achieve. We therefore expect ZeroSARAH to have a practical impact in distributed and federated learning, where full device participation is impractical.
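The abstract gives no pseudocode, so the following is only a minimal sketch of the structural difference it describes: classical SARAH restarts each epoch with a full gradient over all n samples, whereas a ZeroSARAH-style estimator avoids full gradients entirely. The function grad_i, the minibatch size b, the step size eta, the mixing weight lam, and the SAGA-like memory table y are illustrative assumptions and not the authors' notation; the exact ZeroSARAH estimator and parameter choices are given in the paper itself.

```python
# Sketch only: contrasts the classical SARAH estimator (Nguyen et al., 2017),
# which needs a full gradient at the start of every epoch, with a
# full-gradient-free update in the spirit of ZeroSARAH. Names and parameters
# here (grad_i, b, eta, lam, y, y_avg) are illustrative assumptions.
import numpy as np

def sarah_epoch(x, grad_i, n, b, eta, inner_iters, rng):
    """One SARAH epoch: starts from a FULL gradient over all n samples."""
    v = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # full gradient, cost n
    x_prev = x.copy()
    x = x - eta * v
    for _ in range(inner_iters):
        idx = rng.choice(n, size=b, replace=False)  # minibatch of size b
        # Recursive SARAH estimator:
        # v^t = (1/b) * sum_{i in I}[grad_i(x^t) - grad_i(x^{t-1})] + v^{t-1}
        v = np.mean([grad_i(i, x) - grad_i(i, x_prev) for i in idx], axis=0) + v
        x_prev, x = x, x - eta * v
    return x

def zerosarah_like_step(x, x_prev, v, y, y_avg, grad_i, n, b, eta, lam, rng):
    """One full-gradient-free, ZeroSARAH-style step (illustrative only).

    A SAGA-like table y[i] of the last-seen per-sample gradients, together
    with its running average y_avg, replaces the periodic full-gradient
    restart; lam mixes the recursive SARAH correction with the table-based
    estimate, so no pass over all n samples is ever required.
    """
    idx = rng.choice(n, size=b, replace=False)
    g_new = {i: grad_i(i, x) for i in idx}
    sarah_term = np.mean([g_new[i] - grad_i(i, x_prev) for i in idx], axis=0) + v
    table_term = np.mean([g_new[i] - y[i] for i in idx], axis=0) + y_avg
    v = (1 - lam) * sarah_term + lam * table_term  # no full gradient anywhere
    for i in idx:  # refresh the gradient memory and its running average
        y_avg = y_avg + (g_new[i] - y[i]) / n
        y[i] = g_new[i]
    return x - eta * v, x, v, y, y_avg
```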


