Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning

09/01/2022
by   Kazusato Oko, et al.

While variance reduction methods have shown great success in solving large-scale optimization problems, many of them suffer from accumulated errors and therefore require periodic full gradient computations. In this paper, we present a single-loop algorithm named SLEDGE (Single-Loop mEthoD for Gradient Estimator) for finite-sum nonconvex optimization, which does not require periodic refreshes of the gradient estimator yet achieves nearly optimal gradient complexity. Unlike existing methods, SLEDGE has the advantage of versatility: (i) second-order optimality, (ii) exponential convergence in the PL region, and (iii) smaller complexity under less heterogeneity of data. We build an efficient federated learning algorithm by exploiting these favorable properties. We show the first- and second-order optimality of its output and also provide analysis under PL conditions. When the local budget is sufficiently large and clients are less (Hessian-)heterogeneous, the algorithm requires fewer communication rounds than existing methods such as FedAvg, SCAFFOLD, and Mime. The superiority of our method is verified in numerical experiments.
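To illustrate what a single-loop gradient estimator without periodic full-gradient refreshes looks like, below is a minimal sketch of a generic recursive (STORM-style) variance-reduced estimator in Python. This is a hypothetical illustration under stated assumptions, not the SLEDGE update from the paper; the names grad_fn, lr, a, and T are placeholders introduced for the sketch.

```python
import numpy as np

def single_loop_vr_sgd(grad_fn, x0, n_samples, lr=0.01, a=0.1, T=1000, rng=None):
    """Sketch of a single-loop recursive variance-reduced estimator (STORM-style).

    Hypothetical illustration only; not the paper's SLEDGE update.
    grad_fn(x, i) should return the stochastic gradient of component i at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    i = rng.integers(n_samples)
    v = grad_fn(x, i)                      # initial estimator from a single sample
    for _ in range(T):
        x_new = x - lr * v                 # descent step using the running estimator
        i = rng.integers(n_samples)
        # Recursive correction: reuse the previous estimator and adjust it with the
        # gradient difference at the new point, so no periodic full-gradient
        # recomputation (as in double-loop methods like SVRG/SPIDER) is needed.
        v = grad_fn(x_new, i) + (1 - a) * (v - grad_fn(x, i))
        x = x_new
    return x
```

For instance, on a finite-sum quadratic with grad_fn(x, i) = A[i] @ x - b[i], the estimator v tracks the full gradient while only ever touching one sampled component per iteration, which is the single-loop behavior the abstract contrasts with methods that periodically reset the estimator via a full-batch gradient.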


