Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

by   Tomoya Murata, et al.

Federated learning is one of the important learning scenarios in distributed learning, in which we aim at learning heterogeneous local datasets efficiently in terms of communication and computational cost. In this paper, we study new local algorithms called Bias-Variance Reduced Local SGD (BVR-L-SGD) for nonconvex federated learning. One of the novelties of this paper is in the analysis of our bias and variance reduced local gradient estimators which fully utilize small second-order heterogeneity of local objectives and suggests to randomly pick up one of the local models instead of taking average of them when workers are synchronized. Under small heterogeneity of local objectives, we show that our methods achieve smaller communication complexity than both the previous non-local and local methods for general nonconvex objectives. Furthermore, we also compare the total execution time, that is the sum of total communication time and total computational time per worker, and show the superiority of our methods to the existing methods when the heterogeneity is small and single communication time is more time consuming than single stochastic gradient computation. Numerical results are provided to verify our theoretical findings and give empirical evidence of the superiority of our algorithms.


page 8

page 18

page 19


Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

In recent centralized nonconvex distributed learning and federated learn...

On the Convergence of Local Descent Methods in Federated Learning

In federated distributed learning, the goal is to optimize a global trai...

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2...

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

In this work, we consider the optimization formulation of personalized f...

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

We propose ZeroSARAH – a novel variant of the variance-reduced method SA...

Local SGD: Unified Theory and New Efficient Methods

We present a unified framework for analyzing local SGD methods in the co...

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!

We introduce ProxSkip – a surprisingly simple and provably efficient met...