Parallel Restarted SPIDER – Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity

12/12/2019
by   Pranay Sharma, et al.

In this paper, we propose a distributed algorithm for stochastic, smooth, non-convex optimization. We assume a worker-server architecture in which N nodes, each holding n (potentially infinite) samples, collaborate with the help of a central server to perform the optimization task. The global objective is to minimize the average of the local cost functions available at the individual nodes. The proposed approach is a non-trivial extension of the popular parallel restarted SGD algorithm, incorporating into it the optimal variance-reduction-based SPIDER gradient estimator. We prove convergence of our algorithm to a first-order stationary solution. The proposed approach achieves the best known communication complexity O(ϵ^-1) along with the optimal computation complexity. For finite-sum problems (finite n), we achieve the optimal incremental first-order oracle (IFO) complexity O(√(Nn)ϵ^-1); for online problems (n unknown or infinite), we achieve the optimal IFO complexity O(ϵ^-3/2). In both cases, we maintain the linear speedup achieved by existing methods. This is a significant improvement over the O(ϵ^-2) IFO complexity of existing approaches. Additionally, our algorithm is general enough to allow non-identical data distributions across workers, as in the recently proposed federated learning paradigm.
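Since only the abstract is shown on this page, the sketch below is a hedged, single-machine illustration of the two ingredients it names: the SPIDER variance-reduced gradient estimator run locally at each worker, and periodic server averaging ("restarts") once every q local steps, which is the only communication. All names and parameters here (spider_local_round, parallel_restarted_spider, q, eta, big_batch, small_batch, the toy quadratic workload) are illustrative assumptions, not the paper's implementation or notation.

```python
import numpy as np

def spider_local_round(x, sample, grad, q, eta, big_batch, small_batch):
    """One local period (q steps) of SPIDER updates on a single worker.
    `sample()` draws one data point; `grad(x, xi)` is the per-sample gradient."""
    # Anchor step: large-batch gradient estimate at the current iterate.
    batch = [sample() for _ in range(big_batch)]
    v = np.mean([grad(x, xi) for xi in batch], axis=0)
    x_prev, x = x, x - eta * v
    for _ in range(q - 1):
        batch = [sample() for _ in range(small_batch)]
        # Recursive correction: each sample is evaluated at both x and x_prev,
        # which is what keeps the estimator's variance small.
        v = v + np.mean([grad(x, xi) - grad(x_prev, xi) for xi in batch], axis=0)
        x_prev, x = x, x - eta * v
    return x

def parallel_restarted_spider(x0, workers, rounds, q, eta,
                              big_batch=64, small_batch=8):
    """Server loop: broadcast the averaged iterate, let every worker run q
    local SPIDER steps, then average the local iterates again. One
    communication (averaging) round per q local steps."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_iterates = [
            spider_local_round(x, w["sample"], w["grad"],
                               q, eta, big_batch, small_batch)
            for w in workers
        ]
        x = np.mean(local_iterates, axis=0)  # averaging is the only communication
    return x

# Toy usage: each worker draws noisy samples around a different center,
# mimicking non-identical data distributions across workers.
rng = np.random.default_rng(0)
workers = [{"sample": (lambda c=c: c + 0.1 * rng.standard_normal(5)),
            "grad": (lambda x, xi: x - xi)}  # gradient of 0.5 * ||x - xi||^2
           for c in rng.standard_normal((4, 5))]
x_out = parallel_restarted_spider(np.zeros(5), workers, rounds=20, q=10, eta=0.1)
```

In this reading of the abstract, the communication cost is controlled by how often the averaging step runs, while the per-worker IFO cost is driven by the batch sizes of the anchor and correction steps; the actual parameter choices that yield the stated O(ϵ^-1) communication and optimal IFO complexities are given in the full paper.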

