Decentralized Stochastic Proximal Gradient Descent with Variance Reduction over Time-varying Networks

12/20/2021
by Xuanjie Li, et al.

In decentralized learning, a network of nodes cooperates to minimize an overall objective function that is usually a finite sum of their local objectives, augmented with a non-smooth regularization term for better generalization. The decentralized stochastic proximal gradient (DSPG) method is commonly used to train such models, but its convergence rate is retarded by the variance of the stochastic gradients. In this paper, we propose a novel algorithm, DPSVRG, that accelerates decentralized training by leveraging the variance reduction technique. The basic idea is to introduce an estimator in each node that periodically tracks the local full gradient and is used to correct the stochastic gradient at each iteration. By transforming our decentralized algorithm into a centralized inexact proximal gradient algorithm with variance reduction, and by controlling the bounds of the error sequences, we prove that DPSVRG converges at the rate O(1/T) for general convex objectives plus a non-smooth term, where T is the number of iterations, whereas DSPG converges at the rate O(1/√T). Our experiments on different applications, network topologies, and learning models demonstrate that DPSVRG converges much faster than DSPG, and that the loss of DPSVRG decreases smoothly over the training epochs.
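To illustrate the variance-reduction idea described in the abstract (a periodically refreshed full-gradient estimator used to correct each stochastic gradient, followed by a proximal step), below is a minimal single-node Python sketch. It is not the authors' DPSVRG algorithm: it omits the decentralized consensus/mixing step over the time-varying network, and the lasso-regularized least-squares problem, step size, and epoch length are illustrative assumptions.

import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm: prox_{t*||.||_1}(x)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_svrg(A, b, lam=0.1, step=None, epochs=30, inner=None, seed=0):
    """SVRG-style proximal gradient for (1/2n)||Ax - b||^2 + lam*||x||_1.

    Each epoch recomputes the full gradient at a reference point x_ref;
    each inner iteration corrects a stochastic gradient with that reference.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if step is None:
        # Conservative step size from an estimate of the smoothness constant.
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 / n)
    if inner is None:
        inner = n
    x = np.zeros(d)
    for _ in range(epochs):
        x_ref = x.copy()
        full_grad = A.T @ (A @ x_ref - b) / n        # periodic full-gradient estimator
        for _ in range(inner):
            i = rng.integers(n)
            gi = A[i] * (A[i] @ x - b[i])            # stochastic gradient at current x
            gi_ref = A[i] * (A[i] @ x_ref - b[i])    # same sample at the reference point
            v = gi - gi_ref + full_grad              # variance-reduced gradient
            x = soft_threshold(x - step * v, step * lam)  # proximal (l1) step
    return x

# Toy usage: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(200)
x_hat = prox_svrg(A, b, lam=0.05)
print("support recovered:", np.nonzero(np.abs(x_hat) > 1e-3)[0])

In the full decentralized setting, each node would run this inner loop on its local data and additionally average its iterate with its neighbors' iterates according to the (time-varying) mixing matrix at every iteration.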

