Periodic Stochastic Gradient Descent with Momentum for Decentralized Training

08/24/2020
by Hongchang Gao, et al.

Decentralized training has been actively studied in recent years. Although a wide variety of methods have been proposed, the decentralized momentum SGD method remains underexplored. In this paper, we propose a novel periodic decentralized momentum SGD method, which employs the momentum scheme and periodic communication for decentralized training. These two strategies, together with the topology of the decentralized training system, make the theoretical convergence analysis of our method difficult. We address this challenging problem and provide the condition under which our method achieves linear speedup with respect to the number of workers. Furthermore, we introduce a communication-efficient variant that reduces the communication cost in each communication round, and we also provide the condition under which this variant achieves linear speedup. To the best of our knowledge, both methods are the first in their respective settings to achieve these theoretical results. We conduct extensive experiments to verify the performance of the two proposed methods, and both show superior performance over existing methods.
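For intuition, the sketch below illustrates the general template of periodic decentralized momentum SGD: each worker runs local momentum SGD on its own data shard and, every few steps, averages its model with its neighbors through a doubly stochastic mixing matrix defined by the network topology. This is a minimal NumPy illustration under assumed names (ring_mixing_matrix, local_grad, a least-squares objective), not the paper's exact algorithm or its communication-efficient variant.

```python
# Minimal sketch of periodic decentralized momentum SGD (illustrative only).
# Assumptions: a ring topology, a synthetic least-squares objective, and
# parameter-only averaging at each communication round.
import numpy as np

def ring_mixing_matrix(n_workers):
    """Doubly stochastic mixing matrix for a ring: each worker averages
    equally with itself and its two neighbors."""
    W = np.zeros((n_workers, n_workers))
    for i in range(n_workers):
        W[i, i] = 1 / 3
        W[i, (i - 1) % n_workers] = 1 / 3
        W[i, (i + 1) % n_workers] = 1 / 3
    return W

def local_grad(x, A_i, b_i):
    """Stochastic gradient of a least-squares loss on worker i's shard."""
    idx = np.random.randint(len(b_i))
    a, b = A_i[idx], b_i[idx]
    return (a @ x - b) * a

def periodic_decentralized_momentum_sgd(
    shards, dim, n_workers, steps=500, period=8, lr=0.05, beta=0.9
):
    W = ring_mixing_matrix(n_workers)
    x = np.zeros((n_workers, dim))   # per-worker model copies
    v = np.zeros((n_workers, dim))   # per-worker momentum buffers
    for t in range(steps):
        # Local momentum SGD step on every worker.
        for i in range(n_workers):
            g = local_grad(x[i], *shards[i])
            v[i] = beta * v[i] + g
            x[i] = x[i] - lr * v[i]
        # Periodic communication: neighbor averaging via the mixing matrix.
        if (t + 1) % period == 0:
            x = W @ x
    return x.mean(axis=0)  # consensus model (average over workers)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_workers, dim, n_local = 8, 10, 100
    x_true = rng.normal(size=dim)
    shards = []
    for _ in range(n_workers):
        A = rng.normal(size=(n_local, dim))
        b = A @ x_true + 0.01 * rng.normal(size=n_local)
        shards.append((A, b))
    x_hat = periodic_decentralized_momentum_sgd(shards, dim, n_workers)
    print("estimation error:", np.linalg.norm(x_hat - x_true))
```

Communicating only every `period` steps is what reduces the communication rounds relative to averaging after every update; whether (and under what conditions) this still yields linear speedup in the number of workers is the question the paper's analysis addresses.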


Related research

Adaptive Serverless Learning (08/24/2020)
With the emergence of distributed data, training machine learning models...

DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training (04/24/2021)
The scale of deep learning nowadays calls for efficient distributed trai...

Achieving Linear Speedup in Decentralized Stochastic Compositional Minimax Optimization (07/25/2023)
The stochastic compositional minimax problem has attracted a surge of at...

On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization (05/09/2019)
Recent developments on large-scale distributed machine learning applicat...

GT-STORM: Taming Sample, Communication, and Memory Complexities in Decentralized Non-Convex Learning (05/04/2021)
Decentralized nonconvex optimization has received increasing attention i...

APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm (08/26/2020)
Adam is the important optimization algorithm to guarantee efficiency and...

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum (10/01/2019)
Distributed optimization is essential for training large models on large...
