Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

09/30/2022
by   Yuki Takezawa, et al.

SGD with momentum acceleration is a key component for improving the performance of neural networks. In decentralized learning, a straightforward way to use momentum acceleration is Distributed SGD (DSGD) with momentum acceleration (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Several recent studies have addressed this issue and proposed momentum-accelerated methods that are more robust to data heterogeneity than DSGDm, but their convergence rates still depend on data heterogeneity and degrade when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a momentum-accelerated method whose convergence rate is proven to be independent of data heterogeneity. Specifically, we analyze the convergence rate of Momentum Tracking in the standard deep learning setting, where the objective function is non-convex and stochastic gradients are used, and show that it is independent of data heterogeneity for any momentum coefficient β ∈ [0, 1). Through image classification tasks, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum acceleration and consistently outperforms them when the data distributions are heterogeneous.
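The core idea behind combining momentum with gradient tracking can be illustrated on a toy decentralized problem: each node keeps a tracking variable that estimates the average gradient across all nodes, and applies momentum to that tracked estimate rather than to its own heterogeneous local gradient. The sketch below is illustrative only and is not the paper's exact algorithm; the update order, step size, gossip matrix, and local quadratic objectives are all assumptions made for the example.

```python
import numpy as np

# Illustrative sketch: momentum applied to a gradient-tracking estimate
# on a toy decentralized problem with heterogeneous local objectives.
# All hyperparameters and the update order are assumptions, not the
# paper's exact Momentum Tracking algorithm.

n, d = 4, 3                      # number of nodes, parameter dimension
rng = np.random.default_rng(0)

# Heterogeneous local objectives f_i(x) = 0.5 * ||x - b_i||^2,
# where each node i has a different local optimum b_i.
b = rng.normal(size=(n, d))
x_star = b.mean(axis=0)          # minimizer of the global average objective

def grad(i, x_i):
    """Local gradient of node i's objective at x_i."""
    return x_i - b[i]

# Doubly stochastic gossip matrix for a ring of 4 nodes.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])

lr, beta = 0.1, 0.9              # step size and momentum coefficient
x = np.zeros((n, d))             # local parameters, one row per node
g_prev = np.array([grad(i, x[i]) for i in range(n)])
y = g_prev.copy()                # gradient trackers (estimate avg gradient)
m = np.zeros((n, d))             # local momentum buffers

for _ in range(300):
    m = beta * m + y             # momentum on the *tracked* gradient
    x = W @ (x - lr * m)         # local descent step, then gossip averaging
    g = np.array([grad(i, x[i]) for i in range(n)])
    y = W @ y + g - g_prev       # gradient-tracking correction
    g_prev = g

# Despite heterogeneous local optima b_i, all nodes approach x_star.
print(np.max(np.abs(x - x_star)))
```

Because the trackers `y` preserve the invariant that their average equals the average local gradient, the momentum buffers drive every node toward the global optimum even though no single node's local gradient points there, which is the intuition for why the convergence rate can be freed from data heterogeneity.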


Related research

03/01/2023: A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex Models and Heterogeneous Data
Emerging distributed applications recently boosted the development of de...

05/23/2022: Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization
In recent years, decentralized learning has emerged as a powerful tool n...

02/09/2021: Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data
Decentralized training of deep learning models is a key element for enab...

03/18/2016: Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
Nesterov's momentum trick is famously known for accelerating gradient de...

02/01/2022: DoCoM-SGT: Doubly Compressed Momentum-assisted Stochastic Gradient Tracking Algorithm for Communication Efficient Decentralized Learning
This paper proposes the Doubly Compressed Momentum-assisted Stochastic G...

04/09/2022: Yes, Topology Matters in Decentralized Optimization: Refined Convergence and Topology Learning under Heterogeneous Data
One of the key challenges in federated and decentralized learning is to ...

01/23/2021: Acceleration Methods
This monograph covers some recent advances on a range of acceleration te...
