Asynchronous Training Schemes in Distributed Learning with Time Delay

08/28/2022
by Haoxiang Wang, et al.

In the context of distributed deep learning, stale weights or gradients can result in poor algorithmic performance. This issue is usually tackled by delay-tolerant algorithms under mild assumptions on the objective functions and step sizes. In this paper, we take a different approach and develop a new algorithm, called Predicting Clipping Asynchronous Stochastic Gradient Descent (PC-ASGD). Specifically, PC-ASGD consists of two steps: the predicting step leverages gradient prediction via Taylor expansion to reduce the staleness of outdated weights, while the clipping step selectively drops outdated weights to alleviate their negative effects. A tradeoff parameter is introduced to balance the contributions of these two steps. Theoretically, we establish the convergence rate of the proposed algorithm with a constant step size, accounting for the effect of delay, when the smooth objective functions are weakly strongly-convex or nonconvex. A practical variant of PC-ASGD is also proposed, adopting a condition that helps determine the tradeoff parameter. For empirical validation, we demonstrate the performance of the algorithm with two deep neural network architectures on two benchmark datasets.
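To make the two-step structure concrete, below is a minimal sketch of one PC-ASGD-style local update, based only on the description in the abstract. The exact forms of the Taylor-expansion prediction, the clipping rule, and the mixing weight `theta` are illustrative assumptions, not the paper's precise algorithm; the function and variable names are hypothetical.

```python
# A hedged sketch of a PC-ASGD-style update: the predicting step rolls stale
# neighbor weights forward with a first-order correction, the clipping step
# drops stale neighbors entirely, and a tradeoff parameter blends the two.
import numpy as np

def pc_asgd_step(w, grad_fn, neighbor_weights, delays, lr=0.01, theta=0.5):
    """One local update combining a predicting step and a clipping step.

    w                : current local weights (np.ndarray)
    grad_fn          : callable returning a stochastic gradient at given weights
    neighbor_weights : list of possibly outdated neighbor weight copies
    delays           : staleness (in steps) of each neighbor copy
    theta            : tradeoff parameter in [0, 1] balancing the two steps
    """
    # Predicting step (assumed form): compensate each stale copy with a
    # Taylor-expansion-style first-order correction over its delay.
    predicted = [
        w_j - lr * d_j * grad_fn(w_j)
        for w_j, d_j in zip(neighbor_weights, delays)
    ]
    predict_avg = np.mean([w] + predicted, axis=0)

    # Clipping step (assumed form): keep only neighbors that are not stale
    # and average over the remaining fresh copies.
    fresh = [w_j for w_j, d_j in zip(neighbor_weights, delays) if d_j == 0]
    clip_avg = np.mean([w] + fresh, axis=0)

    # Blend the two consensus estimates with the tradeoff parameter, then
    # take a local stochastic gradient step from the blended point.
    w_mix = theta * predict_avg + (1.0 - theta) * clip_avg
    return w_mix - lr * grad_fn(w_mix)
```

In this sketch, `theta` close to 1 trusts the predicted (staleness-compensated) neighbor weights, while `theta` close to 0 relies on clipping, matching the tradeoff described in the abstract.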

Related research

06/06/2019 - A Non-Asymptotic Analysis of Network Independence for Distributed Stochastic Gradient Descent
This paper is concerned with minimizing the average of n cost functions ...

01/14/2023 - CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence
In this paper, we consider solving the distributed optimization problem ...

11/05/2018 - Non-ergodic Convergence Analysis of Heavy-Ball Algorithms
In this paper, we revisit the convergence of the Heavy-ball method, and ...

12/31/2021 - Distributed Random Reshuffling over Networks
In this paper, we consider the distributed optimization problem where n ...

06/22/2021 - Asynchronous Stochastic Optimization Robust to Arbitrary Delays
We consider stochastic optimization with delayed gradients where, at eac...

07/16/2017 - Normalized Gradient with Adaptive Stepsize Method for Deep Neural Network Training
In this paper, we propose a generic and simple algorithmic framework for...

06/23/2017 - Collaborative Deep Learning in Fixed Topology Networks
There is significant recent interest to parallelize deep learning algori...
