Delay-adaptive step-sizes for asynchronous learning

02/17/2022
by Xuyang Wu, et al.

In scalable machine learning systems, model training is often parallelized over multiple nodes that run without tight synchronization. Most convergence analyses of the resulting asynchronous algorithms set the learning rate based on an upper bound on the information delays in the system. Not only are such bounds hard to obtain in advance, but they also lead to unnecessarily slow convergence. In this paper, we show that it is possible to use learning rates that depend on the actual, time-varying delays in the system. We develop general convergence results for delay-adaptive asynchronous iterations and specialize them to proximal incremental gradient descent and block-coordinate descent algorithms. For each of these methods, we demonstrate how delays can be measured on-line, present delay-adaptive step-size policies, and illustrate their theoretical and practical advantages over the state of the art.
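The core idea lends itself to a short illustration. Below is a minimal sketch in Python of an asynchronous incremental-gradient loop in a shared-memory setting: each worker tags its gradient with the model version it read, the server measures the resulting staleness when the gradient arrives, and the step-size is scaled down with that measured delay instead of a worst-case bound fixed in advance. The names ParameterServer, worker, and the 1/(1 + tau) scaling in delay_adaptive_stepsize are illustrative assumptions for this sketch, not the step-size policies proposed in the paper.

```python
import threading
import numpy as np

def delay_adaptive_stepsize(gamma0, tau):
    # Illustrative rule: shrink the step-size with the measured
    # staleness tau of the applied gradient. This 1/(1 + tau)
    # scaling is an assumption, not the paper's actual policy.
    return gamma0 / (1.0 + tau)

class ParameterServer:
    """Shared model with a version counter, so the delay of each
    applied gradient can be measured on-line as
    (current version - version the worker read)."""

    def __init__(self, dim, gamma0=0.5):
        self.x = np.zeros(dim)
        self.version = 0
        self.gamma0 = gamma0
        self.lock = threading.Lock()

    def read(self):
        # Return a snapshot of the model and its version tag.
        with self.lock:
            return self.x.copy(), self.version

    def apply(self, grad, read_version):
        with self.lock:
            tau = self.version - read_version          # measured delay
            gamma = delay_adaptive_stepsize(self.gamma0, tau)
            self.x -= gamma * grad
            self.version += 1

def worker(server, grad_fn, n_steps):
    # Each worker repeatedly reads a (possibly stale) model copy,
    # computes a gradient, and sends it back with the version it read.
    for _ in range(n_steps):
        x, v = server.read()
        server.apply(grad_fn(x), v)

if __name__ == "__main__":
    b = np.ones(5)
    grad_fn = lambda x: x - b                          # grad of 0.5*||x - b||^2
    server = ParameterServer(dim=5)
    threads = [threading.Thread(target=worker, args=(server, grad_fn, 500))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("final iterate:", server.x)                  # should approach b
```

With four threads minimizing a simple quadratic, the final iterate approaches b; the point of the sketch is only that the step-size reacts to the delays actually observed at run time rather than to an a priori delay bound.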


Related research

Asynchronous Iterations in Optimization: New Sequence Results and Sharper Algorithmic Guarantees (09/09/2021)
We introduce novel convergence results for asynchronous iterations which...

Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent (08/18/2023)
Stochastic gradient descent (SGD) performed in an asynchronous manner pl...

Asynchronous Delay-Aware Accelerated Proximal Coordinate Descent for Nonconvex Nonsmooth Problems (02/05/2019)
Nonconvex and nonsmooth problems have recently attracted considerable at...

Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning (03/14/2018)
Performance of distributed optimization and learning systems is bottlene...

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD (03/03/2018)
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous ...

Slow and Stale Gradients Can Win the Race (03/23/2020)
Distributed Stochastic Gradient Descent (SGD) when run in a synchronous ...

Adaptive Braking for Mitigating Gradient Delay (07/02/2020)
Neural network training is commonly accelerated by using multiple synchr...
