Distributed SGD Generalizes Well Under Asynchrony

09/29/2019
by Jayanth Regatti, et al.

The performance of fully synchronous distributed systems has become a bottleneck under the big data trend, and asynchronous distributed systems are gaining popularity thanks to their scalability. In this paper, we study the generalization performance of stochastic gradient descent (SGD) on a distributed asynchronous system. The system consists of multiple worker machines that compute stochastic gradients, which are sent to and aggregated on a common parameter server to update the variables; communication in the system suffers from possible delays. Under the algorithmic stability framework, we prove that distributed asynchronous SGD generalizes well given enough data samples during training. In particular, our results suggest reducing the learning rate as we allow more asynchrony in the distributed system. Such an adaptive learning-rate strategy improves the stability of the distributed algorithm and reduces the corresponding generalization error. We then confirm our theoretical findings via numerical experiments.
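To illustrate the setup the abstract describes, below is a minimal sketch of asynchronous SGD with stale gradients and a learning rate that shrinks as the allowed delay grows. The least-squares problem, the uniform staleness model, and the 1/(1 + MAX_DELAY) scaling rule are illustrative assumptions for this sketch, not the paper's exact algorithm or rate schedule.

```python
# Minimal simulation of asynchronous SGD on a least-squares problem.
# Worker gradients arrive with a random staleness of up to MAX_DELAY steps;
# the server applies them with a learning rate scaled down by the maximum
# allowed delay. All names and the scaling rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise
n_samples, dim = 512, 10
X = rng.normal(size=(n_samples, dim))
w_true = rng.normal(size=dim)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

MAX_DELAY = 8                       # largest staleness the system tolerates
BASE_LR = 0.05
LR = BASE_LR / (1 + MAX_DELAY)      # shrink the step size as asynchrony grows

def grad(w, idx):
    """Stochastic gradient of 0.5 * ||X w - y||^2 on a mini-batch."""
    xb, yb = X[idx], y[idx]
    return xb.T @ (xb @ w - yb) / len(idx)

w = np.zeros(dim)
history = [w.copy()]                # past iterates, so stale gradients can be replayed

for t in range(2000):
    # A worker computed its gradient on a parameter copy that is up to
    # MAX_DELAY iterations old (staleness drawn uniformly at random).
    staleness = rng.integers(0, MAX_DELAY + 1)
    w_stale = history[max(0, len(history) - 1 - staleness)]
    idx = rng.choice(n_samples, size=32, replace=False)

    # Parameter server applies the (possibly stale) gradient.
    w = w - LR * grad(w_stale, idx)
    history.append(w.copy())

print("final training loss:", 0.5 * np.mean((X @ w - y) ** 2))
print("distance to w_true :", np.linalg.norm(w - w_true))
```

Rerunning the sketch with a larger MAX_DELAY but an unscaled learning rate tends to make the iterates noisier, which is the qualitative behavior behind the paper's suggestion to reduce the learning rate as asynchrony increases.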

