Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent

08/18/2023 · by Xiaoge Deng, et al.
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bounds for SGD with asynchronous delay τ. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide generalization error upper bounds of Õ((T−τ)/(nτ)) and Õ(1/n) for quadratic convex and strongly convex problems, respectively, where T is the number of iterations and n is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and experimental results validate our theoretical findings.
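The delayed update rule studied in the abstract can be made concrete with a minimal sketch: at step t, the worker applies a stochastic gradient computed at the stale iterate from τ steps earlier, w_{t+1} = w_t − η·∇f_i(w_{t−τ}). The toy objective, step size, and delay below are illustrative assumptions, not the paper's experimental setup.

```python
import random

def delayed_sgd(data, tau=4, eta=0.05, steps=200, seed=0):
    """Run SGD with a fixed asynchronous delay tau on the toy objective
    f_i(w) = 0.5 * (w - x_i)^2, whose minimizer is the sample mean."""
    rng = random.Random(seed)
    history = [0.0]  # history[t] holds the iterate w_t; w_0 = 0
    for t in range(steps):
        stale = history[max(0, t - tau)]   # delayed iterate w_{t - tau}
        x_i = rng.choice(data)             # stochastic sample i
        grad = stale - x_i                 # gradient of 0.5 * (w - x_i)^2
        history.append(history[-1] - eta * grad)
    return history[-1]

# Despite the stale gradients, the iterate settles near the sample mean.
data = [1.0, 2.0, 3.0, 4.0]
w = delayed_sgd(data)
```

For small η·τ the delayed recursion stays stable and converges to a neighborhood of the minimizer; the paper's stability analysis quantifies how such delays affect generalization rather than optimization.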

Related research:

- Stability and Generalization of the Decentralized Stochastic Gradient Descent (02/02/2021)
- Generalization Error Bounds for Optimization Algorithms via Stability (09/27/2016)
- Distributed SGD Generalizes Well Under Asynchrony (09/29/2019)
- Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays (06/15/2022)
- Delay-adaptive step-sizes for asynchronous learning (02/17/2022)
- Unrolling SGD: Understanding Factors Influencing Machine Unlearning (09/27/2021)
- Improved Stability and Generalization Analysis of the Decentralized SGD Algorithm (06/05/2023)
