Improved Stability and Generalization Analysis of the Decentralized SGD Algorithm

06/05/2023
by Batiste Le Bars, et al.

This paper presents a new generalization error analysis for the Decentralized Stochastic Gradient Descent (D-SGD) algorithm based on algorithmic stability. The obtained results largely improve upon the state of the art and even invalidate prior claims that the communication graph has a detrimental effect on generalization. For instance, we show that in convex settings, D-SGD attains the same generalization bounds as the classical SGD algorithm, regardless of the choice of graph. We show that this counter-intuitive result stems from analyzing the average of the local parameters, which hides a final global averaging step that is incompatible with the decentralized setting. In light of this observation, we advocate analyzing the supremum over the local parameters and show that, in this case, the graph does have an impact on generalization. Unlike prior work, our analysis yields non-vacuous bounds even for non-connected graphs.
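To make the averaged-versus-local distinction concrete, here is a minimal NumPy sketch of the D-SGD update; the ring graph, mixing matrix W, step size, and toy quadratic objective are illustrative assumptions, not the paper's setup. Each agent alternates a local stochastic gradient step with gossip averaging over the communication graph, and the final lines contrast the globally averaged model with a single agent's local model.

```python
import numpy as np

def d_sgd(sample_grad, X0, W, lr, n_steps, rng):
    """Minimal D-SGD sketch: at each step, every agent takes a local
    stochastic gradient step and then gossip-averages with its graph
    neighbors through the mixing matrix W (doubly stochastic, nonzero
    only on edges of the communication graph)."""
    X = X0.copy()                                  # row i = agent i's parameters
    for _ in range(n_steps):
        G = np.stack([sample_grad(x, rng) for x in X])
        X = W @ (X - lr * G)                       # local step, then averaging
    return X

# Toy problem: each agent minimizes E[0.5 * ||x - z||^2], z ~ N(0, I),
# whose stochastic gradient at x is x - z.
rng = np.random.default_rng(0)
n, d = 4, 3
X0 = rng.normal(size=(n, d))

# Ring graph: every agent averages itself with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3

X = d_sgd(lambda x, r: x - r.normal(size=x.shape), X0, W,
          lr=0.1, n_steps=200, rng=rng)

x_avg = X.mean(axis=0)  # analyzing this hides a final global averaging
                        # step no decentralized agent can actually perform
x_local = X[0]          # what one agent really holds; bounding the worst
                        # local model (the supremum) re-exposes the graph
```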

Related research

02/02/2021 · Stability and Generalization of the Decentralized Stochastic Gradient Descent
06/05/2023 · Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
08/18/2023 · Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent
06/25/2022 · Topology-aware Generalization of Decentralized SGD
11/15/2020 · Acceleration of stochastic methods on the example of decentralized SGD
10/31/2019 · SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
10/26/2021 · Exponential Graph is Provably Efficient for Decentralized Deep Training
