Topology-aware Generalization of Decentralized SGD

06/25/2022
by Tongtian Zhu et al.

This paper studies the algorithmic stability and generalizability of decentralized stochastic gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is 𝒪(m/N + 1/m + λ^2)-stable in expectation in the non-convex non-smooth setting, where N is the total sample size of the whole system, m is the number of workers, and 1-λ is the spectral gap that measures the connectivity of the communication topology. These results then deliver an 𝒪(1/N + ((m^-1 λ^2)^(α/2) + m^-α)/N^(1-α/2)) in-average generalization bound, which is non-vacuous even when λ is close to 1, in contrast to the vacuous bounds suggested by the existing literature on the projected version of D-SGD. Our theory indicates that the generalizability of D-SGD is positively correlated with the spectral gap, and explains why consensus control in the initial training phase can ensure better generalization. Experiments with VGG-11 and ResNet-18 on CIFAR-10, CIFAR-100, and Tiny-ImageNet support our theory. To the best of our knowledge, this is the first work on the topology-aware generalization of vanilla D-SGD. Code is available at https://github.com/Raiden-Zhu/Generalization-of-DSGD.
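To make the quantities in the abstract concrete, here is a minimal sketch (not the paper's code) of one D-SGD iteration and of the spectral gap 1-λ of a gossip matrix, where λ is taken as the second-largest singular value of the doubly stochastic mixing matrix W. The ring topology and helper names below are illustrative assumptions.

```python
import numpy as np

def ring_mixing_matrix(m):
    # Doubly stochastic gossip matrix for a ring topology (m >= 3):
    # each worker averages equally with itself and its two neighbors.
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] = 1 / 3
        W[i, (i - 1) % m] = 1 / 3
        W[i, (i + 1) % m] = 1 / 3
    return W

def spectral_gap(W):
    # 1 - lambda, where lambda is the second-largest singular value of W.
    # A gap near 0 (lambda near 1) means a poorly connected topology.
    s = np.linalg.svd(W, compute_uv=False)
    return 1.0 - np.sort(s)[-2]

def dsgd_step(X, grads, W, lr):
    # One D-SGD iteration: gossip-average the stacked worker models
    # X (shape (m, d)) with neighbors via W, then take local gradient steps.
    return W @ X - lr * grads
```

For the ring, the gap shrinks as m grows (λ approaches 1), which is exactly the regime where the paper's bound remains non-vacuous while earlier bounds for projected D-SGD blow up.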


