Decentralized Learning with Separable Data: Generalization and Fast Algorithms

09/15/2022
by   Hossein Taheri, et al.

Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study the algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including the exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent on separable data, but which has thus far been limited to centralized learning scenarios. Notably, our generalization bounds match their centralized counterparts in order. Central to this result, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in both training and generalization performance.
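For intuition, below is a minimal sketch of vanilla DGD with logistic loss on synthetic linearly separable data: each agent alternates a consensus (mixing) step with a local gradient step. The ring graph, mixing weights, step size, and all variable names are illustrative assumptions for this sketch and not the exact setup analyzed in the paper.

```python
# Minimal DGD sketch: logistic loss on synthetic separable data (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_per_agent, dim = 5, 20, 10

# Linearly separable synthetic data: labels come from a ground-truth direction.
w_star = rng.normal(size=dim)
X = rng.normal(size=(n_agents, n_per_agent, dim))
y = np.sign(X @ w_star)                       # shape (n_agents, n_per_agent)

# Doubly stochastic mixing matrix for a ring graph (assumed topology).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def logistic_grad(w, Xi, yi):
    """Gradient of the average logistic loss log(1 + exp(-y x^T w)) at one agent."""
    margins = yi * (Xi @ w)
    # sigmoid(-m) written as (1 - tanh(m/2)) / 2 for numerical stability
    coeff = -yi * 0.5 * (1.0 - np.tanh(margins / 2.0))
    return (coeff[:, None] * Xi).mean(axis=0)

eta, iters = 0.5, 2000                        # step size and iteration count (assumed)
w = np.zeros((n_agents, dim))                 # one local model per agent

for t in range(iters):
    w_mix = W @ w                             # 1) consensus step over the graph
    grads = np.stack([logistic_grad(w[i], X[i], y[i]) for i in range(n_agents)])
    w = w_mix - eta * grads                   # 2) local gradient step on local data

# Track the two quantities the analysis cares about: consensus error and margin.
w_bar = w.mean(axis=0)
consensus_err = np.linalg.norm(w - w_bar, axis=1).max()
min_margin = (y * (X @ w_bar)).min()
print(f"max consensus error: {consensus_err:.3e}, min margin: {min_margin:.3f}")
```

On separable data the logistic loss is driven to zero only as the iterates grow unboundedly, which is why the sketch reports the consensus error and the minimum margin of the averaged model rather than a converged weight vector.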


