Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions

09/28/2022
by Sai Aparna Aketi, et al.

Decentralized learning over distributed datasets must contend with significantly different data distributions across agents, yet current state-of-the-art decentralized algorithms mostly assume the data to be Independent and Identically Distributed (IID). This paper focuses on improving decentralized learning over non-IID data. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. Cross-gradients for a pair of neighboring agents are the gradients obtained by evaluating one agent's model parameters on the other agent's dataset. In particular, the proposed method replaces an agent's local gradient with the weighted mean of its self-gradients, model-variant cross-gradients (gradients computed using the neighbors' model parameters on the local dataset), and data-variant cross-gradients (gradients computed using the local model parameters on the neighbors' datasets). The data-variant cross-gradients are aggregated through an additional communication round without violating the privacy constraints. Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by 32×. We demonstrate the efficiency of the proposed technique on non-IID data sampled from various vision and language datasets, trained on diverse models, graph sizes, and topologies. Our experiments show that NGC and CompNGC outperform the existing SoTA decentralized learning algorithms over non-IID data by 0-6% with significantly lower compute and memory requirements. Furthermore, the model-variant cross-gradient information available locally at each agent can improve performance over non-IID data by 1-35% without any additional communication cost.
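To make the gradient-mixing step concrete, the sketch below replaces each agent's local gradient with a weighted mean of its self-gradient, model-variant cross-gradients, and data-variant cross-gradients on a toy least-squares problem. The quadratic loss, ring topology, uniform mixing weight alpha, and learning rate are illustrative assumptions and not the paper's actual configuration; the extra communication round and the gossip/momentum machinery of NGC are only noted in comments.

```python
# Minimal sketch of an NGC-style gradient-mixing step (illustrative only).
# Assumptions not taken from the abstract: a toy least-squares loss per agent,
# a ring topology, and a single uniform mixing weight `alpha`.
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 5

# Each agent holds its own (non-IID) data shard: loss_i(x) = 0.5 * ||A_i x - b_i||^2
A = [rng.normal(size=(8, dim)) for _ in range(n_agents)]
b = [rng.normal(size=8) for _ in range(n_agents)]
x = [rng.normal(size=dim) for _ in range(n_agents)]  # per-agent model parameters

def grad(data_owner, x_model):
    """Gradient of agent `data_owner`'s local loss evaluated at parameters x_model."""
    Ai, bi = A[data_owner], b[data_owner]
    return Ai.T @ (Ai @ x_model - bi)

# Ring topology: each agent communicates with its two neighbors.
neighbors = {i: [(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)}

def ngc_update(i, alpha=0.5, lr=0.01):
    nbrs = neighbors[i]
    g_self = grad(i, x[i])                        # self-gradient: local model, local data
    g_model_var = [grad(i, x[j]) for j in nbrs]   # model-variant: neighbors' models, local data
    g_data_var = [grad(j, x[i]) for j in nbrs]    # data-variant: local model, neighbors' data
                                                  # (in NGC these arrive via an extra communication round)
    # Replace the local gradient with a weighted mean of the three groups.
    mixed = alpha * g_self + (1 - alpha) * 0.5 * (
        np.mean(g_model_var, axis=0) + np.mean(g_data_var, axis=0))
    x[i] -= lr * mixed

for step in range(100):
    for i in range(n_agents):
        ngc_update(i)
```

As a side note on the 32× communication reduction cited for CompNGC: that factor is consistent with compressing 32-bit floating-point cross-gradients down to roughly one bit per element (for example, sign-based compression), though the exact compression scheme is the one described in the paper, not this sketch.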


Related research:

- 03/02/2021, Cross-Gradient Aggregation for Decentralized Learning from Non-IID data: "Decentralized learning enables a group of collaborative agents to learn ..."
- 06/16/2023, Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness: "Language model training in distributed settings is limited by the commun..."
- 02/10/2020, On the Communication Latency of Wireless Decentralized Learning: "We consider a wireless network comprising n nodes located within a circu..."
- 11/06/2020, Communication-efficient Decentralized Local SGD over Undirected Networks: "We consider the distributed learning problem where a network of n agents..."
- 10/21/2020, Decentralized Deep Learning using Momentum-Accelerated Consensus: "We consider the problem of decentralized deep learning where multiple ag..."
- 05/08/2023, Global Update Tracking: A Decentralized Learning Algorithm for Heterogeneous Data: "Decentralized learning enables the training of deep learning models over..."
- 03/27/2023, CoDeC: Communication-Efficient Decentralized Continual Learning: "Training at the edge utilizes continuously evolving data generated at di..."
