Low Precision Decentralized Distributed Training over IID and non-IID Data

11/17/2021
by Sai Aparna Aketi, et al.

Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, the practical realization of such on-device training is limited by communication and compute bottlenecks. In this paper, we propose and show the convergence of low precision decentralized training that aims to reduce the computational complexity and communication cost of decentralized training. Many feedback-based compression techniques have been proposed in the literature to reduce communication costs. To the best of our knowledge, there is no work that applies and shows compute-efficient training techniques such as quantization, pruning, etc., for peer-to-peer decentralized learning setups. Since real-world applications have a significant skew in the data distribution, we design "Range-EvoNorm" as the normalization activation layer, which is better suited for low precision training over non-IID data. Moreover, we show that the proposed low precision training can be used in synergy with other communication compression methods, decreasing the communication cost further. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart, even with non-IID data. However, when low precision training is accompanied by communication compression through sparsification, we observe a 1-2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by ~4x and compute energy by a factor of ~20x, while trading off less than 1% accuracy for both IID and non-IID data. In particular, with higher skew values, we observe an increase in accuracy (by ~0.5%), indicating the regularization effect of the quantization.
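The abstract names "Range-EvoNorm" as the normalization activation layer but does not give its formula here. As a rough illustration only, the PyTorch sketch below combines an EvoNorm-S0-style gated activation with a range-based scale in place of the group standard deviation; the class name RangeEvoNorm2d, the default group count, the 2*sqrt(2 ln n) rescaling factor, and all other details are assumptions made for this sketch, not the authors' exact design.

# Hypothetical sketch: an EvoNorm-S0-style layer whose group standard
# deviation is replaced by a range-based estimate (assumption inspired by
# range batch-norm); NOT the paper's exact Range-EvoNorm definition.
import math
import torch
import torch.nn as nn

class RangeEvoNorm2d(nn.Module):
    def __init__(self, channels, groups=8, eps=1e-5):
        super().__init__()
        assert channels % groups == 0, "channels must be divisible by groups"
        self.groups = groups
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.v = nn.Parameter(torch.ones(1, channels, 1, 1))

    def forward(self, x):
        n, c, h, w = x.shape
        g = x.view(n, self.groups, c // self.groups, h, w)
        # Range (max - min) per group, rescaled so it approximates the
        # standard deviation of a Gaussian sample of the same size. This
        # sidesteps the variance/sqrt computation, which is awkward in
        # low precision arithmetic.
        elems = (c // self.groups) * h * w
        rng = g.amax(dim=(2, 3, 4), keepdim=True) - g.amin(dim=(2, 3, 4), keepdim=True)
        scale = rng / (2.0 * math.sqrt(2.0 * math.log(elems))) + self.eps
        scale = scale.expand_as(g).reshape(n, c, h, w)
        # EvoNorm-S0-style gated activation divided by the range-based scale.
        y = x * torch.sigmoid(self.v * x) / scale
        return y * self.gamma + self.beta

# Example usage (shapes only):
# layer = RangeEvoNorm2d(channels=64, groups=8)
# out = layer(torch.randn(4, 64, 32, 32))

The range-based scale is the piece that matters for low precision: computing a max and a min stays well behaved in 8-bit arithmetic, whereas accumulating a variance and taking a square root does not.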


