DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning

02/05/2021
by   Kelly Kostopoulou, et al.

Sparse tensors appear frequently in distributed deep learning, either as a direct artifact of the deep neural network's gradients or as the result of an explicit sparsification process. Existing communication primitives are agnostic to the peculiarities of deep learning; consequently, they impose unnecessary communication overhead. This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors, tailored to distributed deep learning. DeepReduce decomposes sparse tensors into two sets, values and indices, and allows both independent and combined compression of these sets. We support a variety of common compressors, such as Deflate for values or run-length encoding for indices. We also propose two novel compression schemes that achieve superior results: one based on curve fitting for values and one based on Bloom filters for indices. DeepReduce is orthogonal to existing gradient sparsifiers and can be applied in conjunction with them, transparently to the end user, to significantly lower the communication overhead. As a proof of concept, we implement our approach in TensorFlow and PyTorch. Our experiments with large real models demonstrate that DeepReduce transmits less data and imposes lower computational overhead than existing methods, without affecting the training accuracy.
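To illustrate the decomposition the abstract describes, the sketch below splits a sparsified gradient tensor into its value and index sets and compresses each independently, using Deflate (zlib) for the values and a simple delta encoding plus Deflate for the indices. This is a minimal, hypothetical example in NumPy, not the authors' implementation; the curve-fitting and Bloom-filter compressors from the paper are not shown, and all function names are illustrative.

```python
# Illustrative sketch of DeepReduce-style value/index decomposition.
# NOT the authors' implementation; names and compressor choices are assumptions.
import zlib
import numpy as np

def decompose(sparse_grad: np.ndarray):
    """Split a (sparsified) gradient tensor into an index set and a value set."""
    flat = sparse_grad.ravel()
    indices = np.flatnonzero(flat)            # positions of non-zero gradients
    values = flat[indices].astype(np.float32)  # the corresponding gradient values
    return indices, values

def compress_values(values: np.ndarray) -> bytes:
    """Deflate-compress the value set (one of the common compressors mentioned above)."""
    return zlib.compress(values.tobytes())

def compress_indices(indices: np.ndarray) -> bytes:
    """Delta-encode the sorted indices, then Deflate-compress them.
    A simple stand-in for the run-length or Bloom-filter index compressors."""
    deltas = np.diff(indices, prepend=0).astype(np.uint32)
    return zlib.compress(deltas.tobytes())

# Example: a worker would transmit the two compressed streams instead of the raw tensor.
grad = np.zeros(1_000_000, dtype=np.float32)
grad[np.random.choice(grad.size, 1000, replace=False)] = np.random.randn(1000)
idx, val = decompose(grad)
payload = compress_indices(idx) + compress_values(val)
print(f"raw: {grad.nbytes} bytes, compressed: {len(payload)} bytes")
```

Because the two sets are kept separate, any value compressor can be paired with any index compressor, which is the flexibility the framework exploits.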


Related research

10/05/2021
S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning
Distributed stochastic gradient descent (SGD) approach has been widely u...

02/16/2018
Variance-based Gradient Compression for Efficient Distributed Deep Learning
Due to the substantial computational cost, training state-of-the-art dee...

01/19/2022
Near-Optimal Sparse Allreduce for Distributed Deep Learning
Communication overhead is one of the major obstacles to train large deep...

05/28/2022
ByteComp: Revisiting Gradient Compression in Distributed Training
Gradient compression (GC) is a promising approach to addressing the comm...

05/22/2018
Sparse Binary Compression: Towards Distributed Deep Learning with Minimal Communication
Currently, progressively larger deep neural networks are trained on ever...

02/16/2023
THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression
Deep neural networks (DNNs) are the de-facto standard for essential use ...

06/18/2019
ADA-Tucker: Compressing Deep Neural Networks via Adaptive Dimension Adjustment Tucker Decomposition
Despite the recent success of deep learning models in numerous applicati...
