Natural Compression for Distributed Deep Learning

05/27/2019
by   Samuel Horvath, et al.
0

Due to their hunger for big data, modern deep learning models are trained in parallel, often in distributed environments, where communication of model updates is the bottleneck. Various update compression (e.g., quantization, sparsification, dithering) techniques have been proposed in recent years as a successful tool to alleviate this problem. In this work, we introduce a new, remarkably simple and theoretically and practically effective compression technique, which we call natural compression (NC). Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two. NC is "natural" since the nearest power of two of a real expressed as a float can be obtained without any computation, simply by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by the tiny factor 9/8 only, which means that the effect of NC on the convergence speed of popular training algorithms, such as distributed SGD, is negligible. However, the communications savings enabled by NC are substantial, leading to 3-4x improvement in overall theoretical running time. For applications requiring more aggressive compression, we generalize NC to natural dithering, which we prove is exponentially better than the immensely popular random dithering technique. Our compression operators can be used on their own or in combination with existing operators for a more aggressive combined effect. Finally, we show that N is particularly effective for the in-network aggregation (INA) framework for distributed training, where the update aggregation is done on a switch, which can only perform integer computations.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/08/2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

Data parallelism can boost the training speed of convolutional neural ne...
research
06/21/2022

Shifted Compression Framework: Generalizations and Improvements

Communication is one of the key bottlenecks in the distributed training ...
research
05/29/2023

Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

Efficient distributed training is a principal driver of recent advances ...
research
11/19/2019

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

Compressed communication, in the form of sparsification or quantization ...
research
11/03/2020

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

Decentralized optimization methods enable on-device training of machine ...
research
06/16/2022

Compressed-VFL: Communication-Efficient Learning with Vertically Partitioned Data

We propose Compressed Vertical Federated Learning (C-VFL) for communicat...
research
02/16/2023

THC: Accelerating Distributed Deep Learning Using Tensor Homomorphic Compression

Deep neural networks (DNNs) are the de-facto standard for essential use ...

Please sign up or login with your details

Forgot password? Click here to reset