Smoothness-Aware Quantization Techniques

06/07/2021
by Bokun Wang et al.

Distributed machine learning has become an indispensable tool for training large supervised machine learning models. To address the high communication costs of distributed training, which are further exacerbated by the fact that modern high-performing models are typically overparameterized, a large body of work has been devoted in recent years to the design of various compression strategies, such as sparsification and quantization, and of optimization algorithms capable of using them. Recently, Safaryan et al. (2021) pioneered a dramatically different compression design approach: they first use the local training data to form local smoothness matrices, and then propose to design a compressor capable of exploiting the smoothness information contained therein. While this novel approach leads to substantial savings in communication, it is limited to sparsification, as it crucially depends on the linearity of the compression operator. In this work, we resolve this problem by extending their smoothness-aware compression strategy to arbitrary unbiased compression operators, a class which also includes sparsification. Specializing our results to quantization, we observe significant savings in communication complexity compared to standard quantization. In particular, we show theoretically that block quantization with n blocks outperforms single-block quantization, leading to a reduction in communication complexity by an 𝒪(n) factor, where n is the number of nodes in the distributed system. Finally, we provide extensive numerical evidence that our smoothness-aware quantization strategies outperform existing quantization schemes as well as the aforementioned smoothness-aware sparsification strategies with respect to all relevant success measures: the number of iterations, the total number of bits communicated, and wall-clock time.
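To make the class of compressors discussed above concrete, the following is a minimal sketch of a standard unbiased random quantizer (in the style of QSGD) together with a blockwise variant that quantizes contiguous blocks independently. This is not the paper's smoothness-aware scheme; the function names quantize_unbiased and quantize_blockwise, the single-level setting, and the contiguous block split are illustrative assumptions.

```python
import numpy as np

def quantize_unbiased(v, levels=1, rng=None):
    """QSGD-style unbiased random quantization: E[quantize_unbiased(v)] = v."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = levels * np.abs(v) / norm      # each entry lies in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                # round up with this probability
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * norm * rounded / levels

def quantize_blockwise(v, num_blocks, levels=1, rng=None):
    """Apply the unbiased quantizer independently to contiguous blocks of v."""
    rng = np.random.default_rng() if rng is None else rng
    blocks = np.array_split(v, num_blocks)
    return np.concatenate([quantize_unbiased(b, levels, rng) for b in blocks])

# Quick empirical check of unbiasedness on a random "gradient" vector.
rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
avg = np.mean([quantize_blockwise(g, num_blocks=10, rng=rng) for _ in range(5000)],
              axis=0)
print(np.max(np.abs(avg - g)))  # close to 0, since the quantizer is unbiased
```

Because each block is scaled by its own norm rather than the norm of the full vector, the blockwise variant adapts to heterogeneity across coordinates, which is the kind of structure the smoothness-aware strategies in the paper exploit more systematically.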

Related research

02/14/2021 · Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization
Large scale distributed optimization has become the default tool for the...

11/01/2019 · On Distributed Quantization for Classification
We consider the problem of distributed feature quantization, where the g...

05/29/2023 · Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees
Efficient distributed training is a principal driver of recent advances ...

01/26/2019 · Distributed Learning with Compressed Gradient Differences
Training very large machine learning models requires a distributed compu...

06/17/2021 · On Effects of Compression with Hyperdimensional Computing in Distributed Randomized Neural Networks
A change of the prevalent supervised learning techniques is foreseeable ...

06/12/2023 · NF4 Isn't Information Theoretically Optimal (and that's Good)
This note shares some simple calculations and experiments related to abs...

02/21/2018 · 3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
The performance and efficiency of distributed machine learning (ML) depe...