Quantizing data for distributed learning

12/14/2020
by Osama A. Hanna, et al.

We consider machine learning applications that train a model by leveraging data distributed over a network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck by compressing gradient updates. However, as models become larger, so does the size of these gradient updates. In this paper, we propose an alternative approach that quantizes data instead of gradients and can therefore support learning in applications where the size of gradient updates is prohibitive. Our approach combines aspects of: (1) sample selection; (2) dataset quantization; and (3) gradient compensation. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order-optimal convergence rates with communication that depends mostly on the data dimension rather than the model (gradient) dimension. We use the proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods.
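To make the contrast with gradient compression concrete, below is a minimal sketch of quantizing local data (rather than gradients) before communication. It assumes an unbiased stochastic uniform quantizer; the function name `quantize_stochastic`, the 4-bit setting, and the synthetic data are illustrative assumptions, not the paper's exact scheme, which additionally uses sample selection and gradient compensation.

```python
import numpy as np

def quantize_stochastic(x, num_bits=4):
    """Unbiased stochastic uniform quantizer over the range of x."""
    levels = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    if hi == lo:
        return x.copy()
    scaled = (x - lo) / (hi - lo) * levels         # map values to [0, levels]
    floor = np.floor(scaled)
    prob = scaled - floor                          # probability of rounding up
    q = floor + (np.random.rand(*x.shape) < prob)  # unbiased stochastic rounding
    return lo + q / levels * (hi - lo)             # map back to the original range

# Worker side: the local dataset is quantized once, so communication scales
# with the data dimension d rather than the model (gradient) dimension.
rng = np.random.default_rng(0)
X_local = rng.normal(size=(128, 32))               # 128 samples, d = 32 features
X_q = quantize_stochastic(X_local, num_bits=4)

# Gradients are then computed on the quantized data; the paper's
# gradient-compensation step (not reproduced here) corrects the induced error.
print("mean absolute quantization error:", np.abs(X_q - X_local).mean())
```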

Related research

07/31/2020  Analysis of SGD with Biased Gradient Estimators
We analyze the complexity of biased stochastic gradient methods (SGD), w...

06/06/2019  Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Communication bottleneck has been identified as a significant issue in d...

01/26/2019  Distributed Learning with Compressed Gradient Differences
Training very large machine learning models requires a distributed compu...

07/02/2020  Federated Learning with Compression: Unified Analysis and Sharp Guarantees
In federated learning, communication cost is often a critical bottleneck...

06/17/2020  Is Network the Bottleneck of Distributed Training?
Recently there has been a surge of research on improving the communicati...

06/30/2019  Network-accelerated Distributed Machine Learning Using MLFabric
Existing distributed machine learning (DML) systems focus on improving t...

02/22/2018  SparCML: High-Performance Sparse Communication for Machine Learning
One of the main drivers behind the rapid recent advances in machine lear...
