Quantizing data for distributed learning

12/14/2020
by Osama A. Hanna, et al.

We consider machine learning applications that train a model by leveraging data distributed over a network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck by compressing gradient updates. However, as models grow larger, so does the size of the gradient updates. In this paper, we propose an alternative approach that quantizes data instead of gradients, and can support learning in applications where the size of gradient updates is prohibitive. Our approach combines aspects of: (1) sample selection; (2) dataset quantization; and (3) gradient compensation. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order-optimal convergence rates with communication that mostly depends on the data rather than the model (gradient) dimension. We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order of magnitude savings over gradient compression methods.
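The abstract describes the method only at a high level, so the snippet below is a rough, hypothetical sketch of the central idea rather than the authors' algorithm: workers send a quantized copy of their data once, and training then runs on the quantized dataset, so communication scales with the data rather than the gradient dimension. The quantizer, the bit width `b_bits`, and the use of plain SGD as a stand-in for the paper's sample selection and gradient compensation are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of training on quantized
# data instead of quantized gradients. All names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def quantize_uniform(x, bits):
    """Stochastic uniform quantization of an array to 2**bits levels."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scaled = (x - lo) / (hi - lo + 1e-12) * levels
    lower = np.floor(scaled)
    # Randomized rounding keeps the quantizer unbiased in expectation.
    q = lower + (rng.random(x.shape) < (scaled - lower))
    return lo + q / levels * (hi - lo)

# Toy linear-regression data distributed across workers.
d, n_workers, n_per_worker = 20, 4, 256
w_true = rng.normal(size=d)
workers = []
for _ in range(n_workers):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.01 * rng.normal(size=n_per_worker)
    workers.append((X, y))

# One-time communication: each worker sends quantized data (b_bits per entry).
b_bits = 4
quantized = [(quantize_uniform(X, b_bits), y) for X, y in workers]

# Server-side SGD on the quantized dataset; the quantization error acts as
# bounded noise on the gradients. Plain SGD is used here as a stand-in for
# the paper's sample selection and gradient compensation steps.
w = np.zeros(d)
lr = 0.05
for epoch in range(50):
    for Xq, y in quantized:
        grad = Xq.T @ (Xq @ w - y) / len(y)
        w -= lr * grad

print("parameter error:", np.linalg.norm(w - w_true))
```

Under these assumptions, the per-worker communication cost is `n_per_worker * d * b_bits` bits, paid once, instead of a gradient of dimension `d` sent at every iteration.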
