Adaptive Gradient Quantization for Data-Parallel SGD
Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.
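To make the mechanism concrete, the sketch below shows one way a worker could adapt its quantization levels to the current gradient statistics and then stochastically quantize the gradient onto those levels. This is a minimal illustration, not the paper's ALQ/AMQ implementation: the lognormal model for normalized gradient magnitudes, the quantile-based level placement, and all function names here (fit_log_stats, adapt_levels, quantize) are assumptions made for this example.

```python
# Minimal sketch of adaptive stochastic gradient quantization (illustrative only).
# Assumption: normalized gradient magnitudes |g_i| / ||g|| roughly follow a
# lognormal distribution, so its sufficient statistics (mean and std of the
# log-magnitudes) are enough to re-derive the quantization levels each step.
import numpy as np
from statistics import NormalDist

def fit_log_stats(grad, eps=1e-12):
    """Sufficient statistics (mean, std) of log(|g_i| / ||g||_2)."""
    mags = np.abs(grad) / (np.linalg.norm(grad) + eps)
    logs = np.log(mags + eps)
    return float(logs.mean()), float(logs.std() + eps)

def adapt_levels(mu, sigma, num_levels=8):
    """Place levels at quantiles of the fitted lognormal magnitude model,
    plus an explicit zero level for very small components."""
    qs = np.linspace(0.5 / num_levels, 1.0 - 0.5 / num_levels, num_levels)
    dist = NormalDist(mu, sigma)
    positive = np.exp([dist.inv_cdf(float(q)) for q in qs])
    return np.concatenate([[0.0], positive])  # sorted ascending

def quantize(grad, levels, rng, eps=1e-12):
    """Stochastically round each normalized magnitude to one of its two
    nearest levels; return what a worker would communicate."""
    norm = np.linalg.norm(grad) + eps
    mags = np.clip(np.abs(grad) / norm, levels[0], levels[-1])
    hi = np.searchsorted(levels, mags)          # index of the upper level
    lo = np.maximum(hi - 1, 0)
    hi = np.minimum(hi, len(levels) - 1)
    span = np.maximum(levels[hi] - levels[lo], eps)
    p_hi = (mags - levels[lo]) / span           # round up with this probability
    pick_hi = rng.random(mags.shape) < p_hi
    q = np.where(pick_hi, levels[hi], levels[lo])
    return norm, np.sign(grad), q

def dequantize(norm, signs, q):
    return norm * signs * q

# Toy usage: each step, refit the model and requantize with the adapted levels.
rng = np.random.default_rng(0)
grad = rng.standard_normal(10_000) * 0.01
mu, sigma = fit_log_stats(grad)
levels = adapt_levels(mu, sigma)
norm, signs, q = quantize(grad, levels, rng)
rel_err = np.linalg.norm(dequantize(norm, signs, q) - grad) / np.linalg.norm(grad)
print(f"relative quantization error: {rel_err:.3f}")
```

Because only the two scalar statistics (mu, sigma) depend on the data, workers could in principle compute them locally and aggregate them cheaply, which is the kind of parallel, low-overhead update of the compression scheme the abstract describes.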