MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

04/22/2018
by   Guoxin Cui, et al.

One of the most significant bottlenecks in training large-scale machine learning models on a parameter server (PS) is the communication overhead, because the model gradients must be exchanged frequently between workers and servers during the training iterations. Gradient quantization has been proposed as an effective approach to reducing the communication volume. A key issue in gradient quantization is setting the number of bits used to quantize the gradients: a small number of bits significantly reduces the communication overhead but hurts gradient accuracy, and vice versa. An ideal quantization method would dynamically balance communication overhead and model accuracy by adjusting the number of bits according to knowledge learned from the immediately preceding training iterations. Existing methods, however, quantize the gradients either with a fixed number of bits or with predefined heuristic rules. In this paper we propose a novel adaptive quantization method within the framework of reinforcement learning. The method, referred to as MQGrad, formalizes the selection of quantization bits as actions in a Markov decision process (MDP) whose state records information collected from past optimization iterations (e.g., the sequence of loss function values). During the training iterations of a machine learning algorithm, MQGrad continuously updates the MDP state according to the changes of the loss function. Based on this information, the MDP learns to select the optimal action (number of bits) for quantizing the gradients. Experimental results on a benchmark dataset show that MQGrad can accelerate the learning of a large-scale deep neural network while maintaining its prediction accuracy.
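As a rough, self-contained illustration of the idea described above (not the authors' implementation), the sketch below pairs a uniform k-bit gradient quantizer with a toy epsilon-greedy selector over candidate bit widths, using the recent loss decrease as the feedback signal. The names quantize_gradient and BitSelector, the candidate bit widths, and the reward definition are all illustrative assumptions; MQGrad itself learns a full MDP policy from the sequence of loss values.

```python
# Hedged sketch: uniform k-bit quantization plus a toy bandit-style bit selector.
# All names and the reward shaping are assumptions for illustration only.
import numpy as np

def quantize_gradient(grad, num_bits):
    """Uniformly quantize a gradient tensor to num_bits, then dequantize.

    Fewer bits mean less data pushed to the parameter server, but a coarser gradient.
    """
    levels = 2 ** num_bits - 1
    g_min, g_max = grad.min(), grad.max()
    scale = (g_max - g_min) / levels if g_max > g_min else 1.0
    codes = np.round((grad - g_min) / scale)      # integer codes in [0, levels]
    return codes * scale + g_min                  # dequantized approximation

class BitSelector:
    """Epsilon-greedy choice over candidate bit widths.

    The reward trades off the observed loss improvement against the number of
    bits sent, standing in for the communication/accuracy balance in the abstract.
    """
    def __init__(self, candidate_bits=(2, 4, 8), epsilon=0.1):
        self.candidate_bits = candidate_bits
        self.epsilon = epsilon
        self.values = {b: 0.0 for b in candidate_bits}   # running reward estimates
        self.counts = {b: 0 for b in candidate_bits}

    def select(self):
        if np.random.rand() < self.epsilon:
            return int(np.random.choice(self.candidate_bits))
        return max(self.values, key=self.values.get)

    def update(self, bits, loss_drop):
        reward = loss_drop / bits                 # illustrative trade-off signal
        self.counts[bits] += 1
        self.values[bits] += (reward - self.values[bits]) / self.counts[bits]

# Usage sketch: choose bits, quantize the gradient before "sending" it to the
# server, then feed the observed loss change back to the selector.
selector = BitSelector()
prev_loss = 1.0
for step in range(5):
    bits = selector.select()
    grad = np.random.randn(1000)                  # stand-in for a worker gradient
    grad_hat = quantize_gradient(grad, bits)      # what would be pushed to the PS
    new_loss = prev_loss * 0.95 + 0.01 * np.random.rand()   # dummy loss trace
    selector.update(bits, loss_drop=prev_loss - new_loss)
    prev_loss = new_loss
```

The epsilon-greedy selector is only a stand-in for the paper's reinforcement learning component; it shows the shape of the feedback loop (observe loss, pick a bit width, quantize, repeat) rather than the MDP formulation itself.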


research · 04/02/2019
Nested Dithered Quantization for Communication Reduction in Distributed Training
In distributed training, the communication cost due to the transmission ...

research · 02/08/2021
Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning
Communication of model updates between client nodes and the central aggr...

research · 12/08/2021
SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning
Stochastic optimization algorithms implemented on distributed computing ...

research · 02/15/2023
Sparse-SignSGD with Majority Vote for Communication-Efficient Distributed Learning
The training efficiency of complex deep learning models can be significa...

research · 10/30/2016
Accurate Deep Representation Quantization with Gradient Snapping Layer for Similarity Search
Recent advance of large scale similarity search involves using deeply le...

research · 10/09/2019
High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning
Edge machine learning involves the deployment of learning algorithms at ...

research · 04/18/2022
How to Attain Communication-Efficient DNN Training? Convert, Compress, Correct
In this paper, we introduce 𝖢𝖮_3, an algorithm for communication-efficie...
