Position-based Scaled Gradient for Model Quantization and Sparse Training

05/22/2020
by Jangho Kim, et al.

We propose the position-based scaled gradient (PSG), which scales the gradient depending on the position of a weight vector to make it more compression-friendly. First, we theoretically show that applying PSG to standard gradient descent (GD), which we call PSGD, is equivalent to GD in a warped weight space, i.e., a space made by warping the original weight space via an appropriately designed invertible function. Second, we empirically show that PSG, acting as a regularizer on the weight vector, is very useful in model compression domains such as quantization and sparse training. PSG reduces the gap between the weight distributions of a full-precision model and its compressed counterpart, which enables versatile deployment of a single model either in uncompressed or compressed mode depending on the available resources. Experimental results on the CIFAR-10/100 and ImageNet datasets show the effectiveness of the proposed PSG in both sparse training and quantization, even at extremely low bit-widths.
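To make the idea concrete, below is a minimal PyTorch sketch of position-based gradient scaling. The function name psg_scale, the power and eps parameters, the distance-to-nearest-level scaling rule, and the uniform 4-bit grid are all illustrative assumptions of this sketch, not the paper's exact formulation; the paper derives its scaling from the warping function, so consult the full text for the precise form.

    import torch

    def psg_scale(weight, grad, levels, eps=1e-8, power=1.0):
        """Illustrative position-based gradient scaling (a sketch, not the
        paper's exact formula): each weight's gradient is scaled according
        to its distance from the nearest quantization level, nudging weights
        toward compression-friendly positions during training."""
        # Distance from each weight to its nearest quantization grid point.
        dist = (weight.unsqueeze(-1) - levels).abs().min(dim=-1).values
        # Hypothetical scaling rule: the step size for each weight grows
        # with its distance from the grid.
        scale = (dist + eps) ** power
        return grad * scale

    # Usage sketch: scale gradients before a plain SGD step ("PSGD").
    w = torch.randn(4, 4, requires_grad=True)
    levels = torch.linspace(-1, 1, steps=2 ** 4)  # e.g. a uniform 4-bit grid
    loss = (w ** 2).sum()
    loss.backward()
    with torch.no_grad():
        w.grad = psg_scale(w.detach(), w.grad, levels)
        w -= 0.1 * w.grad

Because the scaling is purely elementwise, it drops into any optimizer loop between backward() and the parameter update; in the equivalent view, this is ordinary GD carried out in the warped weight space.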


Related research

- 07/22/2022: Quantized Sparse Weight Decomposition for Neural Network Compression
  In this paper, we introduce a novel method of neural network weight comp...

- 03/04/2019: Learning low-precision neural networks without Straight-Through Estimator (STE)
  The Straight-Through Estimator (STE) is widely used for back-propagating...

- 07/31/2022: Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
  Robust quantization improves the tolerance of networks for various imple...

- 08/23/2021: Rate distortion comparison of a few gradient quantizers
  This article is in the context of gradient compression. Gradient compres...

- 09/28/2020: Rotated Binary Neural Network
  Binary Neural Network (BNN) shows its predominance in reducing the compl...

- 04/09/2022: Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method
  We propose an adaptive projection-gradient descent-shrinkage-splitting m...
