Learning low-precision neural networks without Straight-Through Estimator (STE)

03/04/2019
by Zhi-Gang Liu, et al.

The Straight-Through Estimator (STE) is widely used to back-propagate gradients through the quantization function, but the technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB), which quantizes neural networks to low precision using stochastic gradient descent (SGD). AB avoids the STE approximation by replacing the quantized weight in the loss function with an affine combination αw_q + (1-α)w of the quantized weight w_q and the corresponding full-precision weight w, where α is a non-trainable scalar coefficient. During training, α is gradually increased from 0 to 1; gradient updates reach the weights only through the full-precision term (1-α)w of the affine combination, so the model is converted from full precision to low precision progressively. To evaluate the method, a 1-bit BinaryNet on the CIFAR10 dataset and 8-bit and 4-bit MobileNet v1 and ResNet_50 v1/2 models on the ImageNet dataset are trained with alpha-blending; the evaluation indicates that AB improves top-1 accuracy by 0.9% compared to STE-based quantization.
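The abstract does not specify the weight quantizer or the α schedule, so the sketch below assumes a symmetric uniform quantizer, a linear α ramp, and a single PyTorch linear layer; these are illustrative choices, not the paper's exact setup. The point it shows is the mechanism described above: the quantized weight w_q is detached from the graph, so gradients reach the weights only through the (1-α)w term and no straight-through estimator is needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize(w, num_bits=8):
    # Symmetric uniform quantizer (illustrative; the paper's exact quantizer
    # is not given in this abstract).
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

class AlphaBlendedLinear(nn.Module):
    """Linear layer whose effective weight is alpha * w_q + (1 - alpha) * w.

    w_q is detached, so gradients reach the full-precision weight w only
    through the (1 - alpha) * w term.
    """
    def __init__(self, in_features, out_features, num_bits=8):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.num_bits = num_bits
        self.register_buffer("alpha", torch.zeros(()))  # non-trainable coefficient

    def forward(self, x):
        w_q = quantize(self.weight, self.num_bits).detach()
        w_blend = self.alpha * w_q + (1.0 - self.alpha) * self.weight
        return F.linear(x, w_blend, self.bias)

def set_alpha(model, step, total_steps):
    # Hypothetical linear schedule: alpha ramps from 0 to 1 over training,
    # converting the model from full precision to low precision progressively.
    alpha = min(1.0, step / total_steps)
    for m in model.modules():
        if isinstance(m, AlphaBlendedLinear):
            m.alpha.fill_(alpha)
```

At the end of training α = 1, so the blended weight reduces to w_q and the deployed model uses only the quantized weights.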


