Distance-aware Quantization

08/16/2021
by Dohyung Kim, et al.

We address the problem of network quantization, that is, reducing bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized ones, but this operation is not differentiable. There are mainly two approaches to training quantized networks with gradient-based optimizers. First, a straight-through estimator (STE) replaces the zero derivative of the rounding with that of an identity function, which causes a gradient mismatch problem. Second, soft quantizers approximate the rounding with continuous functions at training time, and exploit the rounding for quantization at test time. This alleviates the gradient mismatch, but causes a quantizer gap problem. We alleviate both problems in a unified framework. To this end, we introduce a novel quantizer, dubbed a distance-aware quantizer (DAQ), that mainly consists of a distance-aware soft rounding (DASR) and a temperature controller. To alleviate the gradient mismatch problem, DASR approximates the discrete rounding with the kernel soft argmax, which is based on our insight that the quantization can be formulated as a distance-based assignment problem between full-precision values and quantized ones. The controller adjusts the temperature parameter in DASR adaptively according to the input, addressing the quantizer gap problem. Experimental results on standard benchmarks show that DAQ outperforms the state of the art significantly for various bit-widths without bells and whistles.
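To make the distance-based formulation concrete, below is a minimal sketch of a distance-aware soft rounding built on a kernel soft argmax, as described in the abstract. It is not the authors' implementation: the function name, the quantization levels, the Gaussian kernel form, the default values, and the assumption that inputs are pre-normalized to the level range are all illustrative, and the adaptive temperature controller is replaced by a fixed temperature argument.

```python
import torch

def distance_aware_soft_round(x, num_levels=4, temperature=0.1, kernel_sigma=1.0):
    """Illustrative sketch of distance-based soft rounding (not the authors' code).

    x is assumed to be normalized to [0, num_levels - 1]; the adaptive
    temperature controller from the paper is omitted here.
    """
    # Discrete quantization levels and distances |x - q_i|.
    q = torch.arange(num_levels, dtype=x.dtype, device=x.device)
    d = torch.abs(x.unsqueeze(-1) - q)

    # Gaussian kernel centered on the nearest level (the "kernel" in kernel
    # soft argmax): it suppresses far-away levels so the soft assignment
    # stays close to the hard rounding used at test time.
    nearest = d.argmin(dim=-1, keepdim=True)
    kernel = torch.exp(-((q - q[nearest]) ** 2) / (2 * kernel_sigma ** 2))

    # Distance-based soft assignment; a lower temperature gives a harder assignment.
    w = kernel * torch.exp(-d / temperature)
    w = w / w.sum(dim=-1, keepdim=True)
    return (w * q).sum(dim=-1)

# Example: values between levels are pulled toward the nearest level,
# while gradients still flow through the soft weights during training.
x = torch.tensor([0.2, 1.5, 2.9])
print(distance_aware_soft_round(x))
```

In this form, shrinking the temperature makes the soft assignment approach hard rounding, which is why controlling it adaptively per input is what closes the quantizer gap between training and test time.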


