Error-aware Quantization through Noise Tempering

12/11/2022
by Zheng Wang, et al.

Quantization has become a predominant approach for model compression, enabling deployment of large models trained on GPUs onto smaller form-factor devices for inference. Quantization-aware training (QAT) optimizes model parameters with respect to the end task while simulating quantization error, leading to better performance than post-training quantization. Approximation of gradients through the non-differentiable quantization operator is typically achieved using the straight-through estimator (STE) or additive noise. However, STE-based methods suffer from instability due to biased gradients, whereas existing noise-based methods cannot reduce the resulting variance. In this work, we incorporate exponentially decaying quantization-error-aware noise together with a learnable scale of the task loss gradient to approximate the effect of the quantization operator. We show that this method combines gradient scale and quantization noise in a better-optimized way, providing finer-grained gradient estimation at the bin size of each weight and activation layer's quantizer. Our controlled noise also contains an implicit curvature term that could encourage flatter minima, which we show is indeed the case in our experiments. Experiments training ResNet architectures on the CIFAR-10, CIFAR-100 and ImageNet benchmarks show that our method obtains state-of-the-art top-1 classification accuracy for uniform (non-mixed-precision) quantization, outperforming previous methods by 0.5-1.2%.
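The abstract stops short of implementation detail, so the following is only a minimal sketch of the idea it describes: a uniform quantizer whose hard rounding is replaced during training by noise tied to the per-weight quantization error and tempered (exponentially decayed) over the course of training. The function and parameter names (`error_aware_noisy_weights`, `step`, `total_steps`, the decay constant) are illustrative assumptions, not the authors' implementation, and the learnable per-bin gradient scale is omitted for brevity.

```python
import math
import torch

def uniform_quantize(x, step, n_bits=4):
    """Round x onto a uniform grid with bin width `step`, clamped to the n-bit range."""
    qmax = 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(x / step), -qmax - 1, qmax) * step

def error_aware_noisy_weights(w, step, t, total_steps, n_bits=4):
    """Differentiable surrogate for quantized weights (illustrative, not the paper's exact rule).

    Rather than pushing gradients through the hard round with an STE, the surrogate
    adds noise whose magnitude follows the per-weight quantization error and decays
    exponentially over training, so the forward pass drifts from noisy full-precision
    weights toward the truly quantized ones.
    """
    w_q = uniform_quantize(w, step, n_bits)
    q_err = (w_q - w).detach()                   # per-weight quantization error
    decay = math.exp(-5.0 * t / total_steps)     # tempering schedule (assumed form)
    noise = decay * q_err * torch.rand_like(w)   # error-aware, decaying noise
    # Early in training (decay ~ 1): w plus error-scaled noise, i.e. noisy full precision.
    # Late in training (decay ~ 0): w + q_err = w_q, the hard-quantized weights.
    return w + (1.0 - decay) * q_err + noise
```

In a QAT loop each layer would use these surrogate weights in its forward pass, with `step` (the quantizer bin size) kept as a learnable per-layer parameter, which is where the abstract's finer-grained, per-quantizer gradient scaling would enter.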


