Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization

08/25/2022
by Zhengyi Li, et al.

Post-training quantization (PTQ) attracts increasing attention because of its convenience in deploying quantized neural networks. Rounding, the primary source of quantization error, has so far been optimized only for model weights, while activations still use the rounding-to-nearest operation. In this work, we demonstrate for the first time that well-chosen rounding schemes for activations can improve the final accuracy. Because activations vary with each input, their rounding scheme cannot be fixed offline; we therefore adaptively adjust the rounding border through a simple function that generates rounding schemes at the inference stage. The border function accounts for weight errors, activation errors, and propagated errors, eliminating the bias of the element-wise error and further improving model accuracy. We also make the border aware of global errors so that it better fits different incoming activations. Finally, we propose the AQuant framework to learn the border function. Extensive experiments show that AQuant achieves noticeable improvements with negligible overhead compared with state-of-the-art methods, pushing the accuracy of ResNet-18 up to 60.3% under 2-bit weight and activation post-training quantization.
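To make the idea of an adaptive rounding border concrete, the sketch below contrasts standard rounding-to-nearest quantization with rounding against a per-element border produced by a function. This is a minimal illustration, not the paper's implementation: the `border_fn` callable, the tensor shapes, the unsigned 2-bit range, and the per-tensor scale are all assumptions made for the example, and AQuant additionally learns its border from weight, activation, and propagated errors.

```python
import torch

def quantize_nearest(x, scale, n_bits=2):
    """Uniform unsigned quantization with rounding-to-nearest (border fixed at 0.5)."""
    q = torch.clamp(torch.round(x / scale), 0, 2 ** n_bits - 1)
    return q * scale

def quantize_adaptive_border(x, scale, border_fn, n_bits=2):
    """Uniform unsigned quantization whose rounding border is predicted per element.

    An element is rounded up only when its fractional part exceeds the border
    returned by `border_fn`; a constant border of 0.5 recovers rounding-to-nearest.
    `border_fn` is a hypothetical stand-in for a learned border function.
    """
    v = x / scale
    floor_v = torch.floor(v)
    frac = v - floor_v
    border = border_fn(x)  # per-element border in (0, 1)
    q = torch.clamp(floor_v + (frac > border).float(), 0, 2 ** n_bits - 1)
    return q * scale

# Toy usage: a shifted border changes which activations are rounded up.
x = torch.rand(4, 8)                # stand-in for post-ReLU activations
scale = x.max() / (2 ** 2 - 1)      # per-tensor scale for the 2-bit range
x_nearest = quantize_nearest(x, scale)
x_adaptive = quantize_adaptive_border(x, scale, lambda t: torch.full_like(t, 0.4))
print((x_nearest - x_adaptive).abs().max())  # usually nonzero: some rounding decisions flip
```

In the actual framework the border would come from the learned function at inference time rather than a constant; the constant case above only illustrates how moving the border away from 0.5 alters the rounding decisions.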


