RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization

04/26/2022
by Hongyi Yao, et al.

We introduce a Power-of-Two post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not require time-consuming retraining. PTQ needs only a small set of calibration data and is easier to deploy, but it yields lower accuracy than Quantization-Aware Training (QAT). Power-of-Two quantization converts the multiplications introduced by quantization and dequantization into bit-shifts, which many efficient accelerators adopt. However, a Power-of-Two scale has fewer candidate values, which leads to larger rounding or clipping errors. We propose a novel Power-of-Two PTQ framework, dubbed RAPQ, which dynamically adjusts the Power-of-Two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of every unit. Extensive experiments on ImageNet demonstrate the excellent performance of the proposed method. Without bells and whistles, RAPQ reaches 65% accuracy with INT2 weights and INT4 activations. We are the first to propose PTQ for the more constrained but hardware-friendly Power-of-Two quantization, and we show that it achieves nearly the same accuracy as SOTA PTQ methods. The code will be released.
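To make the bit-shift dequantization and the rounding/clipping trade-off concrete, below is a minimal NumPy sketch of uniform quantization with a Power-of-Two scale. It is an illustration only, not RAPQ's scale-search or reconstruction procedure: the helper names (po2_quantize, po2_dequantize) and the per-tensor MSE sweep are assumptions made for demonstration.

```python
# Minimal sketch (not the authors' implementation) of uniform quantization
# with a Power-of-Two scale s = 2**k: dequantization becomes q * 2**k,
# i.e. an arithmetic shift on integer hardware instead of a multiplication.
import numpy as np

def po2_quantize(x, k, n_bits=4, signed=True):
    """Quantize x with scale 2**k and clip to the n_bits integer range."""
    if signed:
        qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** n_bits - 1
    # Rounding error comes from np.round, clipping error from np.clip.
    q = np.clip(np.round(x / (2.0 ** k)), qmin, qmax)
    return q.astype(np.int32)

def po2_dequantize(q, k):
    """Dequantize: q * 2**k (a bit-shift when done in integer arithmetic)."""
    return q.astype(np.float64) * (2.0 ** k)

# Because the scale is restricted to powers of two, there are few candidate
# values of k: a larger k reduces clipping but coarsens rounding, a smaller k
# does the opposite. This sweep shows the trade-off for one tensor.
x = np.random.randn(10_000)
for k in range(-6, 1):
    err = np.mean((x - po2_dequantize(po2_quantize(x, k), k)) ** 2)
    print(f"k={k:>3}  scale=2^{k}  MSE={err:.5f}")
```

RAPQ's contribution, per the abstract, is to choose such scales jointly across the whole network rather than minimizing this kind of per-layer error in isolation.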

