Towards Accurate Post-Training Quantization for Vision Transformer

03/25/2023
by Yifu Ding, et al.

The vision transformer has emerged as a promising architecture for vision tasks. However, its heavy computation and non-negligible latency hinder real-world deployment. Post-training quantization is a widespread model compression technique, yet existing post-training quantization methods still cause severe performance drops on vision transformers. We find the main reasons lie in (1) the existing calibration metric is inaccurate in measuring the quantization influence under extremely low-bit representation, and (2) the existing quantization paradigm is ill-suited to the power-law distribution of Softmax outputs. Based on these observations, we propose a novel Accurate Post-training Quantization framework for Vision Transformer, namely APQ-ViT. We first present a unified Bottom-elimination Blockwise Calibration scheme that optimizes the calibration metric to perceive the overall quantization disturbance in a blockwise manner and to prioritize the crucial quantization errors that influence the final output most. Then, we design a Matthew-effect Preserving Quantization for Softmax to maintain its power-law character and preserve the function of the attention mechanism. Comprehensive experiments on large-scale classification and detection datasets demonstrate that our APQ-ViT surpasses existing post-training quantization methods by convincing margins, especially in lower bit-width settings (e.g., up to 5.17% improvement on average). We also highlight that APQ-ViT enjoys versatility and works well on diverse transformer variants.
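The abstract only sketches the two components. To make the Softmax observation concrete, below is a minimal, hypothetical PyTorch sketch, not the paper's exact Matthew-effect Preserving Quantization: it contrasts a uniform 4-bit quantizer with a log2-style non-uniform quantizer on post-Softmax attention values, whose power-law-like distribution means a uniform grid collapses most small entries to zero. The function names (`uniform_quantize`, `log2_quantize`) and the log2 scheme are illustrative assumptions, not taken from the paper.

```python
# Hypothetical illustration of why uniform low-bit quantization hurts
# post-Softmax attention: most attention weights are tiny, a few are large
# (a power-law-like distribution), so a uniform grid rounds the small ones
# to zero while a log-domain grid keeps them distinguishable.
import torch


def uniform_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Uniform quantization of values in [0, 1] (Softmax outputs need no extra scale)."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels


def log2_quantize(x: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Power-of-two (log-domain) quantization for values in (0, 1]."""
    levels = 2 ** bits - 1
    # Quantize the exponent instead of the value itself; clamp avoids log(0).
    exp = torch.clamp(-torch.log2(x.clamp(min=1e-10)), max=levels)
    q_exp = torch.round(exp)
    q = torch.pow(2.0, -q_exp)
    # Reserve the largest exponent code for exact zero.
    q[q_exp >= levels] = 0.0
    return q


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(1, 8, 16, 16) * 3.0   # toy attention logits
    attn = torch.softmax(logits, dim=-1)       # power-law-like values in (0, 1]

    for name, fn in [("uniform", uniform_quantize), ("log2", log2_quantize)]:
        q = fn(attn, bits=4)
        zeros = (q == 0).float().mean().item()
        err = (q - attn).abs().mean().item()
        print(f"{name:8s} 4-bit: {zeros:.1%} of entries collapse to 0, "
              f"mean abs error {err:.4f}")
```

On random inputs, the uniform quantizer typically zeroes out a large fraction of the attention entries, while the log2-style quantizer keeps the small-but-nonzero weights ranked below the dominant ones; preserving that ordering is the intuition behind the "Matthew effect" terminology, though the paper's actual scheme may differ from this sketch.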


