FQ-ViT: Fully Quantized Vision Transformer without Retraining

11/27/2021
by   Yang Lin, et al.
0

Network quantization significantly reduces model inference complexity and has been widely used in real-world deployments. However, most existing quantization methods have been developed and tested mainly on Convolutional Neural Networks (CNN), and suffer severe degradation when applied to Transformer-based architectures. In this work, we present a systematic method to reduce the performance degradation and inference complexity of Quantized Transformers. In particular, we propose Powers-of-Two Scale (PTS) to deal with the serious inter-channel variation of LayerNorm inputs in a hardware-friendly way. In addition, we propose Log-Int-Softmax (LIS) that can sustain the extreme non-uniform distribution of the attention maps while simplifying inference by using 4-bit quantization and the BitShift operator. Comprehensive experiments on various Transformer-based architectures and benchmarks show that our methods outperform previous works in performance while using even lower bit-width in attention maps. For instance, we reach 85.17 ImageNet and 51.4 mAP with Cascade Mask R-CNN (Swin-S) on COCO. To our knowledge, we are the first to achieve comparable accuracy degradation ( 1 fully quantized Vision Transformers. Code is available at https://github.com/linyang-zhh/FQ-ViT.

READ FULL TEXT

page 1

page 7

research
08/21/2023

Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers

Quantization scale and bit-width are the most important parameters when ...
research
10/13/2022

Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer

The large pre-trained vision transformers (ViTs) have demonstrated remar...
research
12/16/2022

RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers

Post-training quantization (PTQ), which only requires a tiny dataset for...
research
05/10/2022

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

Transformers have achieved great success in pluralistic image inpainting...
research
09/26/2022

Device-friendly Guava fruit and leaf disease detection using deep learning

This work presents a deep learning-based plant disease diagnostic system...
research
09/13/2022

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers

Data-free quantization can potentially address data privacy and security...
research
11/14/2022

BiViT: Extremely Compressed Binary Vision Transformer

Model binarization can significantly compress model size, reduce energy ...

Please sign up or login with your details

Forgot password? Click here to reset