BiViT: Extremely Compressed Binary Vision Transformer

11/14/2022
by Yefei He, et al.

Model binarization can significantly compress model size, reduce energy consumption, and accelerate inference through efficient bit-wise operations. Although the binarization of convolutional neural networks has been extensively studied, little work has explored binarization for vision Transformers, which underpin most recent breakthroughs in visual recognition. To this end, we propose to solve two fundamental challenges to push the horizon of Binary Vision Transformers (BiViT). First, traditional binarization methods do not take the long-tailed distribution of softmax attention into consideration, which introduces large binarization errors in the attention module. To solve this, we propose Softmax-aware Binarization, which dynamically adapts to the data distribution and reduces the error caused by binarization. Second, to better exploit the information of the pretrained model and restore accuracy, we propose a Cross-layer Binarization scheme and introduce learnable channel-wise scaling factors for weight binarization. The former decouples the binarization of self-attention and the MLP to avoid mutual interference, while the latter enhances the representation capacity of binarized models. Overall, our method performs favorably against the state of the art by 19.8%. On ImageNet, BiViT achieves a competitive 70.8% top-1 accuracy, outperforming existing SOTA methods by a clear margin.
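As a rough illustration of the two ideas in the abstract, the PyTorch sketch below (not the authors' released code) binarizes a linear layer's weights with learnable channel-wise scaling factors and thresholds a long-tailed softmax attention map. The layer name ChannelWiseBinaryLinear, the helper softmax_aware_binarize, and the fixed threshold tau are illustrative assumptions; both paths use a straight-through estimator for gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelWiseBinaryLinear(nn.Module):
    """Weight binarization with learnable channel-wise scaling factors
    (illustrative sketch, not the paper's released implementation)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        # One learnable scale per output channel, initialized to the mean
        # absolute value of that channel's latent full-precision weights.
        self.alpha = nn.Parameter(self.weight.detach().abs().mean(dim=1, keepdim=True))

    def forward(self, x):
        # sign() maps latent weights to {-1, +1}; the straight-through
        # estimator lets gradients flow back to the latent weights.
        w_bin = torch.sign(self.weight)
        w_bin = (w_bin - self.weight).detach() + self.weight
        return F.linear(x, self.alpha * w_bin)


def softmax_aware_binarize(attn, tau):
    """Threshold a long-tailed softmax attention map: the few large entries
    become 1, the long tail of near-zero entries becomes 0 (a simplified
    stand-in for the paper's Softmax-aware Binarization)."""
    attn_bin = (attn > tau).to(attn.dtype)
    return (attn_bin - attn).detach() + attn  # straight-through estimator


# Toy usage: binarize an attention map produced by softmax.
scores = torch.randn(2, 4, 16, 16)               # (batch, heads, queries, keys)
attn = scores.softmax(dim=-1)                    # long-tailed after softmax
attn_b = softmax_aware_binarize(attn, tau=1.0 / attn.size(-1))
layer = ChannelWiseBinaryLinear(64, 32)
out = layer(torch.randn(2, 16, 64))              # projection with binary weights
```

In the paper the binarization adapts dynamically to the attention distribution; the constant tau above is only a placeholder and could, for example, be made a learnable parameter per head.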

Related research

09/15/2023
Replacing softmax with ReLU in Vision Transformers
Previous research observed accuracy degradation when replacing the atten...

09/12/2021
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
Transformers have sprung up in the field of computer vision. In this wor...

07/26/2023
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
Existing analyses of the expressive capacity of Transformer models have ...

07/07/2023
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
Transformer networks have emerged as the state-of-the-art approach for n...

10/13/2022
Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
The large pre-trained vision transformers (ViTs) have demonstrated remar...

11/27/2021
FQ-ViT: Fully Quantized Vision Transformer without Retraining
Network quantization significantly reduces model inference complexity an...

11/04/2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Developing lightweight Deep Convolutional Neural Networks (DCNNs) and Vi...
