GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples

05/13/2023
by Tian Gao, et al.

Owing to its massive number of parameters, the Vision Transformer (ViT) is prone to severe overfitting when the number of training samples is limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a model-compression technique, model binarization is a promising way to address both problems. Compared with its full-precision counterpart, a binarized model represents parameters and activations with only 1 bit and replaces expensive tensor multiplications with simple bit-wise operations, which reduces model size and computational complexity, respectively. In this paper, we find that the accuracy drop of a binary ViT is mainly caused by information loss in the attention module and the Value vectors. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to improve the performance of the binarized model, we investigate the gradient computation in the binarization process and derive more suitable gradient equations for GSB, reducing the influence of gradient mismatch. We also introduce knowledge distillation to alleviate the performance degradation caused by binarization. Experiments on three datasets with limited training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among binary quantization schemes and even surpasses its full-precision counterpart on some metrics.
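To make the binarization idea concrete, below is a minimal sketch in PyTorch of what a "group superposition" style binarization could look like: a full-precision tensor is approximated as a sum of several scaled 1-bit tensors, each fitting the residual left by the previous ones, with a straight-through estimator supplying gradients through the non-differentiable sign(). The function names, the residual-based grouping, and the clipped-gradient rule are illustrative assumptions; the paper's exact grouping scheme and gradient equations are given in the full text.

```python
# Hedged sketch (not the authors' code): approximate a full-precision tensor as a
# weighted superposition of K binary tensors, using a straight-through estimator
# (STE) so gradients can flow through sign(). GSB's actual grouping and gradient
# derivation may differ from this illustration.
import torch


class BinarizeSTE(torch.autograd.Function):
    """sign() in the forward pass; clipped straight-through gradient in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Zero the gradient where |x| > 1 to reduce gradient mismatch.
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


def group_superposition_binarize(x: torch.Tensor, num_groups: int = 3) -> torch.Tensor:
    """Approximate x as a sum of `num_groups` scaled 1-bit tensors.

    Each group binarizes the residual left by the previous groups, so the
    superposition progressively refines the approximation (hypothetical
    illustration of the 'group superposition' idea).
    """
    residual = x
    approx = torch.zeros_like(x)
    for _ in range(num_groups):
        scale = residual.abs().mean()       # per-group scaling factor
        b = BinarizeSTE.apply(residual)     # 1-bit tensor in {-1, +1}
        approx = approx + scale * b
        residual = residual - scale * b     # next group fits what is left
    return approx


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    y = group_superposition_binarize(x, num_groups=3)
    print("approximation error:", (x - y).abs().mean().item())
    y.sum().backward()                      # STE lets gradients reach x
    print("grad mean:", x.grad.mean().item())
```

Increasing the number of groups trades extra bit-wise operations for a closer fit to the full-precision tensor, which is the general lever such superposition schemes expose.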


research · 01/15/2021
KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization
Recently, transformer-based language models such as BERT have shown trem...

research · 06/04/2022
Extreme Compression for Pre-trained Transformers Made Simple and Efficient
Extreme compression, particularly ultra-low bit precision (binary/ternar...

research · 05/02/2017
Ternary Neural Networks with Fine-Grained Quantization
We propose a novel fine-grained quantization (FGQ) method to ternarize p...

research · 10/19/2020
Bi-Real Net V2: Rethinking Non-linearity for 1-bit CNNs and Going Beyond
Binary neural networks (BNNs), where both weights and activations are bi...

research · 05/25/2022
BiT: Robustly Binarized Multi-distilled Transformer
Modern pre-trained transformers have rapidly advanced the state-of-the-a...

research · 07/06/2022
Network Binarization via Contrastive Learning
Neural network binarization accelerates deep models by quantizing their ...
