Patch-wise Mixed-Precision Quantization of Vision Transformer

05/11/2023
by Junrui Xiao, et al.

As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks. However, Vision Transformers (ViTs) rely on complex self-attention computation to learn powerful feature representations, which makes mixed-precision quantization of ViTs challenging. In this paper, we propose a novel patch-wise mixed-precision quantization (PMQ) method for efficient inference of ViTs. Specifically, we design a lightweight global metric, which is faster than existing methods, to measure the sensitivity of each component in ViTs to quantization errors. We also introduce a Pareto frontier approach to automatically allocate the optimal bit-precision to each component according to its sensitivity. To further reduce the computational complexity of self-attention at the inference stage, we propose a patch-wise module that reallocates the bit-widths of patches in each layer. Extensive experiments on the ImageNet dataset show that our method greatly reduces the search cost and facilitates the application of mixed-precision quantization to ViTs.
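
The Pareto frontier allocation can be sketched in a few lines of code. The following is a minimal, hypothetical Python example, not the authors' implementation: it assumes each ViT component already has a scalar sensitivity score and a parameter count, enumerates candidate bit-width assignments, keeps the Pareto-optimal trade-offs between model size and sensitivity-weighted quantization error, and returns the lowest-error assignment within a size budget. The name pareto_allocate and the error proxy 2^(-2b) are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch (not the authors' code): sensitivity-guided bit allocation
# via a Pareto frontier. Assumes per-component sensitivities and parameter
# counts are already available; the error proxy is an assumption.
from itertools import product

def pareto_allocate(sensitivities, param_counts, bit_choices=(4, 8),
                    size_budget_bits=None):
    """Return the per-component bit-widths with the lowest proxy error
    among Pareto-optimal (model size, error) trade-offs within the budget."""
    candidates = []
    for bits in product(bit_choices, repeat=len(sensitivities)):
        size = sum(b * n for b, n in zip(bits, param_counts))  # total model bits
        # Proxy objective: sensitivity times a quantization-error term that
        # shrinks with bit-width (roughly 2^(-2b) for uniform quantization).
        error = sum(s * 2.0 ** (-2 * b) for s, b in zip(sensitivities, bits))
        candidates.append((size, error, bits))

    # Pareto frontier: keep candidates not dominated in both objectives.
    frontier = [c for c in candidates
                if not any((o[0] <= c[0] and o[1] < c[1]) or
                           (o[0] < c[0] and o[1] <= c[1]) for o in candidates)]

    feasible = [c for c in frontier
                if size_budget_bits is None or c[0] <= size_budget_bits]
    return min(feasible, key=lambda c: c[1])[2] if feasible else None

if __name__ == "__main__":
    # Toy example: four components with made-up sensitivities and sizes.
    sens = [0.9, 0.2, 0.5, 0.1]
    params = [1_000, 4_000, 2_000, 4_000]
    print(pareto_allocate(sens, params, size_budget_bits=6 * sum(params)))
```

In practice the exhaustive enumeration above would need to be replaced by a more scalable frontier construction, since the search space grows exponentially with the number of components; the sketch is only meant to illustrate the size/sensitivity trade-off behind the bit allocation.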

Related research:

12/20/2022 - CSMPQ: Class Separability Based Mixed-Precision Quantization
Mixed-precision quantization has received increasing attention for its c...

01/19/2022 - Q-ViT: Fully Differentiable Quantization for Vision Transformer
In this paper, we propose a fully differentiable quantization method for...

06/14/2021 - Neuroevolution-Enhanced Multi-Objective Optimization for Mixed-Precision Quantization
Mixed-precision quantization is a powerful tool to enable memory and com...

09/19/2022 - SAMP: A Toolkit for Model Inference with Self-Adaptive Mixed-Precision
The latest industrial inference engines, such as FasterTransformer and ...

10/13/2021 - Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization
Quantization is a widely used technique to compress and accelerate deep ...

03/04/2022 - Patch Similarity Aware Data-Free Quantization for Vision Transformers
Vision transformers have recently gained great success on various comput...

03/16/2022 - Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance
The exponentially large discrete search space in mixed-precision quantiz...
