I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference

07/04/2022
by Zhikai Li, et al.

Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. These models, however, incur considerable storage and computational overheads, making their deployment and efficient inference on edge devices challenging. Quantization is a promising approach to reducing model complexity; unfortunately, existing efforts to quantize ViTs are simulated quantization (aka fake quantization), which retains floating-point arithmetic during inference and thus contributes little to model acceleration. In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, enabling ViTs to perform the entire computational graph of inference with integer arithmetic and bit-shifting, without any floating-point operations. In I-ViT, linear operations (e.g., MatMul and Dense) follow the integer-only pipeline with dyadic arithmetic, and non-linear operations (e.g., Softmax, GELU, and LayerNorm) are approximated by the proposed lightweight integer-only arithmetic methods. In particular, I-ViT applies the proposed Shiftmax and ShiftGELU, which use integer bit-shifting to approximate the corresponding floating-point operations. We evaluate I-ViT on various benchmark models, and the results show that integer-only INT8 quantization achieves comparable (or even higher) accuracy than the full-precision (FP) baseline. Furthermore, we use TVM for practical hardware deployment on the GPU's integer arithmetic units, achieving a 3.72–4.11× inference speedup over the FP model.
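The abstract mentions the dyadic-arithmetic pipeline for linear operations and the bit-shift approximations behind Shiftmax and ShiftGELU without spelling them out. The Python sketch below illustrates the general idea under common integer-only quantization conventions: a floating-point rescaling factor is folded offline into a dyadic number b / 2^c, so requantization at inference time needs only an integer multiply and a right shift, and x * log2(e) is approximated with shifts (log2(e) ≈ 1.4427 ≈ 1 + 1/2 - 1/16), as is typical of shift-based Softmax approximations. The function names and the shift_bits parameter are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def dyadic_scale(fp_scale: float, shift_bits: int = 16):
    """Fold a floating-point rescaling factor into a dyadic number b / 2^c.
    This is done once offline, so inference itself stays integer-only."""
    c = shift_bits
    b = int(round(fp_scale * (1 << c)))
    return b, c

def requantize(acc_int32: np.ndarray, b: int, c: int) -> np.ndarray:
    """Integer-only requantization of an INT32 accumulator: (acc * b) >> c,
    then clamp back to the INT8 range."""
    out = (acc_int32.astype(np.int64) * b) >> c
    return np.clip(out, -128, 127).astype(np.int8)

def shift_log2e(x_int: np.ndarray) -> np.ndarray:
    """Approximate x * log2(e) with shifts: log2(e) ~ 1 + 1/2 - 1/16,
    the kind of bit-shift approximation used by Shiftmax-style exponentials."""
    return x_int + (x_int >> 1) - (x_int >> 4)

# Example: a toy INT8 matmul whose INT32 accumulator is requantized to INT8.
x = np.random.randint(-128, 128, size=(4, 8), dtype=np.int8)
w = np.random.randint(-128, 128, size=(8, 4), dtype=np.int8)
acc = x.astype(np.int32) @ w.astype(np.int32)   # INT32 accumulator
b, c = dyadic_scale(0.0123)                     # offline: fold the FP scale
y = requantize(acc, b, c)                       # inference: multiply + shift only
```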


Related research

01/05/2021 - I-BERT: Integer-only BERT Quantization
Transformer based models, like BERT and RoBERTa, have achieved state-of-...

09/17/2020 - Towards Fully 8-bit Integer Inference for the Transformer Model
8-bit integer inference, as a promising direction in reducing both the l...

05/21/2023 - Integer or Floating Point? New Outlooks for Low-Bit Quantization on Large Language Models
Efficient deployment of large language models (LLMs) necessitates low-bi...

07/18/2022 - Is Integer Arithmetic Enough for Deep Learning Training?
The ever-increasing computational complexity of deep learning models mak...

03/31/2021 - Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition
End-to-end neural network models achieve improved performance on various...

07/15/2016 - On the efficient representation and execution of deep acoustic models
In this paper we present a simple and computationally efficient quantiza...

08/21/2021 - Integer-arithmetic-only Certified Robustness for Quantized Neural Networks
Adversarial data examples have drawn significant attention from the mach...
