Scaled Quantization for the Vision Transformer

03/23/2023
by Yangyang Chang, et al.

Quantization using a small number of bits shows promise for reducing latency and memory usage in deep neural networks. However, most quantization methods cannot readily handle complicated functions such as exponential and square root, and prior approaches involve complex training processes that must interact with floating-point values. This paper proposes a robust method for the full integer quantization of vision transformer networks without requiring any intermediate floating-point computations. The quantization techniques can be applied in various hardware or software implementations, including processor/memory architectures and FPGAs.
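As a generic illustration of the kind of scaled quantization the abstract refers to (and not the paper's actual algorithm), the sketch below maps a real-valued tensor onto a signed integer grid through a single per-tensor scale factor and maps it back for comparison. The function names quantize/dequantize, the 8-bit default, and the symmetric per-tensor scheme are assumptions made for this example only.

import numpy as np

def quantize(x, num_bits=8):
    # Uniform symmetric ("scaled") quantization: one scale per tensor.
    # Hypothetical helper for illustration, not the paper's procedure.
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(np.max(np.abs(x)) / qmax, 1e-12)    # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int32), scale                # wider int type kept for simplicity

def dequantize(q, scale):
    # Map integers back to approximate real values.
    return q.astype(np.float32) * scale

# Usage: quantize a random weight matrix and check the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize(w)
print(np.abs(w - dequantize(q, s)).max())

With such a scheme, matrix multiplications can run on the integer tensors and only the scale bookkeeping remains, which is the kind of arithmetic the paper extends to full integer-only inference.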

Related research

09/17/2020 · Towards Fully 8-bit Integer Inference for the Transformer Model
8-bit integer inference, as a promising direction in reducing both the l...

03/25/2021 · A Survey of Quantization Methods for Efficient Neural Network Inference
As soon as abstract mathematical computations were adapted to computatio...

01/07/2019 · DSConv: Efficient Convolution Operator
We introduce a variation of the convolutional layer called DSConv (Distr...

06/30/2023 · Designing strong baselines for ternary neural network quantization through support and mass equalization
Deep neural networks (DNNs) offer the highest performance in a wide rang...

06/17/2020 · StatAssist GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch
This paper studies the scratch training of quantization-aware training (...

06/10/2018 · Static Quantized Radix-2 FFT/IFFT Processor for Constraints Analysis
This research work focuses on the design of a high-resolution fast Fouri...

07/18/2022 · Is Integer Arithmetic Enough for Deep Learning Training?
The ever-increasing computational complexity of deep learning models mak...
