Scaled Quantization for the Vision Transformer

03/23/2023
by Yangyang Chang, et al.

Quantization using a small number of bits shows promise for reducing latency and memory usage in deep neural networks. However, most quantization methods cannot readily handle complicated functions such as exponential and square root, and prior approaches involve complex training processes that must interact with floating-point values. This paper proposes a robust method for the full integer quantization of vision transformer networks without requiring any intermediate floating-point computations. The quantization techniques can be applied in various hardware or software implementations, including processor/memory architectures and FPGAs.
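As a generic illustration of the kind of scaled quantization the abstract refers to (and not the paper's actual algorithm), the sketch below maps a real-valued tensor onto a signed integer grid through a single per-tensor scale factor and maps it back for comparison. The function names quantize/dequantize, the 8-bit default, and the symmetric per-tensor scheme are assumptions made for this example only.

import numpy as np

def quantize(x, num_bits=8):
    # Uniform symmetric ("scaled") quantization: one scale per tensor.
    # Hypothetical helper for illustration, not the paper's procedure.
    qmax = 2 ** (num_bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(np.max(np.abs(x)) / qmax, 1e-12)    # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q.astype(np.int32), scale                # wider int type kept for simplicity

def dequantize(q, scale):
    # Map integers back to approximate real values.
    return q.astype(np.float32) * scale

# Usage: quantize a random weight matrix and check the rounding error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize(w)
print(np.abs(w - dequantize(q, s)).max())

With such a scheme, matrix multiplications can run on the integer tensors and only the scale bookkeeping remains, which is the kind of arithmetic the paper extends to full integer-only inference.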

Related research

09/17/2020 · Towards Fully 8-bit Integer Inference for the Transformer Model
8-bit integer inference, as a promising direction in reducing both the l...

03/25/2021 · A Survey of Quantization Methods for Efficient Neural Network Inference
As soon as abstract mathematical computations were adapted to computatio...

01/07/2019 · DSConv: Efficient Convolution Operator
We introduce a variation of the convolutional layer called DSConv (Distr...

06/30/2023 · Designing strong baselines for ternary neural network quantization through support and mass equalization
Deep neural networks (DNNs) offer the highest performance in a wide rang...

06/17/2020 · StatAssist GradBoost: A Study on Optimal INT8 Quantization-aware Training from Scratch
This paper studies the scratch training of quantization-aware training (...

06/10/2018 · Static Quantized Radix-2 FFT/IFFT Processor for Constraints Analysis
This research work focuses on the design of a high-resolution fast Fouri...

07/18/2022 · Is Integer Arithmetic Enough for Deep Learning Training?
The ever-increasing computational complexity of deep learning models mak...
