Distribution Adaptive INT8 Quantization for Training CNNs

02/09/2021
by Kang Zhao, et al.

Research has demonstrated that low bit-width (e.g., INT8) quantization can be employed to accelerate the inference process. This makes gradient quantization promising, since backward propagation requires approximately twice as much computation as the forward pass. Due to the variability and uncertainty of gradient distributions, many methods have been proposed to maintain training stability. However, most of them ignore the channel-wise gradient distributions and the impact of gradients with different magnitudes, resulting in degraded final accuracy. In this paper, we propose a novel INT8 quantization training framework for convolutional neural networks to address the above issues. Specifically, we adopt Gradient Vectorized Quantization to quantize the gradients, based on the observation that layer-wise gradients contain multiple distributions along the channel dimension. Then, a Magnitude-aware Clipping Strategy is introduced that takes the magnitudes of gradients into consideration when minimizing the quantization error, and we present a theoretical derivation to solve for the quantization parameters of the different distributions. Experimental results on a broad range of computer vision tasks, such as image classification, object detection and video classification, demonstrate that the proposed Distribution Adaptive INT8 Quantization training method achieves almost lossless training accuracy for different backbones, including ResNet, MobileNetV2, InceptionV3, VGG and AlexNet, which is superior to the state-of-the-art techniques. Moreover, we further implement an INT8 kernel that can accelerate the training iteration by more than 200% on the latest Turing architecture, i.e., our method excels in both training accuracy and speed.
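To make the idea of channel-wise (vectorized) gradient quantization concrete, the following is a minimal NumPy sketch of per-channel INT8 quantization with an optional clipping threshold. It is an illustrative assumption of the general mechanism, not the authors' exact algorithm: the function name, the simple max-abs scaling, and the single `clip` parameter are hypothetical, whereas the paper derives per-distribution clipping values via its Magnitude-aware Clipping Strategy.

```python
import numpy as np

def quantize_grad_int8_per_channel(grad, clip=None):
    """Sketch: quantize a 4D gradient tensor (N, C, H, W) to INT8,
    with a separate scale per channel C (channel-wise quantization).
    `clip` optionally caps the per-channel magnitude before scaling,
    standing in for a magnitude-aware clipping threshold."""
    N, C = grad.shape[0], grad.shape[1]
    g = grad.reshape(N, C, -1)
    # Per-channel maximum absolute value of the gradient.
    max_abs = np.abs(g).max(axis=(0, 2))            # shape (C,)
    if clip is not None:
        max_abs = np.minimum(max_abs, clip)
    scale = max_abs / 127.0 + 1e-12                 # avoid division by zero
    # Round to the nearest INT8 level and saturate to [-127, 127].
    q = np.clip(np.round(g / scale[None, :, None]), -127, 127).astype(np.int8)
    return q.reshape(grad.shape), scale

# Dequantization for the subsequent INT8 GEMM / weight update:
# grad_hat ≈ q.astype(np.float32) * scale[None, :, None, None]
```

The key design point illustrated here is that each channel gets its own scale (and, in the paper, its own clipping value), so channels whose gradients follow very different distributions are not forced to share one quantization grid.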


