Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization

08/03/2017
by Yinpeng Dong, et al.

Low-bit deep neural networks (DNNs) have become critical for embedded applications due to their low storage requirements and computational efficiency. However, they suffer from a non-negligible accuracy drop. This paper proposes the stochastic quantization (SQ) algorithm for learning accurate low-bit DNNs. The motivation stems from the following observation: existing training algorithms approximate the real-valued elements/filters with low-bit representations all together in each iteration. The quantization errors may be small for some elements/filters but remarkable for others, which leads to inappropriate gradient directions during training and thus a notable accuracy drop. Instead, SQ quantizes a portion of the elements/filters to low-bit with a stochastic probability inversely proportional to the quantization error, while keeping the remaining portion unchanged at full precision. The quantized and full-precision portions are updated with their corresponding gradients separately in each iteration. The SQ ratio is gradually increased until the whole network is quantized. This procedure greatly compensates for the quantization error and thus yields better accuracy for low-bit DNNs. Experiments show that SQ consistently and significantly improves the accuracy of low-bit DNNs across various datasets and network structures.
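To make the selection step concrete, below is a minimal NumPy sketch of per-layer stochastic quantization as described in the abstract. It assumes binary quantization of convolution filters for illustration; the function names and the exact form of the probability function are assumptions for this sketch, not the paper's reference implementation.

import numpy as np

def binarize(w):
    # Binary quantization Q(w) = alpha * sign(w), with the scaling
    # factor alpha taken as the mean absolute value of the filter.
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def stochastic_quantize(weights, sq_ratio, rng=None):
    # weights: full-precision filters, shape (num_filters, filter_size).
    # Returns the hybrid weights used for the forward/backward pass:
    # a sampled fraction `sq_ratio` of the filters is quantized, while
    # the remaining filters stay full-precision.
    if rng is None:
        rng = np.random.default_rng()
    num_filters = weights.shape[0]
    quantized = np.stack([binarize(w) for w in weights])

    # Relative quantization error per filter.
    err = np.abs(weights - quantized).sum(axis=1)
    err /= np.abs(weights).sum(axis=1) + 1e-12

    # Selection probability inversely proportional to the error, so
    # filters that quantize with little error are more likely picked.
    inv = 1.0 / (err + 1e-12)
    prob = inv / inv.sum()

    # Sample the subset of filters to quantize, without replacement.
    n_quant = int(round(sq_ratio * num_filters))
    idx = rng.choice(num_filters, size=n_quant, replace=False, p=prob)

    hybrid = weights.copy()
    hybrid[idx] = quantized[idx]
    return hybrid

In a full training loop, the gradients computed with the hybrid weights update the underlying full-precision weights, and sq_ratio follows an increasing schedule (for example 0.5, 0.75, 0.875, 1.0 across training stages) until every filter is quantized.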

Related research

DNN Quantization with Attention (03/24/2021)
Low-bit quantization of network weights and activations can drastically ...

CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification (12/06/2022)
Mixed-precision quantization has been widely applied on deep neural netw...

From Quantized DNNs to Quantizable DNNs (04/11/2020)
This paper proposes Quantizable DNNs, a special type of DNNs that can fl...

Mixed-Precision Quantization with Cross-Layer Dependencies (07/11/2023)
Quantization is commonly used to compress and accelerate deep neural net...

Scalable Verification of Quantized Neural Networks (Technical Report) (12/15/2020)
Formal verification of neural networks is an active topic of research, a...

Towards Unified INT8 Training for Convolutional Neural Network (12/29/2019)
Recently low-bit (e.g., 8-bit) network quantization has been extensively...

Hyperspherical Loss-Aware Ternary Quantization (12/24/2022)
Most of the existing works use projection functions for ternary quantiza...
