Fighting Quantization Bias With Bias

06/07/2019
by Alexander Finkelstein, et al.

Low-precision representation of deep neural networks (DNNs) is critical for efficient deployment of deep learning applications on embedded platforms; however, converting a network to low precision degrades its performance. Crucially, networks designed for embedded applications usually suffer increased degradation because they have less redundancy. This is most evident for the ubiquitous MobileNet architecture, which requires a costly quantization-aware training cycle to achieve acceptable performance when quantized to 8 bits. In this paper, we trace the source of the degradation in MobileNets to a shift in the mean activation value. This shift is caused by an inherent bias in the quantization process which builds up across layers, shifting all network statistics away from the learned distribution. We show that this phenomenon occurs in other architectures as well. We propose a simple remedy - compensating for the quantization-induced shift by adding a constant to the additive bias term of each channel. We develop two simple methods for estimating the correction constants - one using iterative evaluation of the quantized network and one where the constants are set using a short training phase. Both methods are fast and require only a small amount of unlabeled data, making them appealing for rapid deployment of neural networks. Using these methods, we match the performance of training-based quantization of MobileNets at a fraction of the cost.
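
The core idea, cancelling the quantization-induced shift in each channel's mean activation by folding a constant into that channel's bias term, can be sketched as follows. This is a minimal illustration and not the authors' released code: the use of PyTorch and the names `qconv`, `float_acts`, and `quant_acts` are assumptions, and the activation statistics are presumed to come from a small unlabeled calibration set as described above.

```python
# Minimal sketch (not the authors' implementation) of per-channel bias
# correction for one quantized conv layer. Activations are assumed NCHW,
# collected on a small unlabeled calibration set.
import torch


def channel_mean_shift(float_acts: torch.Tensor,
                       quant_acts: torch.Tensor) -> torch.Tensor:
    """Per-channel mean activation shift, E[x_float] - E[x_quant]."""
    # Average over batch and spatial dims, keeping the channel dim (dim=1).
    return float_acts.mean(dim=(0, 2, 3)) - quant_acts.mean(dim=(0, 2, 3))


@torch.no_grad()
def correct_bias(qconv: torch.nn.Conv2d,
                 float_acts: torch.Tensor,
                 quant_acts: torch.Tensor) -> None:
    """Fold the estimated shift into the layer's additive bias term."""
    shift = channel_mean_shift(float_acts, quant_acts)
    if qconv.bias is None:
        qconv.bias = torch.nn.Parameter(torch.zeros(qconv.out_channels))
    qconv.bias += shift
```

In the iterative variant mentioned in the abstract, such a correction would be applied layer by layer, re-evaluating the quantized network after each step so that downstream statistics reflect the upstream corrections; the training-based variant would instead learn the constants during a short fine-tuning phase.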

