Overcoming Oscillations in Quantization-Aware Training

03/21/2022
by Markus Nagel, et al.

When training neural networks with simulated quantization, we observe that quantized weights can, rather unexpectedly, oscillate between two grid points. The importance of this effect and its impact on quantization-aware training (QAT) are not well understood or investigated in the literature. In this paper, we delve deeper into the phenomenon of weight oscillations and show that it can lead to significant accuracy degradation due to incorrectly estimated batch-normalization statistics during inference and increased noise during training. These effects are particularly pronounced in low-bit (≤ 4 bits) quantization of efficient networks with depth-wise separable layers, such as MobileNets and EfficientNets. In our analysis, we investigate several previously proposed QAT algorithms and show that most of them are unable to overcome oscillations. Finally, we propose two novel QAT algorithms to overcome oscillations during training: oscillation dampening and iterative weight freezing. We demonstrate that our algorithms achieve state-of-the-art accuracy for low-bit (3 and 4 bits) weight and activation quantization of efficient architectures, such as MobileNetV2, MobileNetV3, and EfficientNet-lite, on ImageNet.
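To make the mechanism concrete, below is a minimal, self-contained sketch (not the authors' implementation) of a 1-D toy QAT problem trained with the straight-through estimator (STE). When the optimal weight lies between two grid points, the STE gradient pushes the latent weight back and forth across the rounding boundary, so the quantized weight flips between the two neighboring grid values. The sketch also includes simplified stand-ins for the two remedies described above: a dampening term that pulls the latent weight toward its grid point, and a freezing rule that stops updating a weight once its oscillation frequency (tracked with an EMA) exceeds a threshold. The step size, learning rate, EMA decay, and threshold are all illustrative assumptions.

```python
# Toy illustration of weight oscillation in QAT, with simplified
# dampening and freezing. Names and constants are illustrative.
import numpy as np

def quantize(w, s=0.1):
    """Uniform quantizer: snap the latent weight to the nearest grid point."""
    return s * np.round(w / s)

def toy_qat(steps=200, lr=0.05, target=0.047, damp=0.0, freeze_thresh=None):
    w, s = 0.0, 0.1                  # latent weight and quantization step size
    prev_q, ema_freq, frozen = quantize(w, s), 0.0, False
    flips = 0
    for _ in range(steps):
        q = quantize(w, s)
        flips += int(q != prev_q)                        # count grid flips
        ema_freq = 0.9 * ema_freq + 0.1 * float(q != prev_q)
        prev_q = q
        if frozen:
            continue
        # STE: the gradient of the task loss (q - target)^2 w.r.t. q is
        # passed straight through round() to the latent weight w.
        grad = 2.0 * (q - target)
        # Oscillation dampening: pull the latent weight toward its grid point.
        grad += 2.0 * damp * (w - q)
        w -= lr * grad
        # Iterative freezing: stop updating weights that oscillate too often.
        if freeze_thresh is not None and ema_freq > freeze_thresh:
            frozen = True
    return flips

print("plain QAT:", toy_qat())                    # many grid flips
print("dampening:", toy_qat(damp=1.0))            # oscillation suppressed
print("freezing :", toy_qat(freeze_thresh=0.3))   # frozen shortly after onset
```

In the paper's full methods, frozen weights are fixed to their most frequently occupied grid point and the dampening strength is annealed over training; the sketch omits both for brevity.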


