Oscillation-free Quantization for Low-bit Vision Transformers

02/04/2023
by Shih-Yang Liu, et al.

Weight oscillation is an undesirable side effect of quantization-aware training, in which quantized weights frequently jump between two quantized levels, resulting in training instability and a sub-optimal final model. We discover that the learnable scaling factor, a widely used de facto setting in quantization, aggravates weight oscillation. In this study, we investigate the connection between the learnable scaling factor and quantized weight oscillation, using ViT as a case driver to illustrate the findings and remedies. We further find that the interdependence between the quantized weights of the query and key in a self-attention layer makes ViT vulnerable to oscillation. We therefore propose three techniques: statistical weight quantization (StatsQ), which improves quantization robustness over the prevalent learnable-scale-based method; confidence-guided annealing (CGA), which freezes weights with high confidence and calms the oscillating ones; and query-key reparameterization (QKR), which resolves the query-key intertwined oscillation and mitigates the resulting gradient misestimation. Extensive experiments demonstrate that these techniques successfully abate weight oscillation and consistently yield substantial accuracy improvements on ImageNet. Specifically, our 2-bit DeiT-T and DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively. The code is included in the supplementary material and will be released.
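For readers who want a concrete picture of the setup the abstract refers to, the sketch below contrasts a learnable-scale (LSQ-style) weight quantizer with one whose scale is derived from weight statistics, plus a simple flip counter as a proxy for oscillation. It is a minimal PyTorch illustration under assumptions made here, not the authors' implementation: the mean-absolute-weight scale rule and the names `LearnableScaleQuantizer`, `StatsScaleQuantizer`, and `count_level_flips` are hypothetical and chosen only for clarity.

```python
import torch
import torch.nn as nn


class LearnableScaleQuantizer(nn.Module):
    """b-bit symmetric weight quantizer with a learnable step size (LSQ-style)."""

    def __init__(self, bits: int = 2):
        super().__init__()
        self.qneg = -(2 ** (bits - 1))      # e.g. -2 for 2-bit weights
        self.qpos = 2 ** (bits - 1) - 1     # e.g.  1 for 2-bit weights
        self.scale = nn.Parameter(torch.tensor(1.0))

    def levels(self, w: torch.Tensor) -> torch.Tensor:
        """Integer codes; useful for tracking oscillation across steps."""
        s = self.scale.detach().abs() + 1e-8
        return torch.clamp(torch.round(w.detach() / s), self.qneg, self.qpos)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        s = self.scale.abs() + 1e-8
        v = w / s
        # Straight-through estimator on the rounding only, so gradients still
        # reach both the latent weights and the learnable scale.
        v_bar = torch.clamp(v + (torch.round(v) - v).detach(), self.qneg, self.qpos)
        return v_bar * s


class StatsScaleQuantizer(nn.Module):
    """Same quantizer, but the step size is recomputed from weight statistics
    each forward pass instead of being trained (a StatsQ-like rule; the
    statistic below is an assumption, not necessarily the paper's)."""

    def __init__(self, bits: int = 2):
        super().__init__()
        self.qneg = -(2 ** (bits - 1))
        self.qpos = 2 ** (bits - 1) - 1

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # Assumed rule: mean absolute weight sets the step size.
        s = w.detach().abs().mean() / max(self.qpos, 1) + 1e-8
        v = w / s
        v_bar = torch.clamp(v + (torch.round(v) - v).detach(), self.qneg, self.qpos)
        return v_bar * s


def count_level_flips(prev: torch.Tensor, cur: torch.Tensor) -> int:
    """Weights whose integer level changed between two training steps:
    a simple proxy for the oscillation described in the abstract."""
    return int((prev != cur).sum().item())


if __name__ == "__main__":
    torch.manual_seed(0)
    w = nn.Parameter(torch.randn(64, 64) * 0.02)
    quant = LearnableScaleQuantizer(bits=2)
    with torch.no_grad():
        quant.scale.fill_(w.abs().mean())   # simple data-dependent scale init
    opt = torch.optim.SGD([w, quant.scale], lr=0.1)

    prev = quant.levels(w)
    for step in range(20):
        loss = (quant(w) ** 2).mean()       # toy objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        cur = quant.levels(w)
        print(step, "flips:", count_level_flips(prev, cur))
        prev = cur
```

Under this toy setup, swapping `LearnableScaleQuantizer` for `StatsScaleQuantizer` removes the trained scale from the optimization, which is the kind of comparison one would run to see whether a statistics-derived scale reduces the flip count.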
