1 Introduction
Deep neural networks (DNNs) are a powerful tool that has shown unmatched performance in various tasks in computer vision, natural language processing, and optimal control, to mention just a few. The high computational resource requirements, however, constitute one of the main drawbacks of DNNs, hindering their massive adoption on edge devices. With the growing number of tasks performed on the edge, e.g., on smartphones or embedded systems, and the availability of dedicated custom hardware for DNN inference, the topic of DNN compression has gained popularity.
One of the ways to improve the computational efficiency of a DNN is to use a lower-precision representation of the network, also known as quantization. The majority of the literature on neural network quantization involves training, either from scratch [bethge2018training, bethge2019back] or as a fine-tuning step on a pretrained full-precision model [yang2019quantization, Hubara:2017:QNN:3122009.3242044]. Training is a powerful method to compensate for the accuracy loss due to quantization. Yet, it is not always applicable in real-world scenarios, since it requires the full-size dataset. This dataset might be unavailable for different reasons, such as privacy and intellectual-property protection. Training is also time-consuming, requiring very long periods of optimization as well as skilled manpower and computational resources.
Consequently, it is desirable to apply quantization without fine-tuning the model or, at least, without fully training it from scratch. Such methods are commonly referred to as post-training quantization and usually require only a small calibration dataset. However, these methods are inevitably less efficient, and most existing works only manage to quantize parameters to the 8-bit integer representation (INT8).
In the absence of a training set, these methods typically aim at minimizing some surrogate errors introduced during the quantization process (e.g., round-off errors), as opposed to the end-to-end loss that one actually wants to minimize. The minimization is performed independently for each layer, for example, by optimizing the value used for thresholding the tensor outliers before applying linear quantization (clipping). Different thresholds induce different quantization errors, and many techniques have previously been suggested to choose the optimal clipping threshold
[lee2018quantization, banner2018post, zhao2019improving]. Unfortunately, these schemes suffer from two fundamental drawbacks. First, they optimize a surrogate objective, which serves as an imperfect proxy for the network accuracy. Fig. 3.4 demonstrates that the various local error optimization methods result in different overall accuracy. This means that it might be impossible to choose a surrogate objective that can serve as a good proxy in all cases.
Moreover, the noise in earlier layers might be amplified by subsequent layers, creating a dependency between the optimal clipping parameters of different layers. This means that only a joint optimization of the parameters of the different layers can lead to an optimal performance of the quantized model. In Section 5 we show that for stronger quantization, the lack of separability of the loss in the individual layer parameters becomes significant and cannot be neglected without major accuracy degradation.
Fig. 1.1 illustrates this phenomenon. We depict the loss surface of ResNet-18 as a function of the clipping thresholds of the first and second layers. The solutions of the various local layer-wise optimization methods are visualized as colored dots on the surface. Fig. 1.2 provides a magnification of a neighbourhood of the optimum to better visualize the local minimum using a contour map. While all dots lie within a nearly convex region, none of them coincides exactly with the optimal solution.
In this paper, we extend these local optimization methods by further optimizing the network loss function jointly over all clipping parameters, enabling a highly optimized post-training quantization scheme. We further show through simulations that this global loss-aware approach provides major benefits over current methods that optimize each layer locally and independently of the rest of the layers.
Our contributions are as follows:


We perform an extensive analysis of the loss function of quantized neural networks. We study several of its characteristics, including convexity, separability, sharpness of the minimum, and curvature, and explain their influence on the effectiveness of the quantization procedure.

We propose the Loss Aware Post-training Quantization (LAPQ) method, which finds optimal clipping values that minimize the loss and hence maximize performance.

We evaluate our method on two different tasks and various neural network architectures, and show that it outperforms other known methods for post-training quantization.
2 Related work
Neural network quantization methods are usually divided into quantization-aware training (QAT) [Krishnamoorthi2018whitepaper] and post-training [Krishnamoorthi2018whitepaper] methods. Due to its robustness and outstanding results, quantization-aware training has gained popularity among multiple authors [baskin2018nice, zhang2018lq, gong2019differentiable, liu2019rbcn, yang2019quantization].
Despite that, post-training quantization is widely used in existing hardware solutions. However, these methods are inevitably less efficient, and the majority of works only manage to reach 8-bit quantization without degradation in performance. Recently, post-training quantization methods have attracted much more attention from researchers, and more advanced techniques have been proposed.
migacz20178 proposed an automatic framework that converts full-precision parameters (both weights and activations) to a quantized representation, based on picking a threshold, i.e., a clipping value, which minimizes the Kullback-Leibler divergence between the distributions of the quantized and non-quantized tensors.
gong2018highly used a norm-based threshold for 8-bit quantization, which resulted in a small performance degradation but a significant improvement in latency.
lee2018quantization chose clipping parameters per channel. This leads to an increase in performance compared to layer-wise clipping parameters, while requiring more parameters and additional effort for hardware support.
banner2018post suggested finding an optimal clipping value for quantization by assuming a known distribution of the tensors and minimizing the local MSE quantization error. In addition, the authors proposed to apply a bias correction to the weights by injecting a correction for the bias error. We utilize the proposed bias-correction method in our scheme.
zhao2019improving proposed outlier channel splitting, which splits neurons with large values into two neurons with smaller magnitudes. This approach introduces a trade-off, reducing the quantization error at the cost of network size overhead.
choukroun2019low calculated the clipping values iteratively on a calibration set by minimizing the quantization MSE of each layer separately. The quantization was kernel-wise for weights and channel-wise for activations. The authors performed experiments with quantization as low as 4 bits.
finkelstein2019fighting addressed the harder problem of MobileNet quantization. They claim that the source of degradation is a shift in the mean activation value caused by an inherent bias in the quantization process. The authors proposed a scheme that compensates for this bias. The method does not require labeled data and is easily integrated during deployment of DNNs.
nagel2019data utilized a quantization scheme relying on equalizing the weight ranges in the network by making use of the scale-equivariance property of activation functions. In addition, the authors proposed a method to correct biases in the error that are introduced during quantization.
nayak2019bit mapped the data into bins by using clustering, while the range is uniformly distributed. This method manages to achieve near-baseline accuracy for 4-bit quantization.
To the best of our knowledge, previous works did not take into account the fact that the loss might not be separable, usually performing the optimization per layer or even per channel. Notable exceptions are gong2018highly and zhao2019improving, who did not perform any kind of optimization. nagel2019data partially addressed the lack of separability by treating pairs of consecutive layers together.
3 Loss landscape of quantized neural nets
In the following section we explore the landscape of the loss function of quantized neural networks. We perform an extensive analysis of the characteristics of this function, including convexity, separability, sharpness of the minimum, and curvature. These characteristics greatly influence the effectiveness of existing methods for post-training quantization, especially the selection of optimal clipping thresholds for the quantization procedure.
3.1 Separability
In order to reduce the error introduced by quantization, in the post-training regime one can saturate the values in a layer at some threshold. Quantization within a smaller range reduces the distortion caused by rounding error, introducing a trade-off between the quantization and clipping errors [banner2018post]. Previous studies suggest optimizing the clipping threshold either based on statistics or by direct minimization of the mean squared error [migacz20178, lee2018quantization, banner2018post, choukroun2019low] for individual layers. While such straightforward approaches have been shown to be useful in many cases, their efficiency highly depends on the separability of the error function.
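To make this trade-off concrete, the following sketch (our own illustration, not the paper's implementation; the symmetric uniform quantizer and the Gaussian input are our own assumptions) decomposes the quantization MSE into a clipping term and a rounding term. As the threshold grows, the clipping error shrinks while the rounding error grows:

```python
import random

def components(samples, clip, bits):
    """Split quantization MSE into a clipping part and a rounding part."""
    levels = 2 ** bits
    step = 2 * clip / (levels - 1)
    clip_err = round_err = 0.0
    for x in samples:
        xc = max(-clip, min(clip, x))                 # saturated (clipped) value
        q = round((xc + clip) / step) * step - clip   # nearest quantization level
        clip_err += (x - xc) ** 2
        round_err += (xc - q) ** 2
    n = len(samples)
    return clip_err / n, round_err / n

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Small thresholds clip aggressively; large thresholds coarsen the grid.
results = {clip: components(samples, clip, 3) for clip in (1.0, 2.0, 4.0)}
for clip, (c_err, r_err) in results.items():
    print(clip, c_err, r_err)
```

A real calibration procedure would search this threshold per tensor; the point here is only that the two error components move in opposite directions, so an intermediate threshold minimizes their sum.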
To illustrate the importance of separability, let us consider a DNN with $L$ layers. Each layer comprises a linear function with weights $W_l$ and an activation function; applied to an input $x_l$, it produces an output $y_l = W_l x_l$. In particular, if the activation function is ReLU, the input of the next layer is given by

$$x_{l+1} = \mathrm{ReLU}(W_l x_l). \qquad (1)$$
We treat the quantization error as an additive uniform random noise $e$:

$$Q(x) = x + e, \qquad e \sim \mathcal{U}\!\left[-\tfrac{\Delta}{2}, \tfrac{\Delta}{2}\right], \qquad (2)$$

where $\Delta$ is the quantization step.
This assumption is legitimate, since the sufficient and necessary condition for the quantization error to be white and uniform is a vanishing characteristic function $\Phi_x$ of the input [sripad1977necessary]:

$$\Phi_x\!\left(\frac{2\pi n}{\Delta}\right) = 0, \qquad \forall n \neq 0. \qquad (3)$$
For a large number of quantization bins, the condition is approximately satisfied, which was also confirmed empirically for NN feature maps [baskin2018nice].
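As a sanity check of the uniform-noise model (our own illustration; the Gaussian input and the step size are arbitrary choices), one can quantize samples on a fine uniform grid and verify that the error has near-zero mean and variance close to $\Delta^2/12$, as expected for noise uniform on $[-\Delta/2, \Delta/2]$:

```python
import random

random.seed(0)
step = 0.05                                 # quantization step Delta (many bins)
xs = [random.gauss(0.0, 1.0) for _ in range(100000)]
# rounding to the nearest multiple of `step`; the residual is the noise e
errors = [x - round(x / step) * step for x in xs]

mean_e = sum(errors) / len(errors)
var_e = sum(e * e for e in errors) / len(errors)
print(mean_e, var_e, step ** 2 / 12)
```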
[Fig. 3.1: the norm of the quantization error as a function of the clipping value for 2-bit (a) and 4-bit (b) quantization of a one-dimensional normally distributed vector. For less aggressive quantization, the minimum of the quantization error is flatter and approximately the same for the various norms.]

The quantization at each layer introduces a multiplicative error of $(1+\varepsilon_l)$ with respect to the true activation value. This multiplicative error builds up across layers. For an $L$-layer network, ignoring for a moment the activations, the expected network outputs are scaled with respect to the true network outputs as follows:

$$\prod_{l=1}^{L}\left(1+\varepsilon_l\right) \approx 1 + \sum_{l=1}^{L}\varepsilon_l. \qquad (4)$$
Assuming a sufficiently small quantization error, $\varepsilon_l \ll 1$, we can neglect terms of the second and higher orders. Eq. 4 shows that the final quantization error is additively separable for sufficiently small local errors $\varepsilon_l$, which means that we can write this error as a sum of single-variable functions $f_l(\varepsilon_l)$:

$$F(\varepsilon_1, \ldots, \varepsilon_L) = \sum_{l=1}^{L} f_l(\varepsilon_l). \qquad (5)$$
In our case, we can take, for example,

$$f_l(\varepsilon_l) = \varepsilon_l. \qquad (6)$$
For larger values of the local errors, we need to keep more terms of the expansion in Eq. 4:

$$\prod_{l=1}^{L}\left(1+\varepsilon_l\right) = 1 + \sum_{l=1}^{L}\varepsilon_l + \sum_{i<j}\varepsilon_i \varepsilon_j + \cdots \qquad (7)$$
The approximation of the error defined in Eq. 7, as opposed to the one defined in Eq. 4, is no longer separable. Moreover, clipping in earlier layers affects the error in subsequent layers, creating even stronger dependencies. Eq. 7 suggests that minimizing the quantization error of individual layers does not necessarily minimize the final error.
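A small numerical experiment (an illustration under the simplifying assumption of equal per-layer errors) shows how quickly the separable first-order approximation degrades as the per-layer error grows:

```python
def exact_error(eps):
    """Accumulated multiplicative error: prod(1 + eps_l) - 1."""
    prod = 1.0
    for e in eps:
        prod *= 1.0 + e
    return prod - 1.0

L = 16                                      # number of layers
rels = []
for eps_val in (0.001, 0.01, 0.1):
    eps = [eps_val] * L
    exact = exact_error(eps)
    approx = sum(eps)                       # separable first-order term of Eq. 4
    rels.append(abs(exact - approx) / exact)
print(rels)
```

For tiny per-layer errors the relative gap is below one percent, but at a 10% per-layer error (typical of aggressive low-bit quantization) the cross terms of Eq. 7 account for more than half of the total error.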
3.2 Sharpness of the minimum
We briefly discuss the geometry and the flatness of the loss function around the minimum. We informally call a minimum flat if the neighbourhood of the minimum loss value is large; otherwise, the minimum is sharp. Flat minima allow lower precision of the parameters [wolpert1994bayesian, hochreiter1995simplyfing] and are more robust under perturbations. We empirically confirm that a lower number of bits results in a sharper minimum (Fig. 2.1).
We begin by investigating the sharpness of the minimum in the case of quantization of a one-dimensional vector. The empirical evidence provided by banner2018post shows that the quantization mean-squared error (MSE) is a smooth, convex function, which has a sharp minimum at lower-bitwidth quantization and is mostly flat at higher bitwidths. In Fig. 3.1 we plot the average norm of the quantization error for different values of $p$ at 2-bit and 4-bit quantization, respectively. It is clear that 2-bit quantization is associated with a sharper minimum, whereas at 4 bits the error is mostly flat.
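This behaviour can be reproduced with a toy experiment (our own sketch; the quantizer, the Gaussian data, and the 50% perturbation of the threshold are arbitrary choices): the increase of the quantization MSE when the clipping threshold deviates from its optimum is much larger at 2 bits than at 4 bits.

```python
import random

def quantize(x, clip, bits):
    """Symmetric uniform quantizer over [-clip, clip]."""
    levels = 2 ** bits
    step = 2 * clip / (levels - 1)
    x = max(-clip, min(clip, x))
    return round((x + clip) / step) * step - clip

def mse(samples, clip, bits):
    return sum((x - quantize(x, clip, bits)) ** 2 for x in samples) / len(samples)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(5000)]
grid = [c / 10 for c in range(5, 61)]       # candidate clips 0.5 .. 6.0

sharpness = {}
for bits in (2, 4):
    opt = min(grid, key=lambda c: mse(samples, c, bits))
    # absolute error increase when the clip deviates 50% from its optimum
    sharpness[bits] = mse(samples, 1.5 * opt, bits) - mse(samples, opt, bits)
print(sharpness)
```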
We next turn to look at how the minimum becomes flatter as the number of bits increases for two-dimensional data taken from two different layers of ResNet-18. While for 4-bit quantization the quantization error is insignificant even for larger clipping parameters, which results in a very flat landscape, the trade-off between clipping and quantization errors is obvious at 2 bits. This trade-off leads to a much more complex, non-convex and non-separable loss landscape with a significantly sharper minimum.
Clearly, different metrics result in different optimal clipping values. In Fig. 3.2 we show the accuracy of ResNet-50 for different cases where the clipping threshold optimizes the error norm for a given value of $p$. While in the case of 4 bits per parameter the accuracy is almost independent of $p$, at 2 bits different values of $p$ can differ by more than 20% in accuracy. These results provide indirect evidence for the sharpness of the minimum. Specifically, while the different metrics result in nearby solutions, they end up being comparable in terms of accuracy only in the 4-bit case, due to the flat and stable nature of the minimum.
3.3 Hessian of the loss function
To estimate the dependencies between the clipping parameters of different layers, we analyze the structure of the Hessian of the loss function. The Hessian matrix contains the second-order partial derivatives of the loss $\mathcal{L}(\mathbf{c})$, where $\mathbf{c}$ is the vector of clipping parameters:

$$H_{ij} = \frac{\partial^2 \mathcal{L}}{\partial c_i \, \partial c_j}. \qquad (8)$$
In the case of a separable function, the Hessian is a diagonal matrix. This means that the magnitude of the off-diagonal elements can be used as a measure of separability.
To quantify the sharpness of the minimum, we look at the curvature of the graph of the function and, in particular, the Gaussian curvature, which is given [goldman2005curvature] by:

$$K = \frac{f_{xx} f_{yy} - f_{xy}^2}{\left(1 + f_x^2 + f_y^2\right)^2}. \qquad (9)$$
At a minimum, $f_x = f_y = 0$, and thus

$$K = f_{xx} f_{yy} - f_{xy}^2. \qquad (10)$$
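A quick finite-difference check on a toy quadratic surface (our own illustration; the coefficients are arbitrary) recovers this expression at the minimum:

```python
# For f(x, y) = a*x^2 + b*y^2 + c*x*y the second derivatives are constant,
# so at the minimum (0, 0) the Gaussian curvature is K = 4ab - c^2.
a, b, c = 2.0, 0.5, 0.3
f = lambda x, y: a * x * x + b * y * y + c * x * y

h = 1e-4
fxx = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h**2
fyy = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h**2
fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h**2)

K = fxx * fyy - fxy**2                      # Eq. 10 at a critical point
print(K, 4 * a * b - c**2)
```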
We calculated the Gaussian curvature at the point that minimizes the norm of the quantization error and obtained the following values:
(11)  
(12) 
which means that the flat surface for 4-bit quantization shown in Fig. 2.1 is a generic property of the loss and not of the specific coordinates. Similarly, we conclude that 2-bit quantization generally has sharper minima.
In Fig. 3.3 we show the absolute values of the Hessian matrix of the loss function with respect to the clipping parameters, calculated over 15 layers of ResNet-18. The Hessian is calculated at the point that minimizes the norm of the quantization error with 4-bit (a) and 2-bit (b) quantization, respectively. At 2 bits the diagonal elements of the Hessian are much larger than at 4 bits, which indicates a higher sharpness of the loss at this point. On the other hand, the off-diagonal elements at 4 bits are smaller than at 2 bits, which confirms that the function is much more separable at 4 bits. These results are consistent with the measurements of the Gaussian curvature and with the experiments with different norms (Fig. 3.1).
Moreover, the Hessian matrix provides additional information regarding the coupling between different layers. As expected, off-diagonal terms adjacent to the diagonal have higher values than distant elements, corresponding to stronger dependencies between the clipping parameters of adjacent layers.
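The same diagnostic can be sketched on a toy "loss" of two clipping parameters (a hypothetical quadratic with an explicit coupling term, not the network loss): the off-diagonal Hessian entry, estimated by central finite differences, vanishes exactly when the coupling is removed.

```python
def loss(c1, c2, coupling):
    """Toy loss: per-layer terms plus a cross-layer interaction term."""
    return (c1 - 1.5) ** 2 + (c2 - 2.0) ** 2 + coupling * (c1 - 1.5) * (c2 - 2.0)

def hessian(f, c1, c2, h=1e-4):
    """Finite-difference Hessian entries (d11, d22, d12) at (c1, c2)."""
    d11 = (f(c1 + h, c2) - 2 * f(c1, c2) + f(c1 - h, c2)) / h**2
    d22 = (f(c1, c2 + h) - 2 * f(c1, c2) + f(c1, c2 - h)) / h**2
    d12 = (f(c1 + h, c2 + h) - f(c1 + h, c2 - h)
           - f(c1 - h, c2 + h) + f(c1 - h, c2 - h)) / (4 * h**2)
    return d11, d22, d12

H_coupled = hessian(lambda a, b: loss(a, b, 0.5), 1.5, 2.0)
H_separable = hessian(lambda a, b: loss(a, b, 0.0), 1.5, 2.0)
print(H_coupled, H_separable)
```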
4 Multivariate loss optimization
Instead of minimizing a local metric, we propose to optimize the loss function of the network directly. In this case, both layer-wise and joint optimization methods can be used. In this section, we provide some background on gradient-free optimization methods and discuss their application to DNN quantization.
Coordinate descent
Given a function $f$ of $n$ parameters $c_1, \ldots, c_n$, coordinate descent optimizes each parameter independently of the others. Given the minimization problem

$$\min_{c_1, \ldots, c_n} f(c_1, \ldots, c_n), \qquad (13)$$

we optimize one parameter at a time while fixing the others. Coordinate descent is a very simple and scalable optimization method; however, it does not provide any convergence guarantees. In the case of separable functions, however, if the single-variable optimization achieves the minimum, so does coordinate descent.
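A minimal sketch of coordinate descent with a grid line search (an illustration on a toy separable objective, not the paper's implementation):

```python
def coordinate_descent(f, x0, grid, sweeps=10):
    """Cyclically minimize f over one coordinate at a time via grid search."""
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            # line search over the i-th coordinate with the others fixed
            x[i] = min(grid, key=lambda v: f(x[:i] + [v] + x[i + 1:]))
    return x

# toy separable objective with its minimum at (1.0, 2.0, 3.0)
f = lambda x: (x[0] - 1) ** 2 + (x[1] - 2) ** 2 + (x[2] - 3) ** 2
grid = [v / 10 for v in range(0, 51)]       # candidate values 0.0 .. 5.0
x_opt = coordinate_descent(f, [0.0, 0.0, 0.0], grid)
print(x_opt)
```

Because this objective is separable, a single sweep already lands on the optimum; on a non-separable loss the sweeps would zigzag and may stall.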
Conjugate directions and Powell’s method
More advanced methods, such as Powell's method [1964Powell], optimize all the parameters jointly by performing a line search over a set of directions, called conjugate directions. This method is more efficient than coordinate descent, but still does not require gradients. Hence, the minimized function need not be differentiable, and no derivatives are taken.
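For completeness, here is a simplified direction-set method in the spirit of Powell's method (our own sketch with a crude grid line search and a toy coupled quadratic; a production implementation would use proper bracketing line searches and convergence tests):

```python
def line_search(f, x, d, lo=-2.0, hi=2.0, steps=400):
    """Grid line search: minimize f(x + t*d) over t in [lo, hi]."""
    ts = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    best_t = min(ts, key=lambda t: f([xi + t * di for xi, di in zip(x, d)]))
    return [xi + best_t * di for xi, di in zip(x, d)]

def powell(f, x0, sweeps=8):
    n = len(x0)
    dirs = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(sweeps):
        x_start = list(x)
        for d in dirs:
            x = line_search(f, x, d)
        # the overall displacement of this sweep becomes a new search direction
        new_dir = [a - b for a, b in zip(x, x_start)]
        if any(abs(c) > 1e-12 for c in new_dir):
            dirs = dirs[1:] + [new_dir]     # drop the oldest direction
            x = line_search(f, x, new_dir)
    return x

# coupled quadratic whose minimum is at (1.0, 2.0)
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2 \
    + 0.8 * (x[0] - 1.0) * (x[1] - 2.0)
x_opt = powell(f, [0.0, 0.0])
print([round(v, 3) for v in x_opt])
```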
W  A  CD  Powell  

32  4  68.1%  68.7%  68.6%  68.8%  68.0%  68.6% 
32  3  63.3%  65.7%  65.7%  65.9%  65.4%  66.3% 
32  2  32.9%  43.9%  47.8%  48.0%  51.6%  48.0% 
4  32  48.6%  56.0%  57.2%  55.5%  53.3%  62.6% 
3  32  4.0%  18.3%  19.8%  23.7%  0.1%  42.0% 
4  4  43.6%  53.5%  55.4%  53.5%  57.4%  58.5% 
the value of $p$ obtained by interpolating three different values of $p$ with a quadratic fit. The best initialization was used to run Coordinate Descent (CD) and Powell's method.

4.1 Loss Aware Quantization
In the previous sections, we have shown that at low-bit quantization the loss as a function of the clipping parameters is a non-convex, non-separable function with a complex landscape. These properties make the function hard to optimize. On the other hand, at high bitwidths the loss function around the minimum is flat. To address this combination of different conditions, we propose to combine multivariate optimization algorithms with a heuristic for a "good" initialization.
We formulate the quantization problem as an optimization of the loss function $\mathcal{L}(\mathbf{c})$ with respect to the clipping parameters $\mathbf{c}$. $\mathcal{L}$ is a continuous function defined on a closed region and hence attains a global minimum.
First, we estimate the value of $p$ that produces the best accuracy. To that end, we perform layer-wise minimization of the norm of the quantization error for three different values of $p$. In Fig. 3.4 we show the accuracy of ResNet-18 and ResNet-50 for different values of $p$. Given the three points, we interpolate a quadratic polynomial to approximate the optimal value of $p$. For example, for ResNet-50 at 2 bits, the best accuracy is obtained at the interpolated value of $p$ (Fig. 3.4). Then we take the point with the best accuracy and use it as an initialization for either coordinate descent or Powell's algorithm. In Table 4.1 we show an ablation study performed on ResNet-18, comparing the accuracy achieved by the different initializations with coordinate descent and with Powell's method. In most cases, Powell's method outperforms the others. To choose the optimal clipping values for each specific case, we simply select the clipping of the method that attains the best accuracy.
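The interpolation step can be sketched as follows (hypothetical numbers; the function `quadratic_vertex` and the accuracy values are our own illustration, not measurements from the paper):

```python
def quadratic_vertex(p1, a1, p2, a2, p3, a3):
    """Fit a*p^2 + b*p + c through three (p, accuracy) points; return the vertex."""
    denom = (p1 - p2) * (p1 - p3) * (p2 - p3)
    a = (p3 * (a2 - a1) + p2 * (a1 - a3) + p1 * (a3 - a2)) / denom
    b = (p3**2 * (a1 - a2) + p2**2 * (a3 - a1) + p1**2 * (a2 - a3)) / denom
    return -b / (2 * a)   # vertex: the maximum of the parabola when a < 0

# hypothetical accuracies measured at three values of p
p_star = quadratic_vertex(2.0, 40.0, 3.0, 48.0, 4.0, 44.0)
print(p_star)
```

For these sample points the fitted parabola is $-6p^2 + 38p + c$, so the estimated optimum is $p^\ast = 38/12 \approx 3.17$; the layer-wise minimization would then be rerun at this $p^\ast$ to produce the initialization.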
5 Experimental Results
We apply our method to multiple models covering vision and recommendation-system tasks. In all experiments, we first calibrate on a small held-out calibration set to calculate the optimal scale factors. Then we evaluate on the validation set with the scale factors obtained from the calibration step.
W  A  ResNet18  ResNet50  MobileNet V2 

Min MSE  
32  4  68.1%  73.4%  62% 
32  2  32.9%  17.1%  1.2% 
4  32  48.6%  48.4%  1.2% 
4  4  43.6%  36.4%  1.2% 
LAPQ  
32  4  68.8%  74.8%  65.1% 
32  2  51.6%  54.2%  1.5% 
4  32  62.6%  69.9%  29.4% 
4  4  58.5%  66.6%  21.3% 
LAPQ + bias correction  
4  32  63.3%  71.8%  60.2% 
4  4  59.8%  70.0%  49.7% 
FP32  69.7%  76.1%  71.8% 
5.1 ImageNet
We evaluate our method on several CNN architectures on ImageNet. We select a calibration set of 512 random images for the optimization step. Empirically, we have found that 512 is a good trade-off between generalization and running time, as shown in Fig. 5.1. Following the convention [baskin2018nice, yang2019quantization], we do not quantize the first and the last layers. Table 5.1 reports the accuracy at different bitwidths compared to a minimal-MSE baseline. Our method provides larger improvements over the baseline at lower bitwidths. Moreover, we note that the weights are more sensitive to quantization than the activations. Even at 4-bit quantization, minimization of the MSE results in significant accuracy degradation compared to LAPQ.
As observed by finkelstein2019fighting, MobileNet is sensitive to quantization bias. To address this issue, we perform bias correction of the weights as proposed by banner2018post, which can be easily combined with LAPQ. As shown in Table 5.1, this significantly reduces the quantization error and improves the accuracy of MobileNet as well as of the other networks.
Model  W/A  Method  Hit rate(%) 

NCF 1B  32/32  FP32  51.5 
32/8  LAPQ (Ours)  51.2  
MMSE  51.1  
8/32  LAPQ (Ours)  51.4  
MMSE  33.4  
8/8  LAPQ (Ours)  51.0  
MMSE  33.5 
Many successful methods of post-training quantization perform a finer parameter assignment, such as group-wise [Mellempudi2017ternary], channel-wise [banner2018post], pixel-wise [faraone2018syq] or filter-wise [choukroun2019low] quantization, which requires special hardware support and additional computational resources. Finer parameter assignment appears to provide an unconditional improvement in performance, independently of the underlying method. In contrast with those approaches, our method performs layer-wise quantization, which is simple to implement on any existing hardware that supports low-precision integer operations. For these reasons, we do not include those methods in the comparison. In Tables 5.4 and 5.3 we compare our method with several other known methods of layer-wise quantization. In most cases, our method significantly outperforms all the competing methods.
5.2 NCF-1B
In addition to the CNN models, we evaluated our method on a recommendation-system task, specifically on the Neural Collaborative Filtering [Xiangnan2017ncf] model. We use the mlperf implementation (https://github.com/mlperf/training/tree/master/recommendation/pytorch) to train the model on the MovieLens-1B dataset. Similarly to the CNNs, we generate a calibration set of 50k random user/item pairs, significantly smaller than both the training and validation sets.
Model  W/A  Method  Accuracy(%) 

32 / 32  FP32  69.7  
LAPQ (Ours)  68.8  
DUAL [choukroun2019low]  68.38  
8 / 4  ACIQ [banner2018post]  65.528  
LAPQ (Ours)  66.3  
8 / 3  ACIQ [banner2018post]  52.476  
LAPQ (Ours)  51.6  
8 / 2  ACIQ [banner2018post]  7.07  
LAPQ (Ours)  59.8  
KLD [migacz20178]  31.937  
ResNet18  4 / 4  MMSE  43.6 
32 / 32  FP32  76.1  
LAPQ (Ours)  74.8  
DUAL [choukroun2019low]  73.25  
OCS [zhao2019improving]  0.1  
8 / 4  ACIQ [banner2018post]  68.92  
LAPQ (Ours)  70.8  
8 / 3  ACIQ [banner2018post]  51.858  
LAPQ (Ours)  54.2  
8 / 2  ACIQ [banner2018post]  2.92  
LAPQ (Ours)  71.8  
4 / 32  OCS [zhao2019improving]  69.3  
LAPQ (Ours)  70  
KLD [migacz20178]  46.19  
ResNet50  4 / 4  MMSE  36.4 
In Table 5.2 we report results for the NCF-1B model compared to the MMSE method. Even at 8-bit quantization, NCF-1B suffers significant degradation with the naive MMSE method. On the other hand, LAPQ achieves near-SOTA accuracy, with 0.5% degradation from the FP32 results.
Model  W/A  Method  Accuracy(%) 

32 / 32  FP32  77.3  
LAPQ (Ours)  73.6  
DUAL [choukroun2019low]  74.26  
8 / 4  ACIQ [banner2018post]  66.966  
LAPQ (Ours)  65.7  
8 / 3  ACIQ [banner2018post]  41.46  
LAPQ (Ours)  29.8  
8 / 2  ACIQ [banner2018post]  3.826  
LAPQ (Ours)  66.5  
4 / 32  MMSE  18  
LAPQ (Ours)  59.2  
KLD [migacz20178]  49.948  
ResNet101  4 / 4  MMSE  9.8 
32 / 32  FP32  77.2  
LAPQ (Ours)  74.4  
DUAL [choukroun2019low]  73.06  
OCS [zhao2019improving]  0.2  
8 / 4  ACIQ [banner2018post]  66.42  
LAPQ (Ours)  64.4  
8 / 3  ACIQ [banner2018post]  31.01  
LAPQ (Ours)  51.6  
4 / 32  MMSE  5.8  
LAPQ (Ours)  38.6  
KLD [migacz20178]  1.84  
InceptionV3  4 / 4  MMSE  2.2 
6 Discussion
In this paper we analyzed the loss function of quantized neural networks. We showed that for low-precision quantization the loss function is non-convex and non-separable. Under such conditions, existing methods that minimize some local metric cannot perform well.
We introduced Loss Aware Post-training Quantization (LAPQ), which optimizes the clipping parameters of the quantization function by directly minimizing the loss function. Our method outperforms most previously suggested methods for post-training quantization. Our method does not assume special hardware support, unlike channel-wise or filter-wise quantization. To the best of our knowledge, LAPQ is the first method to achieve near-FP32 accuracy with 4-bit layer-wise quantization in the post-training regime.
Acknowledgments
The research was funded by Hyundai Motor Company through HYUNDAITECHNIONKAIST Consortium, ERC StG RAPID, and Hiroshi Fujiwara Technion Cyber Security Research Center.