Gradient-Based Post-Training Quantization: Challenging the Status Quo

by   Edouard Yvinec, et al.

Quantization has become a crucial step for the efficient deployment of deep neural networks, where floating point operations are converted to simpler fixed point operations. In its most naive form, it simply consists in a combination of scaling and rounding transformations, leading to either a limited compression rate or a significant accuracy drop. Recently, Gradient-based post-training quantization (GPTQ) methods appears to be constitute a suitable trade-off between such simple methods and more powerful, yet expensive Quantization-Aware Training (QAT) approaches, particularly when attempting to quantize LLMs, where scalability of the quantization process is of paramount importance. GPTQ essentially consists in learning the rounding operation using a small calibration set. In this work, we challenge common choices in GPTQ methods. In particular, we show that the process is, to a certain extent, robust to a number of variables (weight selection, feature augmentation, choice of calibration set). More importantly, we derive a number of best practices for designing more efficient and scalable GPTQ methods, regarding the problem formulation (loss, degrees of freedom, use of non-uniform quantization schemes) or optimization process (choice of variable and optimizer). Lastly, we propose a novel importance-based mixed-precision technique. Those guidelines lead to significant performance improvements on all the tested state-of-the-art GPTQ methods and networks (e.g. +6.819 points on ViT for 4-bit quantization), paving the way for the design of scalable, yet effective quantization methods.


Learning Multimodal Fixed-Point Weights using Gradient Descent

Due to their high computational complexity, deep neural networks are sti...

NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search

Deep neural network (DNN) deployment has been confined to larger hardwar...

QFT: Post-training quantization via fast joint finetuning of all degrees of freedom

The post-training quantization (PTQ) challenge of bringing quantized neu...

Trained Uniform Quantization for Accurate and Efficient Neural Network Inference on Fixed-Point Hardware

We propose a method of training quantization clipping thresholds for uni...

Up or Down? Adaptive Rounding for Post-Training Quantization

When quantizing neural networks, assigning each floating-point weight to...

HERO: Hessian-Enhanced Robust Optimization for Unifying and Improving Generalization and Quantization Performance

With the recent demand of deploying neural network models on mobile and ...

On Quantizing Implicit Neural Representations

The role of quantization within implicit/coordinate neural networks is s...

Please sign up or login with your details

Forgot password? Click here to reset