NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search

08/10/2023
by Edouard Yvinec, et al.

Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization, which consists of converting floating-point representations to low bit-width fixed-point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited, as most DNN weights and activations follow a bell-shaped distribution. The problem is exacerbated for LLMs, whose weight distributions are known to exhibit large, high-impact outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation, namely non-uniform quantization. NUPES leverages automorphisms derived from power functions to preserve scalar multiplications. However, the joint optimization of the power exponent and the weight values is a challenging and novel problem that cannot be solved with previous post-training optimization techniques, which only learn whether to round each weight value up or down in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. of the quantization operator itself, during training, by alleviating the numerical instabilities involved. The resulting predictive function is compatible with integer-only, low-bit inference. We show that the method achieves state-of-the-art compression rates in both data-free and data-driven configurations.
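The core idea can be illustrated concretely. Below is a minimal NumPy sketch of power-exponent non-uniform quantization, assuming a per-tensor scale and a fixed exponent alpha; the function names and parameters are illustrative and do not reproduce the paper's actual implementation or its exponent search:

```python
import numpy as np

def power_quantize(w, n_bits=4, alpha=2.0):
    """Quantize weights on the non-uniform grid induced by a power function.

    The automorphism x -> sign(x) * |x|^(1/alpha) maps a bell-shaped
    weight distribution to one closer to uniform; rounding is then
    performed uniformly in this transformed domain.
    """
    q_max = 2 ** (n_bits - 1) - 1
    w_t = np.sign(w) * np.abs(w) ** (1.0 / alpha)  # power-domain weights
    scale = np.abs(w_t).max() / q_max              # per-tensor step size
    q = np.clip(np.round(w_t / scale), -q_max, q_max)
    return q.astype(np.int8), scale

def power_dequantize(q, scale, alpha=2.0):
    """Invert the power automorphism: x -> sign(x) * |x|^alpha."""
    w_t = q.astype(np.float32) * scale
    return np.sign(w_t) * np.abs(w_t) ** alpha

# Example: 4-bit quantization of bell-shaped (Gaussian) weights.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=64).astype(np.float32)
q, s = power_quantize(w, n_bits=4, alpha=2.0)
w_hat = power_dequantize(q, s, alpha=2.0)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Because rounding is uniform only in the transformed domain, the effective grid in the original weight domain is non-uniform, with finer resolution near zero where bell-shaped weights concentrate; this is the property that searching or learning the power exponent exploits.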


Related research

01/24/2023
PowerQuant: Automorphism Search for Non-Uniform Quantization
Deep neural networks (DNNs) are nowadays ubiquitous in many domains such...

03/09/2022
Power-of-Two Quantization for Low Bitwidth and Hardware Compliant Neural Networks
Deploying Deep Neural Networks in low-power embedded devices for real ti...

08/15/2023
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Quantization has become a crucial step for the efficient deployment of d...

08/19/2023
Analyzing Quantization in TVM
There have been many papers in the academic literature on quantizing weight t...

01/28/2019
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Quantization can improve the execution latency and energy efficiency of ...

07/08/2020
AUSN: Approximately Uniform Quantization by Adaptively Superimposing Non-uniform Distribution for Deep Neural Networks
Quantization is essential to simplify DNN inference in edge applications...
