Softmax Bias Correction for Quantized Generative Models

09/04/2023
by Nilesh Prasad Pandey, et al.

Post-training quantization (PTQ) is the go-to compression technique for large generative models such as Stable Diffusion and large language models. PTQ methods commonly keep the softmax activation in higher precision, as it has been shown to be very sensitive to quantization noise. However, this can lead to significant runtime and power overhead during inference on resource-constrained edge devices. In this work, we investigate the source of the softmax sensitivity to quantization and show that the quantization operation introduces a large bias in the softmax output, causing accuracy degradation. To overcome this issue, we propose an offline bias correction technique that improves the quantizability of softmax without additional compute during deployment, as it can be readily absorbed into the quantization parameters. We demonstrate the effectiveness of our method on Stable Diffusion v1.5 and the 125M-parameter OPT language model, achieving significant accuracy improvements for 8-bit quantized softmax.
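
The abstract describes the method only at a high level: measure the systematic error that quantization adds to the softmax output and fold the correction into the quantization parameters offline. The NumPy sketch below illustrates that general idea under stated assumptions; the 8-bit uniform quantizer, the per-tensor additive correction, the `estimate_output_bias` helper, and the random calibration batch are all placeholders chosen for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quantize_dequantize(p, n_bits=8, correction=0.0):
    """Simulated uniform quantization of softmax outputs in [0, 1].

    `correction` is a constant added at dequantization; in a real quantizer
    it could be absorbed into the offset / zero-point, so it adds no
    compute at inference time.
    """
    scale = 1.0 / (2 ** n_bits - 1)
    q = np.clip(np.round(p / scale), 0, 2 ** n_bits - 1)
    return q * scale + correction

def estimate_output_bias(calib_logits, n_bits=8):
    """Mean error the quantizer adds to the softmax output, measured on a
    calibration batch (here random logits standing in for real data)."""
    p = softmax(calib_logits)
    return (quantize_dequantize(p, n_bits) - p).mean()

rng = np.random.default_rng(0)
calib_logits = rng.normal(size=(512, 256))   # stand-in calibration data

# Offline step: estimate the bias once and fold it into the dequantization offset.
bias = estimate_output_bias(calib_logits)
correction = -bias

# Sanity check on fresh data drawn from the same distribution.
p = softmax(rng.normal(size=(512, 256)))
err_naive = (quantize_dequantize(p) - p).mean()
err_corr = (quantize_dequantize(p, correction=correction) - p).mean()
print(f"mean softmax output bias, naive 8-bit quantizer:    {err_naive:+.2e}")
print(f"mean softmax output bias, bias-corrected quantizer: {err_corr:+.2e}")
```

Whatever mean shift the rounding introduces on the calibration data is cancelled by the precomputed constant, and because the constant lives inside the quantizer's offset, the deployed model runs exactly as before.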

Related research

06/04/2023 · Temporal Dynamic Quantization for Diffusion Models
The diffusion model has gained popularity in vision applications due to ...

09/06/2023 · Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
As the size of large language models (LLMs) continues to grow, model com...

03/09/2023 · Greener yet Powerful: Taming Large Code Generation Models with Quantization
ML-powered code generation aims to assist developers to write code in a ...

08/30/2023 · FPTQ: Fine-grained Post-Training Quantization for Large Language Models
In the era of large-scale language models, the substantial parameter siz...

06/13/2023 · SqueezeLLM: Dense-and-Sparse Quantization
Generative Large Language Models (LLMs) have demonstrated remarkable res...

09/11/2023 · Understanding the Impact of Post-Training Quantization on Large Language Models
Large language models (LLMs) are rapidly increasing in size, with the nu...

04/03/2023 · RPTQ: Reorder-based Post-training Quantization for Large Language Models
Large-scale language models (LLMs) have demonstrated outstanding perform...
