RPTQ: Reorder-based Post-training Quantization for Large Language Models

04/03/2023
by Zhihang Yuan, et al.

Large language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges across channels, rather than just the presence of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the problem of quantizing the activations of LLMs. RPTQ rearranges the channels of the activations and then quantizes them in clusters, thereby reducing the impact of range differences among channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. With this approach, we achieve a significant breakthrough by pushing LLMs to 3-bit activation quantization for the first time.
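To make the idea concrete, below is a minimal, illustrative Python sketch of cluster-wise activation quantization, not the authors' implementation. It groups channels with similar observed value ranges and gives each group its own quantization parameters, so a few wide-range channels do not inflate the quantization step of the rest. The function name, the equal-size grouping of range-sorted channels, and calibrating from the input tensor itself are assumptions for illustration; the paper additionally avoids explicit reordering in memory.

```python
import numpy as np

def rptq_style_quantize(x, num_clusters=4, n_bits=3):
    """Sketch: reorder-based cluster-wise activation quantization (assumed API).

    x: activation matrix of shape (tokens, channels).
    Channels are sorted by their value range, split into clusters, and each
    cluster is quantized with its own asymmetric scale and zero-point.
    """
    ch_min = x.min(axis=0)                       # per-channel minimum (calibration stand-in)
    ch_max = x.max(axis=0)                       # per-channel maximum
    order = np.argsort(ch_max - ch_min)          # reorder channels by range
    clusters = np.array_split(order, num_clusters)

    q_levels = 2 ** n_bits - 1                   # e.g. 7 levels for 3-bit activations
    x_deq = np.empty_like(x, dtype=np.float32)
    for idx in clusters:
        lo = ch_min[idx].min()
        hi = ch_max[idx].max()
        scale = max(hi - lo, 1e-8) / q_levels
        # Uniform asymmetric quantization shared by all channels in this cluster.
        q = np.clip(np.round((x[:, idx] - lo) / scale), 0, q_levels)
        x_deq[:, idx] = q * scale + lo           # dequantized values for inspection
    return x_deq, order
```

Because all channels within a cluster share one scale, grouping channels with similar ranges keeps the per-cluster quantization error small even at 3 bits; that is the intuition the sketch is meant to show.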

Related research

08/30/2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
In the era of large-scale language models, the substantial parameter siz...

04/18/2023
Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling
Quantization of transformer language models faces significant challenges...

05/10/2022
Reducing Activation Recomputation in Large Transformer Models
Training large transformer models is one of the most important computati...

08/20/2023
Activation Addition: Steering Language Models Without Optimization
Reliably controlling the behavior of large language models (LLMs) is a p...

06/04/2023
OWQ: Lessons learned from activation outliers for weight quantization in large language models
Large language models (LLMs) with hundreds of billions of parameters sho...

09/04/2023
Softmax Bias Correction for Quantized Generative Models
Post-training quantization (PTQ) is the go-to compression technique for ...

10/29/2022
Empirical Evaluation of Post-Training Quantization Methods for Language Tasks
Transformer-based architectures like BERT have achieved great success in...
