Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

07/16/2023
by Peiyu Liu, et al.

Despite their superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and to increase the inference rate. However, a major challenge is that low-bit quantization methods often lead to performance degradation, so it is important to understand how quantization impacts the capacity of LLMs. Unlike previous studies that focus on overall performance, this work investigates the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantized models, while 2-bit models suffer severe performance degradation on tests of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings for understanding the impact of quantization on emergent abilities, and sheds light on the possibilities of extremely low-bit quantization for LLMs.
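
As a concrete illustration of why 2-bit quantization is so much more damaging than 4-bit, the minimal sketch below applies symmetric round-to-nearest (RTN) quantization to synthetic Gaussian weights and compares the reconstruction error at each bit width. This is a generic illustration, not the quantization method used in the paper; the function name quantize_rtn and the synthetic weights are our own assumptions.

```python
# Minimal sketch: symmetric round-to-nearest (RTN) k-bit weight quantization.
# Illustration only; not the paper's actual quantization scheme.
import numpy as np

def quantize_rtn(weights: np.ndarray, bits: int) -> np.ndarray:
    """Quantize weights to `bits` bits with a symmetric per-tensor scale,
    then dequantize back to float to inspect the rounding error."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4-bit, 1 for 2-bit
    scale = np.abs(weights).max() / qmax        # map the largest weight to qmax
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax)
    return q * scale                            # dequantized approximation

rng = np.random.default_rng(0)
w = rng.normal(size=10_000).astype(np.float32)  # stand-in for a weight tensor
for bits in (4, 2):
    err = np.mean((w - quantize_rtn(w, bits)) ** 2)
    print(f"{bits}-bit RTN mean squared error: {err:.5f}")
```

Running this shows the quantization error growing sharply from 4-bit to 2-bit (only four representable levels remain), which is consistent with the abstract's observation that emergent abilities survive 4-bit quantization but degrade severely at 2-bit.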

Related research

08/30/2023
FPTQ: Fine-grained Post-Training Quantization for Large Language Models
In the era of large-scale language models, the substantial parameter siz...

09/06/2023
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models
As the size of large language models (LLMs) continues to grow, model com...

09/04/2023
Are Emergent Abilities in Large Language Models just In-Context Learning?
Large language models have exhibited emergent abilities, demonstrating e...

09/11/2023
Understanding the Impact of Post-Training Quantization on Large Language Models
Large language models (LLMs) are rapidly increasing in size, with the nu...

05/23/2023
QLoRA: Efficient Finetuning of Quantized LLMs
We present QLoRA, an efficient finetuning approach that reduces memory u...

03/09/2023
Greener yet Powerful: Taming Large Code Generation Models with Quantization
ML-powered code generation aims to assist developers to write code in a ...

04/19/2023
A Latent Space Theory for Emergent Abilities in Large Language Models
Languages are not created randomly but rather to communicate information...
