QuIP: 2-Bit Quantization of Large Language Models With Guarantees

07/25/2023
by Jerry Chee, et al.

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. Our code can be found at https://github.com/jerry-chee/QuIP.
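To make the two steps above concrete, here is a minimal, hypothetical NumPy sketch of the incoherence pre- and post-processing wrapped around a quantizer. It is not the authors' implementation: the function names, the toy nearest-rounding quantizer standing in for QuIP's adaptive rounding, and the small random layer are illustrative assumptions; only the use of random orthogonal matrices and the quadratic proxy objective follow the description above.

# Hypothetical sketch of QuIP-style incoherence processing (not the authors' code).
# Uses the quadratic proxy tr((W_hat - W) H (W_hat - W)^T) with a per-layer
# Hessian proxy H, and plain nearest rounding in place of QuIP's adaptive rounding.
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs for a uniform distribution

def quantize_nearest(w, bits=2):
    """Toy uniform quantizer standing in for the adaptive rounding step."""
    levels = 2 ** bits - 1
    scale = (w.max() - w.min()) / levels
    return np.round((w - w.min()) / scale) * scale + w.min()

def quip_like_quantize(W, H, bits=2, seed=0):
    """Incoherence-process W and H, quantize, then undo the rotations."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    U = random_orthogonal(m, rng)   # acts on the output dimension (rows of W)
    V = random_orthogonal(n, rng)   # acts on the input dimension (columns of W)

    W_inc = U @ W @ V.T             # incoherent weights
    H_inc = V @ H @ V.T             # incoherent Hessian (proxy value is preserved)

    # QuIP's adaptive rounding would use H_inc; the toy quantizer here ignores it.
    W_inc_hat = quantize_nearest(W_inc, bits)

    return U.T @ W_inc_hat @ V      # post-processing: rotate back

# Proxy loss tr((W_hat - W) H (W_hat - W)^T) for a small random layer:
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))
X = rng.standard_normal((16, 64))
H = X @ X.T / 64                    # Hessian proxy = second moment of the inputs
W_hat = quip_like_quantize(W, H, bits=2)
print(np.trace((W_hat - W) @ H @ (W_hat - W).T))

Because the rotations are orthogonal, the proxy objective is unchanged by the basis switch; the point of the incoherence step is that, in the rotated basis, weights and Hessian directions are no longer aligned with the coordinate axes, which is the regime in which rounding is most benign.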


