Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

09/11/2023
by Wenhua Cheng, et al.

Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response, weight-only quantization, particularly 3- and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, making the choice between rounding up and rounding down increasingly important. While previous studies have shown that fine-tuning up and down rounding with added perturbations can improve accuracy in some scenarios, our study is motivated by the fact that these perturbations lie within a precise and limited boundary: only the threshold at which the rounding value flips is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight-rounding task. Our method, named SignRound, performs lightweight block-wise tuning using signed gradient descent, achieving outstanding results within 400 steps. SignRound outperforms the established rounding-to-nearest (RTN) baseline and competes impressively with recent methods, without introducing additional inference overhead. The source code will be publicly available soon at https://github.com/intel/neural-compressor.
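To make the idea concrete, below is a minimal PyTorch sketch of the procedure the abstract describes: a bounded rounding offset, tuned with signed gradient descent against a full-precision reference. The per-channel symmetric scaling, straight-through rounding, MSE loss, and learning rate are illustrative assumptions, not the authors' implementation (which tunes block-wise over transformer blocks and is available in the linked repository).

import torch

def ste_round(x):
    # Round in the forward pass; pass gradients straight through in backward.
    return (x.round() - x).detach() + x

def fake_quant(w, scale, v, num_bits=4):
    # Quantize-dequantize w with a learnable rounding offset v clipped to
    # [-0.5, 0.5], so v can only decide whether a value rounds up or down.
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(ste_round(w / scale + v.clamp(-0.5, 0.5)), -qmax - 1, qmax)
    return q * scale

def signround_linear(weight, calib_x, num_bits=4, steps=400, lr=5e-3):
    # Tune rounding offsets for one weight matrix on calibration inputs.
    # `steps` follows the 400-step figure in the abstract; `lr` is an assumption.
    w = weight.detach()
    # Per-output-channel symmetric scale (an assumption; group-wise scales are also common).
    scale = w.abs().amax(dim=1, keepdim=True) / (2 ** (num_bits - 1) - 1)
    v = torch.zeros_like(w, requires_grad=True)
    ref = (calib_x @ w.t()).detach()               # full-precision reference output
    for _ in range(steps):
        out = calib_x @ fake_quant(w, scale, v, num_bits).t()
        loss = torch.nn.functional.mse_loss(out, ref)
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()                # signed gradient descent
            v.grad = None
    return fake_quant(w, scale, v, num_bits).detach()

Using only the sign of the gradient keeps every offset moving in fixed increments toward or away from the flip threshold, which fits the bounded nature of the rounding perturbation.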


Related research

08/25/2023  OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Large language models (LLMs) have revolutionized natural language proces...

07/25/2023  QuIP: 2-Bit Quantization of Large Language Models With Guarantees
This work studies post-training parameter quantization in large language...

08/30/2023  FPTQ: Fine-grained Post-Training Quantization for Large Language Models
In the era of large-scale language models, the substantial parameter siz...

08/21/2023  Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers
Quantization scale and bit-width are the most important parameters when ...

12/19/2018  Fast Adjustable Threshold For Uniform Neural Network Quantization
Neural network quantization procedure is the necessary step for porting ...

08/16/2023  FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
Large Language Models (LLMs) have achieved state-of-the-art performance ...

10/06/2021  8-bit Optimizers via Block-wise Quantization
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
