Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training

06/13/2022
by Charbel Sakr, et al.

Data clipping is crucial in reducing noise in quantization operations and improving the achievable accuracy of quantization-aware training (QAT). Current practices rely on heuristics to set clipping threshold scalars and cannot be shown to be optimal. We propose Optimally Clipped Tensors And Vectors (OCTAV), a recursive algorithm to determine MSE-optimal clipping scalars. Derived from the fast Newton-Raphson method, OCTAV finds optimal clipping scalars on the fly, for every tensor, at every iteration of the QAT routine. Thus, the QAT algorithm is formulated with provably minimum quantization noise at each step. In addition, we reveal limitations in common gradient estimation techniques in QAT and propose magnitude-aware differentiation as a remedy to further improve accuracy. Experimentally, OCTAV-enabled QAT achieves state-of-the-art accuracy on multiple tasks, including training from scratch and retraining ResNets and MobileNets on ImageNet, and SQuAD fine-tuning of BERT models, where OCTAV-enabled QAT consistently preserves accuracy at low precision (4 to 6 bits). Our results require no modifications to the baseline training recipe, except for the insertion of quantization operations where appropriate.
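To make the idea concrete, here is a minimal NumPy sketch of a Newton-Raphson-style fixed-point search for an MSE-optimal clipping scalar, assuming a signed B-bit uniform quantizer whose rounding noise inside the clipping range scales as (4^-B / 3)·s² and whose clipping error outside is (|x| − s)². The function name `mse_optimal_clip`, the noise model, the initial guess, and the iteration count are illustrative assumptions, not the paper's exact OCTAV recursion.

```python
import numpy as np

def mse_optimal_clip(x, num_bits=4, num_iters=20, eps=1e-12):
    """Sketch: fixed-point iteration for an MSE-optimal clipping scalar s.

    Assumed noise model (illustrative, not quoted from the paper):
      J(s) = (4**-B / 3) * s**2 * P(|x| <= s) + E[(|x| - s)**2 ; |x| > s]
    Setting dJ/ds = 0 gives the fixed point iterated below.
    """
    mags = np.abs(np.asarray(x, dtype=np.float64)).ravel()
    s = mags.mean() + eps                       # simple initial guess
    c = (4.0 ** (-num_bits)) / 3.0              # rounding-noise coefficient
    for _ in range(num_iters):
        clipped = mags > s                      # elements clipped to +/- s
        numer = mags[clipped].sum()
        denom = c * np.count_nonzero(~clipped) + np.count_nonzero(clipped) + eps
        s = numer / denom
    return s

# Example: choose a per-tensor clipping scalar for a 4-bit quantizer.
tensor = np.random.randn(1 << 16).astype(np.float32)
print("MSE-optimal clipping scalar:", mse_optimal_clip(tensor, num_bits=4))
```

In a QAT loop, a routine of this kind would be invoked per tensor at every iteration to set the clipping scalar before the fake-quantization operation, matching the abstract's description of OCTAV running on the fly.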
