Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

05/05/2021
by Byeongwook Kim, et al.

Most post-training uniform quantization methods have been studied within a convex optimization framework and consequently rely on quantization error minimization and/or quadratic approximations. Such approaches are computationally efficient and reasonable when a large number of quantization bits is employed. When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable for improving model accuracy. In this paper, we propose a new post-training uniform quantization technique that accounts for non-convexity. We empirically show that hyper-parameters for clipping and rounding of weights and activations can be explored by monitoring the task loss. The optimally searched set of hyper-parameters is then frozen before proceeding to the next layer, so that an incremental non-convex optimization is enabled for post-training quantization. Extensive experimental results on various models show that the proposed technique achieves higher model accuracy, especially for low-bit quantization.
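The layer-wise procedure outlined in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration under stated assumptions, not the paper's released code: it assumes a symmetric uniform quantizer, a small grid of clipping ratios relative to the per-layer weight maximum, and a cross-entropy task loss measured on a calibration loader. The helpers `uniform_quantize` and `search_layer_clip` are hypothetical names introduced here for illustration.

```python
# Minimal sketch (illustrative assumptions, not the authors' implementation)
# of searching a clipping threshold for one layer by monitoring task loss,
# then freezing it before moving to the next layer.
import torch
import torch.nn.functional as F

def uniform_quantize(w: torch.Tensor, clip: float, n_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization onto an integer grid scaled by `clip`."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit
    step = clip / qmax                    # quantization step size
    q = torch.clamp(torch.round(w / step), -qmax - 1, qmax)
    return q * step

@torch.no_grad()
def search_layer_clip(model, layer, calib_loader, n_bits=4,
                      ratios=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Grid-search a clipping threshold for one layer by monitoring the task
    loss (a non-convex objective); previously quantized layers stay frozen."""
    model.eval()
    w_fp = layer.weight.data.clone()      # full-precision copy to restore from
    w_max = w_fp.abs().max().item()
    best_loss, best_clip = float("inf"), w_max
    for r in ratios:                      # candidate clipping thresholds
        clip = r * w_max
        layer.weight.data = uniform_quantize(w_fp, clip, n_bits)
        loss = sum(F.cross_entropy(model(x), y).item()
                   for x, y in calib_loader) / len(calib_loader)
        if loss < best_loss:
            best_loss, best_clip = loss, clip
    # freeze the best-found setting before moving on to the next layer
    layer.weight.data = uniform_quantize(w_fp, best_clip, n_bits)
    return best_clip, best_loss
```

A full pipeline would apply such a search layer by layer in order, also exploring rounding and activation-clipping hyper-parameters, so that each layer's search accounts for the quantization error already frozen into earlier layers.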


Related research

06/30/2020 · EasyQuant: Post-training Quantization via Scale Optimization
The 8 bits quantization has been widely applied to accelerate network in...

11/05/2019 · Post-Training 4-bit Quantization on Embedding Tables
Continuous representations have been widely adopted in recommender syste...

05/04/2021 · Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
Neural networks (NNs) have been extremely successful across many tasks i...

05/29/2023 · LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Several post-training quantization methods have been applied to large la...

11/17/2019 · Loss Aware Post-training Quantization
Neural network quantization enables the deployment of large models on re...

05/08/2022 · Efficient Representation of Large-Alphabet Probability Distributions
A number of engineering and scientific problems require representing and...

06/13/2022 · Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training
Data clipping is crucial in reducing noise in quantization operations an...
