Mixed-Precision Inference Quantization: Radically Towards Faster Inference Speed, Lower Storage Requirement, and Lower Loss

07/20/2022
by Daning Cheng, et al.

Model quantization, which exploits a model's resilience to computational noise, is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills, and in the majority of instances the quantized model incurs a higher loss than the full-precision model. This study provides a methodology for obtaining a mixed-precision quantization model with a lower loss than the full-precision model. In addition, the analysis demonstrates that, throughout the inference process, the loss function is affected mostly by the noise of the layer inputs. In particular, we show that neural networks with many identity mappings are resistant to quantization, and that it is therefore difficult to improve the performance of such networks using quantization.
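To make the layer-input-noise argument concrete, here is a minimal PyTorch sketch (not the authors' code) of the underlying idea: inject quantization-like noise into one layer's input at a time, measure the resulting increase in loss, and then give fewer bits to the layers that tolerate the noise. The helper names (`fake_quant`, `layer_input_sensitivity`, `assign_bit_widths`), the calibration loader, and the bit-width menu are all illustrative assumptions, not the paper's actual procedure.

```python
# Sketch: per-layer input-noise sensitivity as a guide for mixed precision.
# Assumes `model` is an nn.Module, `loss_fn` a criterion such as
# nn.CrossEntropyLoss(), and `calib_loader` yields (inputs, targets) batches.
import torch
import torch.nn as nn

def fake_quant(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

@torch.no_grad()
def layer_input_sensitivity(model: nn.Module, loss_fn, calib_loader, bits: int = 8):
    """Loss increase when quantization noise is injected into each layer's input."""
    xs, ys = next(iter(calib_loader))
    base = loss_fn(model(xs), ys).item()
    sensitivity = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        # Forward pre-hook replaces this layer's input with its quantized version.
        handle = module.register_forward_pre_hook(
            lambda m, inp: (fake_quant(inp[0], bits),) + inp[1:]
        )
        sensitivity[name] = loss_fn(model(xs), ys).item() - base
        handle.remove()
    return sensitivity

def assign_bit_widths(sensitivity: dict, menu=(4, 8, 16)):
    """Illustrative policy: noise-robust layers get fewer bits, fragile ones more."""
    lo, hi = min(sensitivity.values()), max(sensitivity.values())
    span = max(hi - lo, 1e-12)
    return {name: menu[min(int((s - lo) / span * len(menu)), len(menu) - 1)]
            for name, s in sensitivity.items()}
```

Under this reading of the abstract, layers whose input-noise perturbation barely moves the loss (for example, those shielded by identity mappings) would end up at the low end of the bit-width menu, while sensitive layers stay at higher precision.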


