Fast Adjustable Threshold For Uniform Neural Network Quantization

12/19/2018
by Alexander Goncharenko, et al.

Neural network quantization is a necessary step for porting neural networks to mobile devices. Quantization accelerates inference and reduces memory consumption and model size. It can be performed without fine-tuning using a calibration procedure (computation of the parameters required for quantization), or the network can be trained with quantization from scratch. Training with quantization from scratch on labeled data is a long and resource-consuming procedure, while quantization without fine-tuning leads to an accuracy drop because of outliers that appear during calibration. In this article we suggest simplifying the quantization procedure significantly by introducing trained scale factors for the quantization thresholds. This speeds up quantization with fine-tuning to at most 8 epochs and reduces the requirements on the set of training images. To our knowledge, the proposed method allowed us to obtain the first publicly available quantized version of MNAS without significant accuracy reduction (74.8% top-1 accuracy). The model and code are ready for use and available at: https://github.com/agoncharenko1992/FAT-fast_adjustable_threshold.
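The core idea, as described in the abstract, is to keep the usual calibration step but make the resulting clipping thresholds adjustable via trained scale factors. Below is a minimal sketch of such a layer in PyTorch; the class name FakeQuantAdjustableThreshold, the multiplicative parameter alpha, and the straight-through gradient handling are illustrative assumptions, not the authors' implementation (their code is in the linked repository).

import torch
import torch.nn as nn

class FakeQuantAdjustableThreshold(nn.Module):
    """Uniform symmetric fake quantization with a trainable threshold scale factor (sketch)."""

    def __init__(self, calibrated_threshold: float, num_bits: int = 8):
        super().__init__()
        # Clipping threshold obtained from a calibration pass (e.g. max |activation|).
        self.register_buffer("base_threshold", torch.tensor(float(calibrated_threshold)))
        # Trainable multiplicative correction of the threshold, initialized to 1.
        self.alpha = nn.Parameter(torch.tensor(1.0))
        self.num_bits = num_bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        qmax = 2 ** (self.num_bits - 1) - 1            # e.g. 127 for signed 8-bit
        threshold = self.alpha * self.base_threshold   # adjusted clipping threshold
        scale = threshold / qmax
        # Clip to [-threshold, threshold]; gradients w.r.t. alpha flow through the clip.
        x_clipped = torch.minimum(torch.maximum(x, -threshold), threshold)
        # Quantize/dequantize; the straight-through estimator below treats rounding as identity.
        x_dequant = torch.round(x_clipped / scale) * scale
        return x_clipped + (x_dequant - x_clipped).detach()

In this sketch, fine-tuning updates only the small set of threshold scale factors (and, optionally, the weights), which is consistent with the abstract's claim that a few epochs and a modest set of training images suffice to recover accuracy after quantization.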


