A Comprehensive Survey on Model Quantization for Deep Neural Networks

05/14/2022
by Babak Rokh, et al.

Deep neural networks have driven significant recent advances in machine learning. However, these networks have a huge number of parameters to store and compute, which raises hardware cost and poses deployment challenges. Compression approaches have therefore been proposed to enable efficient accelerators. One important approach to deep neural network compression is quantization, in which full-precision values are stored in a low bit-width. In addition to saving memory, quantization replaces operations with simpler, low-cost ones. Many methods for DNN quantization have been proposed in recent years because of its flexibility and its influence on efficient hardware design. An integrated report is therefore essential for better understanding, analysis, and comparison. In this paper, we provide a comprehensive survey. We describe the quantization concepts and categorize the methods from different perspectives. We discuss using a scale factor to match the quantization levels to the distribution of the full-precision values, and describe clustering-based methods. For the first time, we comprehensively review the training of quantized deep neural networks and the use of the Straight-Through Estimator. We also describe how quantization simplifies the operations in deep convolutional neural networks and explain the sensitivity of different layers to quantization. Finally, we discuss the evaluation of quantization methods and compare the accuracy of previous methods at various bit-widths for weights and activations on CIFAR-10 and the large-scale dataset ImageNet.
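
To make the idea of storing full-precision values in a low bit-width concrete, the following is a minimal sketch of symmetric uniform quantization with a single scale factor. It is not taken from the survey: the function names `quantize` and `dequantize`, the max-absolute-value rule for choosing the scale, and the example weights are all assumptions made for illustration.

```python
def quantize(values, bits=8):
    """Map floats to signed integer codes using one shared scale factor.

    Assumption: the scale is chosen so the largest magnitude in `values`
    maps to the largest representable code (a simple max-abs rule).
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level, then clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate full-precision values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical example weights, quantized to 4 bits.
weights = [0.6, -1.0, 0.25, 0.0]
codes, scale = quantize(weights, bits=4)
approx = dequantize(codes, scale)
print(codes)
```

With this rule, every value inside the clamping range is reconstructed to within half a quantization step (`scale / 2`), which is why matching the scale factor to the distribution of the full-precision values matters.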


