Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation

01/31/2020
by   Jun Fang, et al.
0

Quantization plays an important role for energy-efficient deployment of deep neural networks (DNNs) on resource-limited devices. Post-training quantization is crucial since it does not require retraining or accessibility to the full training dataset. The conventional post-training uniform quantization scheme achieves satisfactory results by converting DNNs from full-precision to 8-bit integers, however, it suffers from significant performance degradation when quantizing to lower precision such as 4 bits. In this paper, we propose a piecewise linear quantization method to enable accurate post-training quantization. Inspired from the fact that the weight tensors have bell-shaped distributions with long tails, our approach breaks the entire quantization range into two non-overlapping regions for each tensor, with each region being assigned an equal number of quantization levels. The optimal break-point that divides the entire range is found by minimizing the quantization error. Extensive results show that the proposed method achieves state-of-the-art performance on image classification, semantic segmentation and object detection. It is possible to quantize weights to 4 bits without retraining while nearly maintaining the performance of the original full-precision model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/07/2019

Bit Efficient Quantization for Deep Neural Networks

Quantization for deep neural networks have afforded models for edge devi...
research
06/28/2023

DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference

Quantization is commonly used in Deep Neural Networks (DNNs) to reduce t...
research
06/11/2019

Data-Free Quantization through Weight Equalization and Bias Correction

We introduce a data-free quantization method for deep neural networks th...
research
06/30/2023

Designing strong baselines for ternary neural network quantization through support and mass equalization

Deep neural networks (DNNs) offer the highest performance in a wide rang...
research
11/16/2018

Composite Binary Decomposition Networks

Binary neural networks have great resource and computing efficiency, whi...
research
05/23/2021

Post-Training Sparsity-Aware Quantization

Quantization is a technique used in deep neural networks (DNNs) to incre...

Please sign up or login with your details

Forgot password? Click here to reset