PTQ-SL: Exploring the Sub-layerwise Post-training Quantization

10/15/2021
by Zhihang Yuan, et al.

Network quantization is a powerful technique for compressing convolutional neural networks. The quantization granularity determines how scaling factors are shared among weights, which affects the performance of network quantization. Most existing approaches share the scaling factors layerwise or channelwise when quantizing convolutional layers; both granularities are widely used in practice, while other granularities are rarely explored. In this paper, we explore sub-layerwise granularity, which shares a scaling factor across multiple input and output channels, and propose an efficient post-training quantization method at sub-layerwise granularity (PTQ-SL). We then systematically experiment with various granularities and observe that the prediction accuracy of the quantized neural network correlates strongly with the granularity. Moreover, we find that adjusting the positions of the channels can improve the performance of sub-layerwise quantization, so we propose a method to reorder the channels for sub-layerwise quantization. The experiments demonstrate that sub-layerwise quantization with appropriate channel reordering can outperform channelwise quantization.
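The core operation described in the abstract, sharing one scaling factor per block of input and output channels, can be sketched in a few lines of NumPy. The block sizes, the symmetric rounding scheme, and the channel-reordering heuristic below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of sub-layerwise post-training weight quantization:
# one scaling factor is shared by each (block_out x block_in) group of channels.
import numpy as np

def quantize_sublayerwise(weight, n_bits=4, block_out=16, block_in=16):
    """Quantize a conv weight of shape (C_out, C_in, kH, kW) block by block."""
    q_max = 2 ** (n_bits - 1) - 1
    dequant = np.empty_like(weight, dtype=np.float32)
    c_out, c_in = weight.shape[:2]
    for o in range(0, c_out, block_out):
        for i in range(0, c_in, block_in):
            block = weight[o:o + block_out, i:i + block_in]
            # Symmetric scaling factor shared across the whole sub-layer block.
            scale = max(np.abs(block).max() / q_max, 1e-8)
            q = np.clip(np.round(block / scale), -q_max - 1, q_max)
            dequant[o:o + block_out, i:i + block_in] = q * scale
    return dequant

def reorder_output_channels(weight):
    """Illustrative reordering heuristic (not the paper's criterion): sort output
    channels by their maximum magnitude so channels with similar ranges end up in
    the same block; the permutation must be undone at the layer output (or folded
    into the next layer) to keep the network's function unchanged."""
    order = np.argsort(np.abs(weight.reshape(weight.shape[0], -1)).max(axis=1))
    return weight[order], order

# Toy usage: a random 3x3 conv weight with 64 output and 32 input channels.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
w_reordered, perm = reorder_output_channels(w)
w_q = quantize_sublayerwise(w_reordered, n_bits=4, block_out=16, block_in=8)
print("mean squared quantization error:", float(((w_reordered - w_q) ** 2).mean()))
```

In this sketch, smaller blocks give each scaling factor fewer weights to cover, trading extra scale storage for lower quantization error; that is the granularity trade-off the paper studies between layerwise, sub-layerwise, and channelwise quantization.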

Related research

11/24/2021 · PTQ4ViT: Post-Training Quantization Framework for Vision Transformers
Quantization is one of the most effective methods to compress neural net...

12/14/2022 · PD-Quant: Post-Training Quantization based on Prediction Difference Metric
As a neural network compression technique, post-training quantization (P...

02/04/2023 · Oscillation-free Quantization for Low-bit Vision Transformers
Weight oscillation is an undesirable side effect of quantization-aware t...

01/17/2023 · Deep Conditional Measure Quantization
The quantization of a (probability) measure is replacing it by a sum of ...

08/08/2023 · Quantization Aware Factorization for Deep Neural Network Compression
Tensor decomposition of convolutional and fully-connected layers is an e...

02/10/2022 · Quantune: Post-training Quantization of Convolutional Neural Networks using Extreme Gradient Boosting for Fast Deployment
To adopt convolutional neural networks (CNN) for a range of resource-con...

12/30/2021 · Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
Quantized neural networks typically require smaller memory footprints an...
