One Weight Bitwidth to Rule Them All

08/22/2020
by Ting-Wu Chin, et al.

Weight quantization for deep ConvNets has shown promising results for applications such as image classification and semantic segmentation, and it is especially important when memory storage is limited. However, when aiming for quantization without accuracy degradation, different tasks may end up with different bitwidths. This creates complexity for software and hardware support, and the complexity compounds with mixed-precision quantization, in which each layer's weights use a different bitwidth. Our key insight is that optimizing for the smallest bitwidth subject to no accuracy degradation is not necessarily an optimal strategy: one cannot decide optimality between two bitwidths if one yields a smaller model while the other yields better accuracy. In this work, we take a first step toward understanding whether some weight bitwidths are better than others by aligning all candidates to the same model size using a width multiplier. Under this setting, somewhat surprisingly, we show that using a single bitwidth for the whole network can achieve better accuracy than mixed-precision quantization targeting zero accuracy degradation when both have the same model size. In particular, our results suggest that when the number of channels becomes a target hyperparameter, a single weight bitwidth throughout the network yields superior results for model compression.
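
The comparison above hinges on equalizing model size across bitwidths by widening or narrowing the network. As a rough illustration (not taken from the paper), the sketch below computes a width multiplier that keeps total weight storage constant, assuming convolutional parameter counts scale roughly quadratically with the multiplier; the function names and the parameter count are hypothetical.

    import math

    def width_multiplier_for_bitwidth(bitwidth, baseline_bitwidth=32):
        """Width multiplier that keeps total weight storage constant.

        Assumes conv-layer parameter counts scale roughly quadratically
        with the width multiplier (both input and output channels grow),
        so bitwidth * multiplier**2 must match the baseline storage.
        """
        return math.sqrt(baseline_bitwidth / bitwidth)

    def model_size_megabytes(num_params, bitwidth, multiplier):
        """Approximate weight storage in MB under the quadratic-scaling assumption."""
        return num_params * (multiplier ** 2) * bitwidth / 8 / 1e6

    if __name__ == "__main__":
        base_params = 4.2e6  # hypothetical ConvNet parameter count at multiplier 1.0
        for b in (1, 2, 4, 8, 32):
            w = width_multiplier_for_bitwidth(b)
            print(f"{b:>2}-bit weights -> width multiplier {w:.2f}, "
                  f"size {model_size_megabytes(base_params, b, w):.1f} MB")

Under this assumption, 1-bit weights allow a roughly 5.7x wider network than 32-bit weights at the same storage budget, which is the kind of equal-size trade-off studied in the paper.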


