One Model for All Quantization: A Quantized Network Supporting Hot-Swap Bit-Width Adjustment

05/04/2021
by Qigong Sun, et al.

As an effective technique for deploying deep neural networks on edge devices, model quantization has been successfully applied in many practical applications. Whether via quantization-aware training (QAT) or post-training quantization (PTQ), existing methods depend on a fixed target bit-width: whenever the quantization precision is adjusted, the quantized model must be fine-tuned or its quantization noise minimized anew, which is inconvenient in practice. In this work, we propose training one model for all quantization, a single network that supports diverse bit-widths (e.g., from 8-bit to 1-bit) and therefore allows the quantization bit-width to be adjusted online. It is hot-swappable in that it provides a specific quantization strategy for each candidate bit-width through multiscale quantization. We use wavelet decomposition and reconstruction to increase the diversity of the weights, which significantly improves the performance of each quantization candidate, especially at ultra-low bit-widths (e.g., 3-bit, 2-bit, and 1-bit). Experimental results on ImageNet and COCO show that our method achieves accuracy comparable to dedicated models trained at the corresponding precisions.
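As a rough illustration of the two ingredients the abstract names, the sketch below combines (a) a uniform quantizer whose bit-width can be selected at inference time ("hot-swap") and (b) a one-level Haar wavelet decomposition and reconstruction of a weight tensor, with each subband quantized separately. This is a minimal sketch under assumed conventions (symmetric uniform quantization, sign quantization at 1-bit, a 1-D Haar transform); the abstract does not specify the paper's actual multiscale quantization strategy, and all function names here are hypothetical.

```python
import torch

def quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization; bit-width selectable at run time."""
    if bits == 1:
        # 1-bit: sign quantization with a mean-magnitude scale (assumed)
        return w.sign() * w.abs().mean()
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def haar_decompose(w: torch.Tensor):
    """One-level 1-D Haar transform along the last dim (even length)."""
    even, odd = w[..., 0::2], w[..., 1::2]
    low = (even + odd) / 2 ** 0.5    # low-frequency (approximation) subband
    high = (even - odd) / 2 ** 0.5   # high-frequency (detail) subband
    return low, high

def haar_reconstruct(low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
    """Exact inverse of haar_decompose."""
    even = (low + high) / 2 ** 0.5
    odd = (low - high) / 2 ** 0.5
    return torch.stack((even, odd), dim=-1).flatten(-2)

# Hot-swap demo: one shared latent weight tensor, re-quantized at whatever
# bit-width is requested online, with wavelet subbands quantized separately.
latent = torch.randn(64, 64)
for bits in (8, 4, 3, 2, 1):
    low, high = haar_decompose(latent)
    w_q = haar_reconstruct(quantize(low, bits), quantize(high, bits))
    err = (w_q - latent).abs().mean().item()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Quantizing the low- and high-frequency subbands separately is one plausible way a wavelet decomposition could yield scale-specific quantization, but the exact per-candidate strategy is left unspecified by the abstract.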

Related research:

03/12/2021 · Learnable Companding Quantization for Accurate Low-bit Neural Networks
Quantizing deep neural networks is an effective method for reducing memo...

03/09/2021 · MWQ: Multiscale Wavelet Quantized Neural Networks
Model quantization can reduce the model size and computational latency, ...

06/02/2022 · NIPQ: Noise Injection Pseudo Quantization for Automated DNN Optimization
The optimization of neural networks in terms of computation cost and mem...

12/20/2019 · AdaBits: Neural Network Quantization with Adaptive Bit-Widths
Deep neural networks with adaptive configurations have gained increasing...

07/18/2021 · A High-Performance Adaptive Quantization Approach for Edge CNN Applications
Recent convolutional neural network (CNN) development continues to advan...

05/14/2020 · Bayesian Bits: Unifying Quantization and Pruning
We introduce Bayesian Bits, a practical method for joint mixed precision...

12/10/2022 · Vertical Layering of Quantized Neural Networks for Heterogeneous Inference
Although considerable progress has been obtained in neural network quant...
