MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

05/14/2023
by Yunshan Zhong, et al.

Arbitrary bit-width network quantization has received significant attention due to its high adaptability to varying bit-width requirements at runtime. In this paper, however, we investigate existing methods and observe a significant accumulation of quantization errors caused by frequent bit-width switching of weights and activations, which limits performance. To address this issue, we propose MultiQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MultiQuant duplicates the network body into multiple independent branches and quantizes the weights of each branch to fixed 2-bit precision while retaining the input activations at the expected bit-width. This keeps the computational cost unchanged while avoiding weight bit-width switching, thereby substantially reducing weight quantization errors. Additionally, we introduce an amortization branch selection strategy that distributes the quantization errors caused by activation bit-width switching among the branches to enhance performance. Finally, we design an in-place distillation strategy in which branches guide one another, further improving MultiQuant's performance. Extensive experiments demonstrate that MultiQuant achieves significant performance gains over existing arbitrary bit-width quantization methods. Code is available at <https://github.com/zysxmu/MultiQuant>.
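The core idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, assuming illustrative names (`MultiBranchLinear`, `uniform_quantize`) and a simple straight-through uniform quantizer; it is not the authors' implementation (see the linked repository for that), but it shows how each branch keeps fixed 2-bit weights while only the input activations are quantized to the requested bit-width.

```python
# Minimal sketch of the multi-branch idea. All names here are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization to `bits` bits (straight-through sketch)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax - 1, qmax)
    # Straight-through estimator: forward uses quantized values,
    # backward passes gradients through unchanged.
    return x + (q * scale - x).detach()


class MultiBranchLinear(nn.Module):
    """Duplicates one linear layer into `num_branches` independent branches.

    Each branch's weights are always quantized to a fixed 2 bits, so changing
    the runtime bit-width never re-quantizes weights; only the input
    activations are quantized to the currently requested bit-width.
    """

    def __init__(self, in_features: int, out_features: int, num_branches: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(in_features, out_features) for _ in range(num_branches)
        )

    def forward(self, x: torch.Tensor, act_bits: int, branch_idx: int) -> torch.Tensor:
        x_q = uniform_quantize(x, act_bits)           # activations: requested bit-width
        layer = self.branches[branch_idx]             # branch chosen by a selection strategy
        w_q = uniform_quantize(layer.weight, bits=2)  # weights: fixed 2-bit, never switched
        return F.linear(x_q, w_q, layer.bias)


# Usage: the same layer served at different activation bit-widths.
layer = MultiBranchLinear(64, 32, num_branches=4)
x = torch.randn(8, 64)
for bits, idx in [(8, 0), (4, 1), (2, 2)]:
    y = layer(x, act_bits=bits, branch_idx=idx)
```

Because every branch's weights stay at 2 bits, switching the serving bit-width only changes the activation quantizer and the branch index, which is what lets the error from bit-width switching be amortized across branches.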


Related research

12/06/2018
DNQ: Dynamic Network Quantization
Network quantization is an effective method for the deployment of neural...

03/06/2018
Deep Neural Network Compression with Single and Multiple Level Quantization
Network quantization is an effective solution to compress deep neural ne...

08/15/2023
EQ-Net: Elastic Quantization Neural Networks
Current model quantization methods have shown their promising capability...

01/15/2023
RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of Quantized CNNs
In recent years, Convolutional Neural Networks (CNNs) have become the st...

04/14/2021
End-to-end Keyword Spotting using Neural Architecture Search and Quantization
This paper introduces neural architecture search (NAS) for the automatic...

04/21/2022
Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach
Conventional model quantization methods use a fixed quantization scheme ...

08/26/2019
Verifying Bit-vector Invertibility Conditions in Coq (Extended Abstract)
This work is a part of an ongoing effort to prove the correctness of inv...
