BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

02/20/2021
by Huanrui Yang, et al.

Mixed-precision quantization can potentially achieve the optimal tradeoff between performance and compression rate of deep neural networks, and has therefore been widely investigated. However, a systematic method to determine the exact quantization scheme is still lacking. Previous methods either examine only a small, manually designed search space or rely on a cumbersome neural architecture search to explore the vast search space; neither approach reaches an optimal quantization scheme efficiently. This work proposes bit-level sparsity quantization (BSQ), which tackles mixed-precision quantization from the new angle of inducing bit-level sparsity. We treat each bit of the quantized weights as an independent trainable variable and introduce a differentiable bit-sparsity regularizer. BSQ induces all-zero bits across groups of weight elements and thereby realizes dynamic precision reduction, yielding a mixed-precision quantization scheme for the original model. Our method explores the full mixed-precision space with a single gradient-based optimization process, with only one hyperparameter to trade off performance and compression. Compared to previous methods, BSQ achieves both higher accuracy and greater bit reduction on various model architectures on the CIFAR-10 and ImageNet datasets.
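
To make the idea concrete, below is a minimal PyTorch sketch of the bit-level reparameterization described above: the magnitude of each weight is stored as a stack of trainable bit planes, and a simple per-plane L1 penalty stands in for the paper's bit-sparsity regularizer. The names (BitLinear, bit_sparsity_reg, num_bits) and the exact penalty form are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of bit-level weight reparameterization with a bit-sparsity penalty.
# BitLinear, bit_sparsity_reg, and num_bits are illustrative names; the exact
# regularizer used in the paper may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Module):
    """Linear layer whose weight magnitudes are stored as trainable bit planes
    (sign kept fixed, sign-magnitude style)."""

    def __init__(self, in_features, out_features, num_bits=8):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w, a=5 ** 0.5)
        self.num_bits = num_bits
        self.register_buffer("scale", w.abs().max() / (2 ** num_bits - 1))
        self.register_buffer("sign", torch.sign(w))
        # Quantize |w|, split it into binary planes, then relax every bit to a
        # real-valued trainable variable in [0, 1].
        q = torch.round(w.abs() / self.scale).long()
        planes = [((q >> b) & 1).float() for b in range(num_bits)]
        self.bits = nn.Parameter(torch.stack(planes, dim=0))  # (num_bits, out, in)

    def weight(self):
        # Recompose the effective weight from its (relaxed) bit planes.
        powers = 2.0 ** torch.arange(
            self.num_bits, dtype=self.bits.dtype, device=self.bits.device
        )
        magnitude = torch.einsum("b,boi->oi", powers, self.bits.clamp(0, 1))
        return self.sign * magnitude * self.scale

    def forward(self, x):
        return F.linear(x, self.weight())


def bit_sparsity_reg(layer):
    """Per-plane L1 penalty: driving an entire plane to zero allows that bit
    to be dropped, reducing the layer's effective precision."""
    return layer.bits.clamp(0, 1).mean(dim=(1, 2)).sum()


# Usage: a single hyperparameter trades off task loss against bit sparsity.
layer = BitLinear(16, 4, num_bits=4)
x, target = torch.randn(8, 16), torch.randn(8, 4)
loss = F.mse_loss(layer(x), target) + 1e-3 * bit_sparsity_reg(layer)
loss.backward()
```

Under this relaxation, a bit plane whose entries are all driven to zero can be removed after training, lowering that layer's effective precision; applying this per layer is what yields the mixed-precision quantization scheme described in the abstract.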

Related research

11/30/2018 · Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Recent work in network quantization has substantially reduced the time a...

12/06/2022 · CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
Mixed-precision quantization has been widely applied on deep neural netw...

10/20/2018 · Differentiable Fine-grained Quantization for Deep Neural Network Compression
Neural networks have shown great performance in cognitive tasks. When de...

07/20/2020 · Differentiable Joint Pruning and Quantization for Hardware Efficiency
We present a differentiable joint pruning and quantization (DJPQ) scheme...

08/11/2022 · Mixed-Precision Neural Networks: A Survey
Mixed-precision Deep Neural Networks achieve the energy efficiency and t...

03/04/2021 · Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization
Since model quantization helps to reduce the model size and computation ...

04/07/2023 · AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks
Exploring the expected quantizing scheme with suitable mixed-precision p...
