Differentiable Joint Pruning and Quantization for Hardware Efficiency

07/20/2020
by Ying Wang, et al.

We present a differentiable joint pruning and quantization (DJPQ) scheme that frames neural network compression as a single gradient-based optimization problem, automatically trading off between model pruning and quantization for hardware efficiency. DJPQ combines variational information bottleneck based structured pruning and mixed-precision quantization into one differentiable loss function. In contrast to previous works that consider pruning and quantization separately, our method finds the optimal trade-off between the two in a single training procedure. To target efficient hardware inference, we further extend DJPQ to integrate structured pruning with power-of-two bit-restricted quantization. We show that DJPQ substantially reduces the number of Bit-Operations (BOPs) for several networks while maintaining the top-1 accuracy of the original floating-point models (e.g., a 53x BOPs reduction for ResNet18 on ImageNet and 43x for MobileNetV2). Compared to the conventional two-stage approach, which optimizes pruning and quantization independently, our scheme achieves both higher accuracy and fewer BOPs. Even under bit-restricted quantization, DJPQ attains larger compression ratios and better accuracy than the two-stage approach.
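The headline metric here, Bit-Operations (BOPs), weights every multiply-accumulate (MAC) by the product of its operand bit widths, so channel pruning and mixed-precision quantization compound multiplicatively. The minimal Python sketch below illustrates that accounting; the layer statistics, keep fractions, and helper names are hypothetical, and the quadratic channel scaling is a common simplification rather than the paper's exact bookkeeping.

    import math

    # Hypothetical per-layer statistics: (MACs, weight bits, activation bits,
    # fraction of channels kept after structured pruning). Values are
    # illustrative only, not taken from the paper.
    layers = [
        (118_013_952, 4, 8, 0.75),  # an early conv layer
        (115_605_504, 2, 4, 0.50),  # a middle conv layer
        (25_690_112,  8, 8, 1.00),  # a final fully-connected layer
    ]

    def layer_bops(macs, bw, ba, keep_frac):
        # Structured pruning removes whole channels, shrinking the MAC count
        # roughly quadratically (input and output channels both shrink);
        # quantization scales the cost of each MAC by bw * ba.
        return macs * keep_frac ** 2 * bw * ba

    def round_to_power_of_two(bits):
        # Snap a learned (continuous) bit width to {1, 2, 4, 8, ...},
        # mirroring the hardware-friendly bit restriction.
        return 2 ** round(math.log2(bits))

    fp32_bops = sum(macs * 32 * 32 for macs, _, _, _ in layers)
    compressed_bops = sum(layer_bops(*layer) for layer in layers)
    print(f"BOPs reduction: {fp32_bops / compressed_bops:.1f}x")
    print(f"deployable bit width for 3.2: {round_to_power_of_two(3.2)}")

Note that in DJPQ itself the pruning gates and bit widths are learned jointly inside the differentiable loss; the power-of-two rounding above is only a post-hoc illustration of the bit restriction that the extended scheme enforces during training.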


Related research

02/20/2021  BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization
Mixed-precision quantization can potentially achieve the optimal tradeof...

02/15/2023  Towards Optimal Compression: Joint Pruning and Quantization
Compression of deep neural networks has become a necessary stage for opt...

01/13/2021  ABS: Automatic Bit Sharing for Model Compression
We present Automatic Bit Sharing (ABS) to automatically search for optim...

08/24/2022  Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
We consider the problem of model compression for deep neural networks (D...

07/06/2023  Pruning vs Quantization: Which is Better?
Neural network pruning and quantization techniques are almost as old as ...

10/05/2020  Joint Pruning Quantization for Extremely Sparse Neural Networks
We investigate pruning and quantization for deep neural networks. Our go...

06/15/2020  APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
We present APQ for efficient deep learning inference on resource-constra...
