Efficient Bitwidth Search for Practical Mixed Precision Neural Network

03/17/2020
by   Yuhang Li, et al.
14

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks. Recent efforts propose to quantize weights and activations from different layers with different precision to improve the overall performance. However, it is challenging to find the optimal bitwidth (i.e., precision) for weights and activations of each layer efficiently. Meanwhile, it is yet unclear how to perform convolution for weights and activations of different precision efficiently on generic hardware platforms. To resolve these two issues, in this paper, we first propose an Efficient Bitwidth Search (EBS) algorithm, which reuses the meta weights for different quantization bitwidth and thus the strength for each candidate precision can be optimized directly w.r.t the objective without superfluous copies, reducing both the memory and computational cost significantly. Second, we propose a binary decomposition algorithm that converts weights and activations of different precision into binary matrices to make the mixed precision convolution efficient and practical. Experiment results on CIFAR10 and ImageNet datasets demonstrate our mixed precision QNN outperforms the handcrafted uniform bitwidth counterparts and other mixed precision techniques.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/19/2019

FQ-Conv: Fully Quantized Convolution for Efficient and Accurate Inference

Deep neural networks (DNNs) can be made hardware-efficient by reducing t...
research
10/13/2021

Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Quantization is a widely used technique to compress and accelerate deep ...
research
04/13/2020

Rethinking Differentiable Search for Mixed-Precision Neural Networks

Low-precision networks, with weights and activations quantized to low bi...
research
03/16/2021

Training Dynamical Binary Neural Networks with Equilibrium Propagation

Equilibrium Propagation (EP) is an algorithm intrinsically adapted to th...
research
08/19/2020

Channel-wise Hessian Aware trace-Weighted Quantization of Neural Networks

Second-order information has proven to be very effective in determining ...
research
06/04/2019

PCA-driven Hybrid network design for enabling Intelligence at the Edge

The recent advent of IOT has increased the demand for enabling AI-based ...
research
11/01/2017

Attacking Binarized Neural Networks

Neural networks with low-precision weights and activations offer compell...

Please sign up or login with your details

Forgot password? Click here to reset