Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks

09/08/2021
by   Cheng Gong, et al.

Quantization has been proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it is still challenging to strike a good balance between accuracy and efficiency while quantizing DNN weights or activation values from high-precision formats to their quantized counterparts. We propose a new method called elastic significant bit quantization (ESB) that controls the number of significant bits of quantized values to obtain better inference accuracy with fewer resources. We design a unified mathematical formula to constrain the quantized values of ESB with a flexible number of significant bits. We also introduce a distribution difference aligner (DDA) to quantitatively align the distributions between the full-precision weight or activation values and the quantized values. Consequently, ESB is suitable for the various bell-shaped distributions of weights and activations in DNNs, thus maintaining high inference accuracy. Benefiting from the fewer significant bits of quantized values, ESB can reduce multiplication complexity. We implement ESB as an accelerator and quantitatively evaluate its efficiency on FPGAs. Extensive experimental results illustrate that ESB quantization consistently outperforms state-of-the-art methods and achieves average accuracy improvements of up to 4.78% on models such as ResNet18 and MobileNetV2. Furthermore, the ESB accelerator can achieve a peak performance of 10.95 GOPS per 1k LUTs without using DSPs on the Xilinx ZCU102 FPGA platform. Compared with CPU, GPU, and state-of-the-art accelerators on FPGAs, the ESB accelerator improves energy efficiency by up to 65x, 11x, and 26x, respectively.
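The core idea of limiting quantized values to a small number of significant bits can be sketched as rounding each weight to the form m * 2^e, where the mantissa m carries at most k significant bits. The snippet below is a minimal illustration of this idea, not the paper's exact ESB formulation; the function name, the exponent clipping range, and the rounding scheme are assumptions chosen for clarity.

```python
import numpy as np

def quantize_significant_bits(x, sig_bits=2, exp_min=-4, exp_max=0):
    """Round each value to m * 2^e, where the mantissa m uses at most
    `sig_bits` significant bits. A sketch of significant-bit-constrained
    quantization, not the paper's exact ESB formula; the exponent range
    [exp_min, exp_max] is an illustrative assumption."""
    x = np.asarray(x, dtype=np.float64)
    sign = np.sign(x)
    mag = np.abs(x)
    out = np.zeros_like(mag)
    nz = mag > 0
    # Exponent of the leading significant bit, clipped to the allowed range.
    e = np.clip(np.floor(np.log2(mag[nz])), exp_min, exp_max)
    # Quantization step places the leading bit at position sig_bits - 1,
    # so rounding keeps at most sig_bits significant bits.
    step = 2.0 ** (e - (sig_bits - 1))
    out[nz] = np.round(mag[nz] / step) * step
    return sign * out

w = np.array([0.8, -0.3, 0.05, 0.0])
print(quantize_significant_bits(w, sig_bits=2))  # [ 0.75 -0.25 0.0625 0. ]
```

Because every quantized value is a short mantissa times a power of two, multiplying by it reduces to a handful of shift-and-add operations, which is what allows a hardware implementation to avoid full-width multipliers (and, on FPGAs, DSP blocks).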

