PalQuant: Accelerating High-precision Networks on Low-precision Accelerators

08/03/2022
by Qinghao Hu, et al.

Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the low-precision quantized models deployed on these DLAs suffer severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, an approach that has rarely been studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost information exchange between the parallel low-precision groups. Extensive experiments demonstrate that PalQuant outperforms state-of-the-art quantization methods in both accuracy and inference speed. For example, for ResNet-18 quantization, PalQuant obtains 0.52% higher accuracy and a 1.78× speedup simultaneously over its 4-bit counterpart on a state-of-the-art 2-bit accelerator. Code is available at <https://github.com/huqinghao/PalQuant>.
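To make the underlying problem concrete, the sketch below shows the naive way to run 4-bit arithmetic on 2-bit multipliers: splitting each unsigned 4-bit value into two 2-bit digits and recombining the partial products. This is explicitly not PalQuant, which according to the abstract learns its parallel low-precision representations from scratch instead of splitting bits. The function names, the unsigned encoding, and the `mul_2bit` stand-in for a hardware multiplier are all assumptions made for illustration.

```python
def split_2bit(x):
    """Split an unsigned 4-bit integer into (high, low) 2-bit digits."""
    if not 0 <= x <= 15:
        raise ValueError("expected an unsigned 4-bit value in [0, 15]")
    return x >> 2, x & 0b11


def mul_2bit(a, b):
    """Hypothetical stand-in for a 2-bit x 2-bit hardware multiplier."""
    assert 0 <= a <= 3 and 0 <= b <= 3
    return a * b


def dot_4bit_on_2bit(weights, activations):
    """Exact unsigned 4-bit dot product using only 2-bit multiplications."""
    total = 0
    for w, a in zip(weights, activations):
        w_hi, w_lo = split_2bit(w)
        a_hi, a_lo = split_2bit(a)
        # (4*w_hi + w_lo) * (4*a_hi + a_lo) expands into four 2-bit products,
        # combined with shifts by 4, 2, and 0 bits.
        total += 16 * mul_2bit(w_hi, a_hi)
        total += 4 * (mul_2bit(w_hi, a_lo) + mul_2bit(w_lo, a_hi))
        total += mul_2bit(w_lo, a_lo)
    return total


weights = [13, 7, 2, 15]
activations = [9, 4, 11, 6]
reference = sum(w * a for w, a in zip(weights, activations))
result = dot_4bit_on_2bit(weights, activations)
print(result == reference)
```

In this baseline every 4-bit product costs four 2-bit multiplications plus shifts and additions, which illustrates why a direct mapping of high-precision networks onto low-precision hardware is expensive.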


