Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations

02/17/2020
by Yichi Zhang, et al.

We propose precision gating (PG), an end-to-end trainable dynamic dual-precision quantization technique for deep neural networks. PG computes most features at low precision and only a small proportion of important features at higher precision to preserve accuracy. The proposed approach is applicable to a variety of DNN architectures and significantly reduces the computational cost of DNN execution with almost no accuracy loss. Our experiments indicate that PG achieves excellent results on CNNs, including statically compressed mobile-friendly networks such as ShuffleNet. Compared to state-of-the-art prediction-based quantization schemes, PG achieves the same or higher accuracy with 2.4× less compute on ImageNet. PG furthermore applies to RNNs. Compared to 8-bit uniform quantization, PG obtains a 1.2% improvement in perplexity per word with 2.7× computational cost reduction on an LSTM on the Penn Treebank dataset.
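
The dual-precision mechanism lends itself to a short sketch. Below is a minimal PyTorch illustration for a single linear layer, assuming post-ReLU (non-negative) activations; the names (split_msb_lsb, pg_linear), the bit widths, and the scalar threshold delta are illustrative choices, not the authors' released implementation (the paper learns per-layer thresholds during training).

    import torch

    def split_msb_lsb(x, hi_bits, lo_bits, x_max):
        # Quantize non-negative activations to hi_bits, then split the
        # integer code into its top lo_bits (MSBs) and the remaining
        # low-order bits (LSBs).
        scale = (2 ** hi_bits - 1) / x_max
        q = torch.round(x.clamp(0, x_max) * scale)
        step = 2 ** (hi_bits - lo_bits)
        msb = torch.floor(q / step) * step
        lsb = q - msb
        return msb / scale, lsb / scale

    def pg_linear(x, weight, delta, hi_bits=8, lo_bits=4):
        # Cheap pass: multiply only the MSB part of the activations.
        x_max = float(x.max().clamp(min=1e-8))
        x_msb, x_lsb = split_msb_lsb(x, hi_bits, lo_bits, x_max)
        y = x_msb @ weight.t()
        # Gate: outputs above the threshold delta are deemed important
        # and are refined with the LSB contribution.
        gate = (y > delta).to(y.dtype)
        y = y + gate * (x_lsb @ weight.t())
        return y, gate.mean()  # second value: fraction of refined outputs

Note that in a dense tensor framework the masked LSB pass still performs every multiply; the compute savings PG reports come from skipping the ungated low-order MACs in hardware or sparse kernels, with the gate's mean (the update ratio) governing the achievable reduction.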

Related research

04/20/2018 · Value-aware Quantization for Training and Inference of Neural Networks
We propose a novel value-aware quantization which applies aggressively r...

12/24/2018 · Precision Highway for Ultra Low-Precision Quantization
Neural network quantization has an inherent problem called accumulated q...

03/24/2021 · DNN Quantization with Attention
Low-bit quantization of network weights and activations can drastically ...

10/30/2021 · ILMPQ: An Intra-Layer Multi-Precision Deep Neural Network Quantization Framework for FPGA
This work targets the commonly used FPGA (field-programmable gate array)...

10/20/2017 · Low Precision RNNs: Quantizing RNNs Without Losing Accuracy
Similar to convolution neural networks, recurrent neural networks (RNNs)...

07/15/2022 · Low-bit Shift Network for End-to-End Spoken Language Understanding
Deep neural networks (DNN) have achieved impressive success in multiple ...

10/30/2021 · RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions
This work proposes a novel Deep Neural Network (DNN) quantization framew...
