
Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

07/13/2020
by H. T. Kung, et al.
Harvard University

We present a novel technique, called Term Revealing (TR), for furthering quantization at run time to improve the performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on the power-of-two terms in the binary expansion of values. When computing a dot product, TR dynamically selects a fixed number of the largest terms from the values of the two vectors. By exploiting the normal-like weight and data distributions typically present in DNNs, TR has minimal impact on DNN model performance (i.e., accuracy or perplexity). We use TR to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We show an FPGA implementation that can use a small number of control bits to switch between conventional quantization and TR-enabled quantization with negligible delay. To further enhance TR efficiency, we propose HESE encoding (Hybrid Encoding for Signed Expressions) of values, as opposed to classic binary encoding with nonnegative power-of-two terms. We evaluate TR with HESE-encoded values on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2, and show significant reductions in inference computations (between 3x and 10x) compared to conventional quantization at the same level of model performance.
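The core ideas can be sketched in a few lines. The first function below illustrates TR's group-wise truncation: pool the power-of-two terms of a group of quantized values and keep only the largest few. The second illustrates the motivation for signed encodings via a non-adjacent-form (NAF) expansion, which often needs fewer terms than plain binary (e.g., 15 = 16 - 1). Note this is a minimal sketch, not the paper's exact algorithm or the actual HESE scheme; the function names, the group size, and the term budget are illustrative assumptions.

```python
def pow2_terms(x):
    """Power-of-two terms in the binary expansion of a nonnegative int."""
    return [1 << i for i in range(x.bit_length()) if (x >> i) & 1]

def term_reveal(values, budget):
    """Sketch of TR-style truncation: keep only the `budget` largest
    power-of-two terms across the whole group, zeroing the rest."""
    terms = []  # (term value, index of owning element)
    for idx, v in enumerate(values):
        terms.extend((t, idx) for t in pow2_terms(v))
    kept = sorted(terms, reverse=True)[:budget]
    out = [0] * len(values)
    for t, idx in kept:
        out[idx] += t
    return out

def signed_terms(x):
    """Non-adjacent-form expansion: signed power-of-two terms, often
    fewer than binary (illustrates the idea behind signed encodings
    such as HESE, not the HESE scheme itself)."""
    terms, e = [], 0
    while x:
        if x & 1:
            z = 2 - (x & 3)  # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            terms.append(z * (1 << e))
            x -= z
        x >>= 1
        e += 1
    return terms

# 4-bit quantized values; keep the 6 largest terms in the group.
print(term_reveal([13, 7, 2, 11], 6))  # -> [12, 4, 2, 10]
# Binary 15 has four terms (1+2+4+8); signed form needs only two.
print(signed_terms(15))                # -> [-1, 16]
```

Because the term budget is fixed per group, a systolic cell processes a predictable number of terms per dot product, which is what keeps tightly synchronized arrays efficient under this scheme.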

Related research:

Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks (09/08/2021)

BiQGEMM: Matrix Multiplication with Lookup Table For Binary-Coding-based Quantized DNNs (05/20/2020)

Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks (06/22/2017)

Scalar Arithmetic Multiple Data: Customizable Precision for Deep Neural Networks (09/27/2018)

ECQ^x: Explainability-Driven Quantization for Low-Bit and Sparse DNNs (09/09/2021)

Bit-pragmatic Deep Neural Network Computing (10/20/2016)