
Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

by H. T. Kung, et al.
Harvard University

We present a novel technique, called Term Revealing (TR), for furthering quantization at run time to improve the performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on the power-of-two terms in the binary expansions of values. When computing a dot product, TR dynamically selects a fixed number of the largest terms from the values of the two vectors. By exploiting the normal-like weight and data distributions typically present in DNNs, TR has minimal impact on DNN model performance (i.e., accuracy or perplexity). We use TR to facilitate tightly synchronized processor arrays, such as systolic arrays, for efficient parallel processing. We present an FPGA implementation that uses a small number of control bits to switch between conventional quantization and TR-enabled quantization with negligible delay. To further enhance TR efficiency, we propose HESE (Hybrid Encoding for Signed Expressions), which encodes values with signed power-of-two terms, as opposed to classic binary encoding with only nonnegative terms. We evaluate TR with HESE-encoded values on an MLP for MNIST, multiple CNNs for ImageNet, and an LSTM for Wikitext-2, and show significant reductions in inference computations (3x to 10x) compared to conventional quantization at the same level of model performance.
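The ideas in the abstract can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it applies the term budget per value, whereas the paper selects the largest terms across a group of values in a dot product; the function names (`po2_terms`, `term_reveal`, `tr_dot`, `signed_terms`) are illustrative and not from the paper, and `signed_terms` uses a standard non-adjacent-form expansion as a stand-in for HESE.

```python
def po2_terms(x):
    """Exponents of the power-of-two terms in the binary expansion of a
    nonnegative integer x, largest first (e.g. 13 = 8 + 4 + 1 -> [3, 2, 0])."""
    return [e for e in range(x.bit_length() - 1, -1, -1) if (x >> e) & 1]

def term_reveal(x, k):
    """Term Revealing, per-value simplification: keep only the k largest
    power-of-two terms of x and drop the rest."""
    return sum(1 << e for e in po2_terms(x)[:k])

def tr_dot(ws, xs, k):
    """Approximate dot product on term-revealed operands."""
    return sum(term_reveal(w, k) * term_reveal(x, k) for w, x in zip(ws, xs))

def signed_terms(x):
    """Signed power-of-two expansion (non-adjacent form), a stand-in for the
    paper's HESE encoding: runs of 1s collapse, e.g. 7 = 8 - 1 needs two
    signed terms instead of three unsigned ones. Returns (digit, exponent)
    pairs with digit in {+1, -1}."""
    terms, e = [], 0
    while x:
        if x & 1:
            d = 2 - (x & 3)   # +1 if x % 4 == 1, -1 if x % 4 == 3
            terms.append((d, e))
            x -= d
        x >>= 1
        e += 1
    return terms
```

For example, `term_reveal(13, 2)` keeps 8 + 4 and drops the 1, and `signed_terms(7)` returns two terms where plain binary needs three, which is the source of the term-count savings the abstract describes.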
