DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference

06/28/2023
by Bahareh Khabbazan, et al.

Quantization is commonly used in Deep Neural Networks (DNNs) to reduce storage and computational complexity by decreasing the arithmetic precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. However, linear uniform quantization usually cannot reduce the numerical precision below 8 bits without sacrificing model accuracy. This accuracy loss stems from the fact that tensors do not follow uniform distributions. In this paper, we show that a significant number of tensors fit an exponential distribution. We then propose DNA-TEQ, a scheme that quantizes DNN tensors exponentially with an adaptive method that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width than previous proposals, resulting in an average compression ratio of 40 with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66
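To make the idea concrete, the sketch below illustrates generic exponential (power-of-base) quantization: each value is mapped to the nearest signed power of a base, scaled by a per-tensor factor. This is only a minimal illustration of the general technique, not the exact DNA-TEQ algorithm; the function name `exp_quantize` and the fixed `base` and `alpha` parameters are assumptions standing in for the parameters the paper tunes adaptively per tensor.

```python
import numpy as np

def exp_quantize(x, num_bits=4, base=2.0, alpha=1.0):
    """Quantize values to the nearest signed power of `base`, scaled by `alpha`.

    Illustrative only (not the DNA-TEQ scheme): `base` and `alpha` are fixed
    here, whereas an adaptive scheme would search for them per tensor.
    Returns the quantized values and the integer exponents.
    """
    levels = 2 ** (num_bits - 1) - 1  # representable exponent range per sign
    sign = np.sign(x)
    mag = np.abs(x) / alpha
    # Nearest integer exponent in the base; guard against log(0).
    n = np.round(np.log(np.maximum(mag, np.finfo(float).tiny)) / np.log(base))
    n = np.clip(n, -levels, 0)  # exponents <= 0 cover magnitudes <= alpha
    q = sign * alpha * base ** n
    q[mag == 0] = 0.0  # zero stays exactly zero
    return q, n.astype(int)

x = np.array([0.9, 0.4, 0.11, -0.03, 0.0])
q, n = exp_quantize(x)
# q -> [1.0, 0.5, 0.125, -0.03125, 0.0]
```

Storing only the small integer exponents (plus a sign) is what enables low bit-widths, and it hints at why dot products can move to the exponential domain: multiplying two such values reduces to adding their exponents.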


Related research

- Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation (01/31/2020)
- Learning Low Precision Deep Neural Networks through Regularization (09/01/2018)
- Adaptive Loss-aware Quantization for Multi-bit Networks (12/18/2019)
- Edge2Vec: A High Quality Embedding for the Jigsaw Puzzle Problem (11/14/2022)
- Auto-tuning Neural Network Quantization Framework for Collaborative Inference Between the Cloud and Edge (12/16/2018)
- ECQ^x: Explainability-Driven Quantization for Low-Bit and Sparse DNNs (09/09/2021)
- Neural Networks Weights Quantization: Target None-retraining Ternary (TNT) (12/18/2019)
