DNN Quantization with Attention

03/24/2021
by Ghouthi Boukli Hacene, et al.

Low-bit quantization of network weights and activations can drastically reduce the memory footprint, complexity, energy consumption and latency of Deep Neural Networks (DNNs). However, low-bit quantization can also cause a considerable drop in accuracy, in particular when applied to complex learning tasks or lightweight DNN architectures. In this paper, we propose a training procedure that relaxes low-bit quantization, which we call DNN Quantization with Attention (DQA). The relaxation is achieved through a learnable linear combination of high, medium and low-bit quantizations. Using an attention mechanism with temperature scheduling, our learning procedure converges step by step to a low-bit quantization. In experiments, our approach outperforms other low-bit quantization techniques on various object recognition benchmarks such as CIFAR10, CIFAR100 and ImageNet ILSVRC 2012, achieves almost the same accuracy as a full-precision DNN, and considerably reduces the accuracy drop when quantizing lightweight DNN architectures.
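The mechanism described in the abstract, a learnable attention-weighted mixture of quantizations at several bit-widths that is annealed towards a single low-bit quantizer, can be sketched in a few lines. The following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the bit-widths (8, 4, 2), the uniform quantizer, the softmax-with-temperature form of the attention, the annealing schedule, and the names uniform_quantize and DQAWeight are all illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

def uniform_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Illustrative uniform quantizer (a stand-in for the quantizers in the paper),
    # with a straight-through estimator so gradients reach the full-precision weights.
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / levels
    q = torch.round((w - w_min) / scale) * scale + w_min
    return w + (q - w).detach()

class DQAWeight(nn.Module):
    # Hypothetical module name; the bit-width set (8, 4, 2) is an assumption.
    def __init__(self, weight_shape, bit_widths=(8, 4, 2)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(weight_shape) * 0.05)
        self.bit_widths = bit_widths
        # One learnable attention logit per candidate quantization.
        self.logits = nn.Parameter(torch.zeros(len(bit_widths)))

    def forward(self, temperature: float = 1.0) -> torch.Tensor:
        # Softmax attention over the candidate quantizations; lowering the
        # temperature sharpens the distribution towards a single quantizer.
        attn = F.softmax(self.logits / temperature, dim=0)
        stacked = torch.stack([uniform_quantize(self.weight, b) for b in self.bit_widths])
        shape = (len(self.bit_widths),) + (1,) * self.weight.dim()
        return (attn.view(shape) * stacked).sum(dim=0)

# Toy usage with an illustrative (not the paper's) temperature schedule:
layer = DQAWeight((64, 32))
for step in range(100):
    t = max(0.95 ** step, 1e-2)
    w_mixed = layer(temperature=t)
    # ... use w_mixed as the weight of a linear/conv layer and backpropagate;
    # the paper additionally steers the attention towards the low-bit quantizer.

During training the mixed weight behaves like a soft relaxation of quantization; as the attention collapses, the layer effectively uses a single low-bit quantization at inference time.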


Related research

06/24/2020 · Bit Error Robustness for Energy-Efficient DNN Accelerators
Deep neural network (DNN) accelerators received considerable attention i...

01/04/2019 · Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks
This paper addresses a challenging problem - how to reduce energy consum...

08/03/2017 · Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization
Low-bit deep neural networks (DNNs) become critical for embedded applica...

11/01/2019 · Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters
Effective employment of deep neural networks (DNNs) in mobile devices an...

03/22/2021 · n-hot: Efficient bit-level sparsity for powers-of-two neural network quantization
Powers-of-two (PoT) quantization reduces the number of bit operations of...

02/17/2020 · Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations
We propose precision gating (PG), an end-to-end trainable dynamic dual-p...

11/05/2018 · ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks
Despite numerous state-of-the-art applications of Deep Neural Networks (...
