Activation Density based Mixed-Precision Quantization for Energy Efficient Neural Networks

01/12/2021
by Karina Vasquez, et al.

As neural networks gain widespread adoption in embedded devices, there is a need for model compression techniques to facilitate deployment in resource-constrained environments. Quantization is one of the go-to methods yielding state-of-the-art model compression. Most approaches take a fully trained model, apply different heuristics to determine the optimal bit-precision for different layers of the network, and retrain the network to regain any drop in accuracy. Based on Activation Density (AD), the proportion of non-zero activations in a layer, we propose an in-training quantization method. Our method calculates the bit-width for each layer during training, yielding a mixed-precision model with competitive accuracy. Since lower-precision models are trained from the start, our approach yields the final quantized model at lower training complexity and also eliminates the need for retraining. We run experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and TinyImageNet with VGG19/ResNet18 architectures and report accuracy and energy estimates. We achieve a ~4.5x benefit in terms of estimated multiply-and-accumulate (MAC) reduction while reducing the training complexity by 50%. To further evaluate the energy benefits of the proposed method, we develop a mixed-precision scalable Process In Memory (PIM) hardware accelerator platform. The hardware platform incorporates shift-add functionality for handling multi-bit-precision neural network models. Evaluating the quantized models obtained with our proposed method on the PIM platform yields a ~5x energy reduction compared to 16-bit models. Additionally, we find that integrating AD-based quantization with AD-based pruning (both conducted during training) yields up to ~198x and ~44x energy reductions for VGG19 and ResNet18 architectures respectively on the PIM platform compared to baseline 16-bit precision, unpruned models.
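To make the core idea concrete, below is a minimal PyTorch sketch of how one might track per-layer activation density during training and map it to a bit-width. The thresholds in `bits_from_density`, the `DensityProbe` helper, and the running-average momentum are illustrative assumptions for this sketch only; the paper defines its own AD-to-precision schedule, which the abstract does not specify.

```python
# Sketch: estimate per-layer Activation Density (AD), i.e. the fraction of
# non-zero activations, and derive a candidate bit-width per layer.
# The density->bit-width mapping here is a hypothetical example.
import torch
import torch.nn as nn

def activation_density(act: torch.Tensor) -> float:
    """Fraction of non-zero activations in a layer's output (e.g. post-ReLU)."""
    return (act != 0).float().mean().item()

def bits_from_density(ad: float) -> int:
    """Hypothetical mapping: denser layers keep more precision."""
    if ad > 0.5:
        return 8
    elif ad > 0.25:
        return 4
    return 2

class DensityProbe:
    """Forward hook that keeps a running AD estimate for one layer."""
    def __init__(self, momentum: float = 0.9):
        self.momentum = momentum
        self.ad = None

    def __call__(self, module, inputs, output):
        ad = activation_density(output.detach())
        self.ad = ad if self.ad is None else self.momentum * self.ad + (1 - self.momentum) * ad

# Usage: attach probes to ReLU layers, run (training) forward passes,
# then read off a per-layer bit-width.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
probes = {}
for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        probes[name] = DensityProbe()
        module.register_forward_hook(probes[name])

_ = model(torch.randn(8, 3, 32, 32))  # stands in for training forward passes
bitwidths = {name: bits_from_density(p.ad) for name, p in probes.items()}
print(bitwidths)
```

In an in-training scheme such as the one described above, these per-layer bit-widths would be updated periodically and used to quantize weights/activations for the remaining training steps, avoiding a separate post-training retraining phase.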

