Fixed-point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms

02/03/2021
by Rishabh Goyal, et al.

Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback, however, is their high computational complexity and high memory consumption, which make them infeasible to execute on embedded platforms that are constrained in the physical resources needed to support CNNs. Quantization has often been used to optimize CNNs for memory and computational complexity, at the cost of a loss in prediction accuracy. We therefore propose a method to optimally quantize the weights, biases and activations of each layer of a pre-trained CNN while controlling the loss in inference accuracy, thereby enabling quantized inference. We quantize the 32-bit floating-point parameters to low-bitwidth fixed-point representations, finding optimal bitwidths and fractional offsets for the parameters of each layer of a given CNN. We quantize the parameters of a CNN post-training, without re-training it. Our method quantizes each parameter while taking into account how the other parameters are quantized, because ignoring the quantization errors introduced by the other quantized parameters leads to a low-precision CNN with accuracy losses of up to 50%, whereas our method yields a low-precision CNN with accuracy losses of less than 1%. Compared to the approach used by commercial tools, which quantize all parameters to 8 bits, our approach provides a quantized CNN with an average of 53% lower cost of executing multiplications for the two CNNs trained on the four datasets on which we tested our work. We find that layer-wise quantization of parameters helps significantly in this process.
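The full quantization procedure is in the paper itself; as a rough illustration of the fixed-point representation described in the abstract, the NumPy sketch below quantizes a tensor to a signed fixed-point format with a given bitwidth and fractional offset, then searches for the offset that minimizes mean squared quantization error for one layer's weights. The function name, the MSE criterion, and the search range are illustrative assumptions, not the authors' method; in particular, unlike the paper's method, this per-layer search ignores how the other parameters are quantized.

    import numpy as np

    def quantize_fixed_point(x, bitwidth, frac_bits):
        # Map floats to signed fixed-point: scale by 2**frac_bits, round to an
        # integer code, clamp to the representable range, then rescale back.
        scale = 2.0 ** frac_bits
        qmin = -(2 ** (bitwidth - 1))
        qmax = 2 ** (bitwidth - 1) - 1
        code = np.clip(np.round(x * scale), qmin, qmax)
        return code / scale  # dequantized value seen by the rest of the network

    # Hypothetical per-layer search: pick the fractional offset that gives the
    # lowest mean squared error for this layer's weights at a fixed bitwidth.
    weights = np.random.randn(64, 3, 3, 3).astype(np.float32)  # stand-in layer
    best_offset = min(range(16), key=lambda f: np.mean(
        (weights - quantize_fixed_point(weights, 8, f)) ** 2))
    print("best fractional offset for 8-bit weights:", best_offset)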


Related research

06/21/2018
Quantizing deep convolutional networks for efficient inference: A whitepaper
We present an overview of techniques for quantizing convolutional neural...

05/17/2022
A Silicon Photonic Accelerator for Convolutional Neural Networks with Heterogeneous Quantization
Parameter quantization in convolutional neural networks (CNNs) can help ...

07/23/2020
Efficient Residue Number System Based Winograd Convolution
Prior research has shown that Winograd algorithm can reduce the computat...

10/15/2019
Optimizing Convolutional Neural Networks for Embedded Systems by Means of Neuroevolution
Automated design methods for convolutional neural networks (CNNs) have r...

02/10/2017
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
This paper presents incremental network quantization (INQ), a novel meth...

08/26/2022
GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks
Deep convolutional neural network (CNN) training via iterative optimizat...

08/31/2021
Quantized convolutional neural networks through the lens of partial differential equations
Quantization of Convolutional Neural Networks (CNNs) is a common approac...
