A Closer Look at Hardware-Friendly Weight Quantization

10/07/2022
by Sungmin Bae, et al.

Quantizing a Deep Neural Network (DNN) model for deployment on a custom accelerator with efficient fixed-point hardware requires satisfying many stringent hardware-friendly quantization constraints during training. We evaluate the two main classes of hardware-friendly quantization methods in the context of weight quantization: traditional Mean Squared Quantization Error (MSQE)-based methods and more recent gradient-based methods. We study both classes on MobileNetV1 and MobileNetV2 using multiple empirical metrics to identify the sources of the performance differences between them, namely sensitivity to outliers and convergence instability of the quantizer scaling factor. Using these insights, we propose techniques that improve both classes of methods: they fix the optimization instability issues present in the MSQE-based methods when quantizing MobileNet models, and they improve the validation performance of the gradient-based methods by 4.0 on ImageNet.
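To make the contrast between the two classes concrete, here is a minimal NumPy sketch of symmetric uniform weight quantization with the scaling factor chosen in the two ways the abstract compares: an MSQE-based grid search over candidate scales, and a gradient-based (LSQ-style, straight-through-estimator) update of the scale. The function names, the grid-search resolution, and the plain MSE objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def quantize(w, scale, num_bits=8):
    # Symmetric uniform quantizer: scale, round to the integer grid,
    # clamp to the representable range, then rescale.
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale


def msqe_scale(w, num_bits=8, num_candidates=100):
    # MSQE-based: grid-search the scale that minimizes the mean squared
    # quantization error.  The candidate range is anchored at max|w|,
    # which is one reason this class is sensitive to weight outliers.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(w).max()
    candidates = np.linspace(max_abs / num_candidates, max_abs,
                             num_candidates) / qmax
    errors = [np.mean((w - quantize(w, s, num_bits)) ** 2)
              for s in candidates]
    return candidates[int(np.argmin(errors))]


def gradient_scale_step(w, scale, lr=1e-3, num_bits=8):
    # Gradient-based (LSQ-style): one SGD step on the scale, using the
    # straight-through estimator for the non-differentiable rounding op.
    qmax = 2 ** (num_bits - 1) - 1
    v = w / scale
    clipped = np.clip(v, -qmax - 1, qmax)
    q = np.round(clipped)
    # d(quantized weight)/d(scale) under the STE: clipped weights see
    # the clip boundary; in-range weights see the rounding residual.
    grad_s = np.where((v > qmax) | (v < -qmax - 1), clipped, q - v)
    # Gradient of the MSE reconstruction loss w.r.t. the scale.
    loss_grad = np.mean(2.0 * (q * scale - w) * grad_s)
    return scale - lr * loss_grad
```

The sketch also hints at the instability the paper studies: the MSQE search re-solves for the scale from scratch each time, while the gradient update moves the scale incrementally, so the two can converge very differently during quantization-aware training.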


