Convolutional Neural Network Quantization using Generalized Gamma Distribution

10/31/2018
by Doyun Kim, et al.

As edge applications using convolutional neural network (CNN) models grow, it is becoming necessary to introduce dedicated hardware accelerators in which network parameters and feature-map data are represented with limited precision. In this paper we propose a novel quantization algorithm for energy-efficient deployment of such hardware accelerators. For weights and biases, the optimal bit length of the fractional part is determined so that the quantization error is minimized over their distribution. For feature-map data, the sample distribution is well approximated by the generalized gamma distribution (GGD), so the optimal quantization step size can be obtained from the asymptotic closed-form solution for the GGD. The proposed quantization algorithm achieves a higher signal-to-quantization-noise ratio (SQNR) than quantization schemes previously proposed for CNNs, and can be further improved by tuning the quantization parameters, leading to hardware accelerators for CNNs that are efficient in terms of power consumption and memory bandwidth.
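To make the weight/bias side of the idea concrete, the sketch below searches for the fractional bit length of a signed fixed-point format that minimizes the mean squared quantization error over an empirical weight sample and reports the resulting SQNR. This is an illustrative assumption of how such a search could look, not the paper's exact procedure; the function names, the 8-bit setting, and the exhaustive search are hypothetical, and the GGD-based closed-form step size for feature maps is not reproduced here.

import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    # Uniform signed fixed-point quantizer: step = 2**(-frac_bits),
    # values saturate at the range representable with total_bits bits.
    step = 2.0 ** (-frac_bits)
    max_q = (2 ** (total_bits - 1) - 1) * step
    min_q = -(2 ** (total_bits - 1)) * step
    return np.clip(np.round(x / step) * step, min_q, max_q)

def best_frac_bits(weights, total_bits=8):
    # Exhaustive search over fractional bit lengths, keeping the one
    # with the smallest mean squared quantization error (illustrative
    # criterion; the paper optimizes over the weight distribution).
    best_fb, best_mse = 0, np.inf
    for fb in range(total_bits):
        q = quantize_fixed_point(weights, total_bits, fb)
        mse = float(np.mean((weights - q) ** 2))
        if mse < best_mse:
            best_fb, best_mse = fb, mse
    return best_fb, best_mse

# Toy usage with Gaussian-like weights and an 8-bit signed format.
w = np.random.randn(10000) * 0.1
fb, mse = best_frac_bits(w, total_bits=8)
sqnr_db = 10 * np.log10(np.mean(w ** 2) / mse)
print("fractional bits:", fb, "SQNR (dB): %.1f" % sqnr_db)

With this kind of search, a distribution concentrated near zero favors more fractional bits (a smaller step size), while a wider distribution trades fractional precision for range; the same trade-off is what the closed-form GGD solution resolves analytically for feature-map data.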


