Weight Equalizing Shift Scaler-Coupled Post-training Quantization

08/13/2020
by   Jihun Oh, et al.

Post-training, layer-wise quantization is preferable because it requires no retraining and is hardware-friendly. Nevertheless, accuracy degradation occurs when a neural network model has a large variation in per-output-channel weight ranges. In particular, the MobileNet family suffers a catastrophic drop in top-1 accuracy from 70.60% after weight quantization. To mitigate this significant accuracy reduction, we propose a new weight equalizing shift scaler, i.e., rescaling the weight range per channel by a 4-bit binary shift prior to layer-wise quantization. To recover the original output range, the inverse binary shift is efficiently fused into the existing per-layer scale compounding in the fixed-point convolutional operator of the custom neural processing unit. The binary shift is the key feature of our algorithm: it significantly improves accuracy without increasing the memory footprint. As a result, our proposed method achieves a top-1 accuracy of 69.78% and robust performance across varying network models and tasks, which is competitive with channel-wise quantization results.
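As a rough illustration of the idea described in the abstract, the NumPy sketch below left-shifts each output channel's weights by a 4-bit power-of-two exponent so that all channels span a similar range, applies a single per-layer quantization scale, and folds the inverse shift into the per-channel dequantization scale. The function names, the weight layout, and the exact shift-selection rule are our own assumptions for illustration, not the authors' reference implementation.

    import numpy as np

    def equalizing_shift_exponents(weights, max_shift=15):
        # Per-output-channel power-of-two shift that pushes each channel's
        # weight range toward the layer-wide range before layer-wise
        # quantization. The floor(log2(...)) rule is an assumption, not the
        # paper's exact formula. Shifts are clamped to 4 bits (0..15).
        flat = np.abs(weights.reshape(weights.shape[0], -1))
        per_ch_max = flat.max(axis=1)
        layer_max = per_ch_max.max()
        ratio = layer_max / np.maximum(per_ch_max, 1e-12)
        return np.clip(np.floor(np.log2(ratio)).astype(int), 0, max_shift)

    def quantize_with_shift(weights, num_bits=8):
        # Layer-wise symmetric quantization after per-channel binary shifting.
        # weights: float array with output channels on axis 0 (assumed layout).
        shifts = equalizing_shift_exponents(weights)
        bcast = (-1,) + (1,) * (weights.ndim - 1)
        scaled = weights * (2.0 ** shifts).reshape(bcast)  # left shift per channel

        qmax = 2 ** (num_bits - 1) - 1
        layer_scale = np.abs(scaled).max() / qmax          # single per-layer scale
        q = np.clip(np.round(scaled / layer_scale), -qmax - 1, qmax).astype(np.int8)

        # The inverse shift 2**-shift is folded into the per-channel output
        # rescaling (the existing scale compounding in the conv operator),
        # so the original range is recovered without an extra runtime pass.
        dequant_scale = layer_scale * (2.0 ** -shifts)
        return q, shifts, dequant_scale

In this reading, the only extra metadata per output channel is the 4-bit shift exponent, and because the rescaling factor is a power of two, its inverse can be absorbed into the per-layer scale compounding (or realized as a bit shift) in fixed-point hardware, which is how the memory footprint stays essentially unchanged.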


