GDRQ: Group-based Distribution Reshaping for Quantization

08/05/2019
by Haibao Yu, et al.

It is challenging for low-bit quantization (e.g., 4-bit for both weights and activations) to maintain high performance under limited model capacity. The distributions of weights and activations in deep neural networks are naturally Gaussian-like. Nevertheless, given the limited bitwidth of a low-bit model, uniform-like distributed weights and activations have been shown to be friendlier to quantization while preserving accuracy (Han et al., 2015). Motivated by this, we propose Scale-Clip, a Distribution Reshaping technique that reshapes weights or activations into a uniform-like distribution in a dynamic manner. Furthermore, to increase the capacity of low-bit models, we propose a novel Group-based Quantization algorithm that splits the filters into several groups. Different groups can learn different quantization parameters, which can be elegantly merged into the batch normalization layer without extra computational cost in the inference stage. Finally, we integrate the Scale-Clip technique with the Group-based Quantization algorithm into the Group-based Distribution Reshaping Quantization (GDRQ) framework to further improve quantization performance. Experiments on various networks (e.g., VGGNet and ResNet) and vision tasks (e.g., classification, detection, and segmentation) demonstrate that our framework achieves good performance.
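To make these two ideas concrete, the NumPy sketch below illustrates a Scale-Clip-style reshaping step followed by uniform quantization, and a group-wise variant whose per-group scales fold into batch normalization. It is a minimal sketch under stated assumptions: the clip rule (k times the mean absolute magnitude), the group count, and all function names are hypothetical illustrations, since the abstract does not give the paper's exact formulation.

    import numpy as np

    def scale_clip(w, k=2.0):
        # Clip w to +/- k * mean(|w|); trimming the large-magnitude tails
        # pushes a Gaussian-like distribution toward a uniform-like one.
        # The threshold rule and k are assumptions, not the paper's rule.
        t = k * np.mean(np.abs(w))
        return np.clip(w, -t, t), t

    def uniform_quantize(w, t, bits=4):
        # Symmetric uniform quantization of w in [-t, t] at the given bitwidth.
        levels = 2 ** (bits - 1) - 1   # e.g. 7 positive levels for 4-bit
        step = t / levels
        return np.round(w / step) * step

    def group_quantize(weight, groups=4, bits=4, k=2.0):
        # Split the output filters (axis 0) into groups; each group gets its
        # own clip threshold, i.e. its own quantization scale.
        out = np.empty_like(weight)
        scales = []
        for idx in np.array_split(np.arange(weight.shape[0]), groups):
            clipped, t = scale_clip(weight[idx], k)
            out[idx] = uniform_quantize(clipped, t, bits)
            scales.append(t)
        return out, np.array(scales)

    def fold_scale_into_bn(gamma, running_var, group_scale, eps=1e-5):
        # If a group's weights are stored as integer codes q with real value
        # q * s_g, the conv output of that group is scaled by s_g per output
        # channel, and the per-channel BN multiplier gamma / sqrt(var + eps)
        # can absorb s_g, so inference needs no extra multiply.
        return gamma * group_scale / np.sqrt(running_var + eps)

    # Example: quantize a random conv kernel to 4 bits with 4 filter groups.
    w = np.random.randn(64, 3, 3, 3)
    w_q, s = group_quantize(w, groups=4, bits=4)

Because the groups are formed along the output-channel axis and batch normalization is also applied per output channel, each group scale aligns with a contiguous slice of the BN parameters, which is what makes the merge free at inference time.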

Related research

Redistribution of Weights and Activations for AdderNet Quantization (12/20/2022)
Adder Neural Network (AdderNet) provides a new way for developing energy...

Quantization Networks (11/21/2019)
Although deep neural networks are highly effective, their high computati...

FIT: A Metric for Model Sensitivity (10/16/2022)
Model compression is vital to the deployment of deep learning on edge de...

Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation (11/29/2021)
The nonuniform quantization strategy for compressing neural networks usu...

Structured Binary Neural Networks for Image Recognition (09/22/2019)
We propose methods to train convolutional neural networks (CNNs) with bo...

Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets (07/13/2022)
We propose a novel 2-stage sub 8-bit quantization aware training algorit...

Binary Neural Networks as a general-propose compute paradigm for on-device computer vision (02/08/2022)
For binary neural networks (BNNs) to become the mainstream on-device com...
