Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

12/30/2021
by Runpei Dong, et al.

Quantized neural networks typically require smaller memory footprints and lower computational complexity, which is crucial for efficient deployment. However, quantization inevitably introduces a distribution divergence from the original network, which generally degrades performance. Considerable effort has been devoted to this issue, but most existing approaches lack statistical grounding and rely on several manual configurations. In this paper, we present an adaptive-mapping quantization method that learns an optimal latent sub-distribution inherent to the model, smoothly approximated with a concrete Gaussian Mixture (GM). In particular, the network weights are projected in compliance with the GM-approximated sub-distribution, and this sub-distribution evolves along with the weight updates in a co-tuning scheme guided directly by the task objective. Extensive experiments on image classification and object detection across various modern architectures demonstrate the effectiveness, generalization, and transferability of the proposed method. In addition, an efficient deployment flow for mobile CPUs is developed, achieving up to 7.46× inference acceleration on an octa-core ARM CPU. Code is publicly released at https://github.com/RunpeiDong/DGMS.
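For intuition only, the sketch below illustrates one way a learnable Gaussian Mixture can softly project a layer's weights onto a small set of component means during training, with the mixture parameters co-tuned by the task loss. This is a hedged reading of the abstract, not the released DGMS implementation; the class and parameter names (GMQuantizer, mu, log_sigma, pi_logits, temperature, num_components) are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of GM-based soft weight
# quantization: each weight gets a soft responsibility over K Gaussian
# components, and the forward pass uses the responsibility-weighted mix of
# component means, so both weights and mixture parameters receive gradients
# from the task objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GMQuantizer(nn.Module):
    def __init__(self, num_components: int = 4, temperature: float = 0.1):
        super().__init__()
        # Learnable component means, scales, and mixing weights
        # (co-tuned with the network weights by the task loss).
        self.mu = nn.Parameter(torch.linspace(-1.0, 1.0, num_components))
        self.log_sigma = nn.Parameter(torch.zeros(num_components))
        self.pi_logits = nn.Parameter(torch.zeros(num_components))
        self.temperature = temperature

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        flat = w.reshape(-1, 1)                        # (N, 1)
        sigma = self.log_sigma.exp()                   # (K,)
        log_pi = F.log_softmax(self.pi_logits, dim=0)  # (K,)
        # Log-responsibility of each component for each weight.
        log_prob = (
            log_pi
            - 0.5 * ((flat - self.mu) / sigma) ** 2
            - torch.log(sigma)
        )                                              # (N, K)
        # Softened assignment; at low temperature this approaches a hard
        # projection of each weight onto its nearest component mean.
        resp = F.softmax(log_prob / self.temperature, dim=1)
        w_q = resp @ self.mu                           # (N,)
        return w_q.reshape(w.shape)

# Usage: quantize a convolution's weights on the fly during the forward pass.
quantizer = GMQuantizer(num_components=4)
conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(2, 3, 32, 32)
y = F.conv2d(x, quantizer(conv.weight), conv.bias, padding=1)
```

At inference, the soft assignment can be hardened so every weight collapses to one of the K component means, which is what yields the low-bit representation; the temperature controls how closely training mimics that hard projection.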

