Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

08/14/2019
by Ruihao Gong, et al.

Hardware-friendly network quantization (e.g., binary or uniform quantization) can efficiently accelerate inference and reduce the memory consumption of deep neural networks, which is crucial for deploying models on resource-limited devices such as mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often suffer from an unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between full-precision and low-bit networks. DSQ evolves automatically during training to gradually approximate standard quantization. Owing to its differentiability, DSQ yields more accurate gradients in the backward pass and, with an appropriate clipping range, reduces the quantization error in the forward pass. Extensive experiments on several popular network architectures show that training low-bit neural networks with DSQ consistently outperforms state-of-the-art quantization methods. Besides, our first efficient implementation of 2- to 4-bit DSQ on devices with the ARM architecture achieves up to 1.7× speedup compared with the open-source 8-bit high-performance inference framework NCNN [31].
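To make the idea concrete, below is a minimal PyTorch sketch of the kind of tanh-based soft staircase DSQ uses to approximate uniform quantization. The function name `dsq_quantize`, the fixed clipping bounds `l` and `u`, and the default `alpha` are illustrative assumptions (in the paper, the clipping range and the sharpness of the approximation are learned during training); this is a sketch, not the authors' implementation.

```python
import math

import torch


def dsq_quantize(x: torch.Tensor, l: float, u: float,
                 bit: int = 2, alpha: float = 0.2) -> torch.Tensor:
    """Soft-quantize x onto 2**bit uniform levels over [l, u] (sketch)."""
    n_intervals = 2 ** bit - 1            # number of quantization intervals
    delta = (u - l) / n_intervals         # width of each interval
    x = torch.clamp(x, l, u)              # apply the clipping range

    # Interval index i and midpoint m_i of the interval containing x.
    i = torch.clamp(torch.floor((x - l) / delta), 0, n_intervals - 1)
    m = l + (i + 0.5) * delta

    # Soft staircase phi(x) = s * tanh(k * (x - m_i)); smaller alpha gives
    # a larger k, so phi approaches the hard step on each interval.
    k = math.log(2.0 / alpha - 1.0) / delta
    s = 1.0 / (1.0 - alpha)
    phi = s * torch.tanh(k * (x - m))

    # Map phi in [-1, 1] back onto the quantized grid over [l, u].
    return l + delta * (i + 0.5 * (phi + 1.0))
```

During training, one would apply such a function to weights and activations and anneal `alpha` toward zero so the soft staircase converges to standard quantization; at inference time the hard uniform quantizer replaces the soft one.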


Related research

10/07/2019
Bit Efficient Quantization for Deep Neural Networks
Quantization for deep neural networks has afforded models for edge devi...

11/21/2019
Quantization Networks
Although deep neural networks are highly effective, their high computati...

12/19/2021
Logarithmic Unbiased Quantization: Practical 4-bit Training in Deep Learning
Quantization of the weights and activations is one of the main methods t...

06/04/2021
Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
Model quantization is challenging due to many tedious hyper-parameters s...

12/20/2018
SQuantizer: Simultaneous Learning for Both Sparse and Low-precision Neural Networks
Deep neural networks have achieved state-of-the-art accuracies in a wide...

03/03/2022
ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM
We accelerate the 4-bit product quantization (PQ) on the ARM architectur...

03/02/2021
All at Once Network Quantization via Collaborative Knowledge Transfer
Network quantization has rapidly become one of the most widely used meth...
