All at Once Network Quantization via Collaborative Knowledge Transfer

03/02/2021
by Ximeng Sun, et al.

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks on edge devices. While existing approaches offer impressive results on common benchmark datasets, they generally repeat the quantization process and retrain the low-precision network from scratch for each target precision, yielding a different network tailored to each resource constraint. This limits scalable deployment of deep networks in many real-world applications, where dynamic changes in bit-width are often desired in practice. All-at-once quantization addresses this problem by flexibly adjusting the bit-width of a single deep network during inference, without re-training or additional memory to store separate models, enabling instant adaptation to different scenarios. In this paper, we develop a novel collaborative knowledge transfer approach for efficiently training an all-at-once quantization network. Specifically, we propose an adaptive selection strategy that chooses a high-precision teacher to transfer knowledge to the low-precision student while jointly optimizing the model across all bit-widths. Furthermore, to transfer knowledge effectively, we develop a dynamic block swapping method that randomly replaces blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network. Extensive experiments on several challenging and diverse datasets for both image and video classification clearly demonstrate the efficacy of our approach over state-of-the-art methods.
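To make the two mechanisms concrete, below is a minimal PyTorch-style sketch of one training step combining joint multi-bit optimization, knowledge distillation, and random block swapping. All names here (`QuantBlock`, `AnyBitNet`, `BIT_WIDTHS`, `swap_p`) are illustrative assumptions, not the paper's implementation, and the adaptive teacher selection is simplified to "distill from the highest precision, swap blocks from the next-higher precision":

```python
# Minimal sketch of all-at-once quantization training with block swapping.
# Assumes PyTorch; the quantizer, network, and bit-width list are placeholders.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

BIT_WIDTHS = [8, 6, 4, 2]  # candidate precisions, highest first (assumed)

class QuantBlock(nn.Module):
    """A block whose activations are fake-quantized to `self.bits`."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.bits = BIT_WIDTHS[0]

    def forward(self, x):
        # Symmetric uniform fake quantization (illustrative).
        scale = x.abs().max().clamp(min=1e-8) / (2 ** (self.bits - 1) - 1)
        xq = torch.round(x / scale) * scale
        x = x + (xq - x).detach()  # straight-through estimator
        return F.relu(self.fc(x))

class AnyBitNet(nn.Module):
    """A single network whose blocks can run at any supported bit-width."""
    def __init__(self, dim=32, depth=4, classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([QuantBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, classes)

    def forward(self, x, bits, teacher_bits=None, swap_p=0.0):
        # Dynamic block swapping: with probability `swap_p`, run a block at
        # the teacher's (higher) precision instead of the student's.
        for blk in self.blocks:
            use_teacher = teacher_bits is not None and random.random() < swap_p
            blk.bits = teacher_bits if use_teacher else bits
            x = blk(x)
        return self.head(x)

def train_step(net, opt, x, y, swap_p=0.5):
    opt.zero_grad()
    # Highest precision serves as the distillation teacher for this step.
    with torch.no_grad():
        t_logits = net(x, bits=BIT_WIDTHS[0])
    # Task loss at the highest precision, KD loss for each lower precision;
    # all bit-widths share weights and are optimized jointly.
    loss = F.cross_entropy(net(x, bits=BIT_WIDTHS[0]), y)
    for hi, lo in zip(BIT_WIDTHS[:-1], BIT_WIDTHS[1:]):
        s_logits = net(x, bits=lo, teacher_bits=hi, swap_p=swap_p)
        loss = loss + F.kl_div(F.log_softmax(s_logits, -1),
                               F.softmax(t_logits, -1),
                               reduction="batchmean")
    loss.backward()
    opt.step()
```

After training, the same set of weights can serve any precision in `BIT_WIDTHS` at inference simply by setting each block's bit-width, with no re-training and no extra models to store.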
