Automatic low-bit hybrid quantization of neural networks through meta learning

04/24/2020
by PetsTime, et al.

Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference, especially when deploying to edge or IoT devices with limited computation capacity and power budgets. Uniform bit-width quantization across all layers is usually sub-optimal, so exploring hybrid quantization with different bit widths per layer is vital for efficient deep compression. In this paper, we employ a meta learning method to automatically realize low-bit hybrid quantization of neural networks. A MetaQuantNet, together with a Quantization function, is trained to generate the quantized weights for the target DNN. We then apply a genetic algorithm to search for the best hybrid quantization policy that meets the compression constraints. With the best searched quantization policy, we subsequently retrain or finetune to further improve the performance of the quantized target network. Extensive experiments demonstrate that the performance of the searched hybrid quantization scheme surpasses that of its uniform-bitwidth counterpart. Compared to existing reinforcement learning (RL) based hybrid quantization search approaches that rely on tedious exploration, our meta learning approach is more efficient and effective for any compression requirement, since the MetaQuantNet only needs to be trained once.
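The search step described above can be sketched as a minimal genetic algorithm over per-layer bit widths. Everything concrete below is an illustrative assumption: the layer count, the candidate bit widths, the average-bitwidth budget, and especially the `fitness` function, which merely stands in for the validation accuracy a MetaQuantNet-quantized network would report.

```python
import random

# Hypothetical setup: search per-layer bit widths for a 6-layer network
# under an average-bitwidth budget (the compression constraint).
NUM_LAYERS = 6
BITS = [2, 3, 4, 5, 6, 8]   # candidate bit widths per layer
BUDGET = 4.0                # max average bits per weight

def fitness(policy):
    # Stand-in for evaluating the quantized target network's accuracy.
    # Infeasible policies (over budget) are heavily penalized; among
    # feasible ones, higher average precision scores higher.
    avg = sum(policy) / len(policy)
    if avg > BUDGET:
        return -1.0
    return avg / max(BITS)

def mutate(policy, rate=0.2):
    # Randomly reassign each layer's bit width with probability `rate`.
    return [random.choice(BITS) if random.random() < rate else b
            for b in policy]

def crossover(a, b):
    # Single-point crossover between two parent policies.
    cut = random.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]

def search(pop_size=32, generations=50, seed=0):
    random.seed(seed)
    pop = [[random.choice(BITS) for _ in range(NUM_LAYERS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 4]          # keep the top quarter
        children = [mutate(crossover(random.choice(elite),
                                     random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best = search()
```

In the actual pipeline the fitness call would quantize the target network with the MetaQuantNet-generated weights at the candidate bit widths and measure validation accuracy; the selection, crossover, and mutation loop is the generic part.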

Related research

- Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers (08/12/2020)
- Data Quality-aware Mixed-precision Quantization via Hybrid Reinforcement Learning (02/09/2023)
- Deep learning model compression using network sensitivity and gradients (10/11/2022)
- Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach (07/20/2022)
- Online Meta Adaptation for Variable-Rate Learned Image Compression (11/16/2021)
- ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks (11/05/2018)
- Hardware-Centric AutoML for Mixed-Precision Quantization (08/11/2020)
