Effective and Fast: A Novel Sequential Single Path Search for Mixed-Precision Quantization

03/04/2021
by   Qigong Sun, et al.

Because model quantization reduces model size and computation latency, it has been successfully applied in many applications on mobile phones, embedded devices and smart chips. A mixed-precision quantization model can assign a different quantization bit-precision to each layer according to that layer's sensitivity, and thereby achieve high performance. However, it is difficult to quickly determine the bit-precision of each layer in a deep neural network under given constraints (e.g., hardware resources, energy consumption, model size and computation latency). To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process. A single path search cell is used to build a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to the selection certainties, which exponentially reduces the search space and speeds up the convergence of the search process. Experiments show that our method can efficiently search mixed-precision models for different architectures (e.g., ResNet-20, -18, -34, -50 and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet and COCO) under given constraints, and our experimental results verify that the mixed-precision models found by SSPS significantly outperform their uniform-precision counterparts.
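The abstract names two mechanisms without giving their formulas: a constraint term added to the loss, and sequential fixing of precisions by selection certainty. The sketch below is one possible reading, not the authors' implementation. It assumes that each layer holds one logit per candidate bit-width, that "selection certainty" means the largest softmax probability of a layer, and that the constraint is a model-size budget penalized by relative overshoot. All function names and the penalty form are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_size_bits(layer_logits, layer_params, candidates, decided):
    """Model size in bits: fixed layers use their chosen bit-width,
    undecided layers use the expectation under their softmax."""
    total = 0.0
    for i, (logits, n_params) in enumerate(zip(layer_logits, layer_params)):
        if i in decided:
            bits = decided[i]
        else:
            probs = softmax(logits)
            bits = sum(p * b for p, b in zip(probs, candidates))
        total += n_params * bits
    return total

def constrained_loss(task_loss, size_bits, budget_bits, lam=1.0):
    """Task loss plus a penalty on relative budget overshoot
    (assumed form; the abstract does not specify the penalty)."""
    overshoot = max(0.0, size_bits / budget_bits - 1.0)
    return task_loss + lam * overshoot

def fix_most_certain_layer(layer_logits, candidates, decided):
    """Fix the undecided layer with the highest top probability
    (assumed certainty measure). Returns its index, or None."""
    best = None  # (certainty, layer index, bit-width)
    for i, logits in enumerate(layer_logits):
        if i in decided:
            continue
        probs = softmax(logits)
        k = max(range(len(probs)), key=probs.__getitem__)
        if best is None or probs[k] > best[0]:
            best = (probs[k], i, candidates[k])
    if best is None:
        return None
    decided[best[1]] = best[2]
    return best[1]

def remaining_search_space(n_layers, n_candidates, decided):
    """Number of bit-width assignments still possible."""
    return n_candidates ** (n_layers - len(decided))
```

Under these assumptions, each call to `fix_most_certain_layer` divides the remaining search space by the number of candidates, which is one way to read the "exponentially reduce the search space" claim. In a real search the logits would be updated by gradient descent on `constrained_loss` between successive fixing steps.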

research
07/04/2020

FracBits: Mixed Precision Quantization via Fractional Bit-Widths

Model quantization helps to reduce model size and latency of deep neural...
research
02/20/2021

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Mixed-precision quantization can potentially achieve the optimal tradeof...
research
08/11/2022

Mixed-Precision Neural Networks: A Survey

Mixed-precision Deep Neural Networks achieve the energy efficiency and t...
research
07/20/2020

Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization

Emergent hardwares can support mixed precision CNN models inference that...
research
10/27/2022

Neural Networks with Quantization Constraints

Enabling low precision implementations of deep learning models, without ...
research
03/16/2022

Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

The exponentially large discrete search space in mixed-precision quantiz...
research
07/06/2023

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

Mixed-precision quantization, where a deep neural network's layers are q...
