Rethinking Differentiable Search for Mixed-Precision Neural Networks

04/13/2020
by   Zhaowei Cai, et al.
0

Low-precision networks, with weights and activations quantized to low bit-width, are widely used to accelerate inference on edge devices. However, current solutions are uniform, using identical bit-width for all filters. This fails to account for the different sensitivities of different filters and is suboptimal. Mixed-precision networks address this problem, by tuning the bit-width to individual filter requirements. In this work, the problem of optimal mixed-precision network search (MPS) is considered. To circumvent its difficulties of discrete search space and combinatorial optimization, a new differentiable search architecture is proposed, with several novel contributions to advance the efficiency by leveraging the unique properties of the MPS problem. The resulting Efficient differentiable MIxed-Precision network Search (EdMIPS) method is effective at finding the optimal bit allocation for multiple popular networks, and can search a large model, e.g. Inception-V3, directly on ImageNet without proxy task in a reasonable amount of time. The learned mixed-precision networks significantly outperform their uniform counterparts.

READ FULL TEXT
research
11/30/2018

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search

Recent work in network quantization has substantially reduced the time a...
research
07/10/2023

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

Quantizing neural networks is one of the most effective methods for achi...
research
07/16/2022

Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

Recently, deep convolutional neural networks (CNNs) have achieved many e...
research
03/17/2020

Efficient Bitwidth Search for Practical Mixed Precision Neural Network

Network quantization has rapidly become one of the most widely used meth...
research
06/04/2022

Combinatorial optimization for low bit-width neural networks

Low-bit width neural networks have been extensively explored for deploym...
research
02/14/2023

Searching Transferable Mixed-Precision Quantization Policy through Large Margin Regularization

Mixed-precision quantization (MPQ) suffers from time-consuming policy se...
research
06/17/2022

Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes

Quantization is widely employed in both cloud and edge systems to reduce...

Please sign up or login with your details

Forgot password? Click here to reset