BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer

05/19/2021
by Haoping Bai, et al.

As deep learning models are deployed on edge devices at an accelerating pace, fast adaptation to scenarios with varying resource constraints has become a crucial aspect of model deployment. As a result, model optimization strategies with adaptive configuration are becoming increasingly popular. While single-shot quantized neural architecture search enjoys flexibility in both model architecture and quantization policy, the combined search space comes with many challenges, including instability when training the weight-sharing supernet and difficulty in navigating the exponentially growing search space. Existing methods tend to either limit the architecture search space to a small set of options or limit the quantization policy search space to fixed-precision policies. To this end, we propose BatchQuant, a robust quantizer formulation that allows fast and stable training of a compact, single-shot, mixed-precision, weight-sharing supernet. We employ BatchQuant to train a compact supernet (offering over 10^76 quantized subnets) in substantially fewer GPU hours than previous methods. Our approach, Quantized-for-all (QFA), is the first to seamlessly extend a one-shot weight-sharing NAS supernet to support subnets with arbitrary ultra-low-bitwidth mixed-precision quantization policies without retraining. QFA opens up new possibilities in joint hardware-aware neural architecture search and quantization. We demonstrate the effectiveness of our method on ImageNet and achieve state-of-the-art (SOTA) Top-1 accuracy under a low complexity constraint (<20 MFLOPs). The code and models will be made publicly available at https://github.com/bhpfelix/QFA.
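To make the idea of a batch-statistics-based quantizer more concrete, the sketch below shows a minimal PyTorch fake-quantization module that estimates its clipping range from each batch and keeps running statistics for inference, analogous to BatchNorm. The class name `BatchStatActQuantizer`, the momentum-based running range, and the simple min/max range estimator are illustrative assumptions, not the paper's exact BatchQuant formulation.

```python
import torch
import torch.nn as nn


class BatchStatActQuantizer(nn.Module):
    """Illustrative activation fake-quantizer (hypothetical; not the paper's exact method).

    Estimates the clipping range from per-batch statistics so that one quantizer
    can cope with the activation-distribution shift across sampled subnets, then
    applies uniform fake quantization with a straight-through estimator.
    """

    def __init__(self, bits: int = 4, momentum: float = 0.1):
        super().__init__()
        self.bits = bits
        self.momentum = momentum
        # Running range used at evaluation time, analogous to BatchNorm's running stats.
        self.register_buffer("running_min", torch.zeros(1))
        self.register_buffer("running_max", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Per-batch range estimate; a robust statistic (e.g. quantiles)
            # could be substituted here to reduce sensitivity to outliers.
            x_min = x.detach().amin()
            x_max = x.detach().amax()
            self.running_min.mul_(1 - self.momentum).add_(self.momentum * x_min)
            self.running_max.mul_(1 - self.momentum).add_(self.momentum * x_max)
        else:
            x_min, x_max = self.running_min, self.running_max

        qmax = 2 ** self.bits - 1
        scale = (x_max - x_min).clamp(min=1e-8) / qmax
        zero_point = (-x_min / scale).round()

        # Uniform fake quantization; the forward pass quantizes, while the
        # backward pass behaves like the identity (straight-through estimator).
        q = ((x / scale + zero_point).round().clamp(0, qmax) - zero_point) * scale
        return x + (q - x).detach()
```

In a mixed-precision supernet, a module like this would typically be instantiated per layer, with its `bits` setting re-sampled alongside the architecture during training; the actual quantizer and training recipe should be taken from the code released at the URL above.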


Related research

Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search (11/30/2018)
Recent work in network quantization has substantially reduced the time a...

HQNAS: Auto CNN deployment framework for joint quantization and architecture search (10/16/2022)
Deep learning applications are being transferred from the cloud to edge ...

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference (03/15/2023)
The combination of Neural Architecture Search (NAS) and quantization has...

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy (06/15/2020)
We present APQ for efficient deep learning inference on resource-constra...

Once Quantized for All: Progressively Searching for Quantized Efficient Models (10/09/2020)
Automatic search of Quantized Neural Networks has attracted a lot of att...

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training (07/10/2023)
Quantizing neural networks is one of the most effective methods for achi...

AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks (04/07/2023)
Exploring the expected quantizing scheme with suitable mixed-precision p...
