Once Quantized for All: Progressively Searching for Quantized Efficient Models

10/09/2020
by Mingzhu Shen et al.

Automatic search of quantized neural networks has attracted a lot of attention. However, existing quantization-aware Neural Architecture Search (NAS) approaches inherit a two-stage search-retrain scheme, which is not only time-consuming but also adversely affected by the unreliable ranking of architectures during the search. To avoid the undesirable effects of the search-retrain scheme, we present Once Quantized for All (OQA), a novel framework that searches for quantized efficient models and deploys their quantized weights at the same time without additional post-processing. While supporting a huge architecture search space, OQA can produce a series of ultra-low bit-width (e.g., 4/3/2-bit) quantized efficient models. A progressive bit inheritance procedure is introduced to support these ultra-low bit-widths. Our discovered model family, OQANets, achieves a new state-of-the-art (SOTA) on quantized efficient models compared with various quantization methods and bit-widths. In particular, OQA2bit-L achieves 64.0% ImageNet top-1 accuracy, outperforming its 2-bit counterpart EfficientNet-B0@QKD by a large margin of 14%. Code is available at https://github.com/LaVieEnRoseSMZ/OQA.
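The progressive bit inheritance idea can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration under simple assumptions, not the authors' released code: all names (quantize_weight, QuantConv2d, inherit_to_lower_bit) are hypothetical, a plain uniform fake-quantizer is assumed, and straight-through gradient estimation as well as the actual supernet training loops are omitted. It only shows how a lower-bit supernet can inherit the weights of the supernet trained at the next higher bit-width (4-bit to 3-bit to 2-bit).

```python
# Minimal sketch of progressive bit inheritance (hypothetical names, not the
# authors' implementation). A uniform symmetric fake-quantizer is assumed;
# straight-through gradients and training loops are omitted.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


def quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


class QuantConv2d(nn.Conv2d):
    """Conv layer whose latent full-precision weights are quantized on the fly."""

    def __init__(self, *args, bits: int = 4, **kwargs):
        super().__init__(*args, **kwargs)
        self.bits = bits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q = quantize_weight(self.weight, self.bits)
        return F.conv2d(x, w_q, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


def inherit_to_lower_bit(supernet: nn.Module, new_bits: int) -> nn.Module:
    """Clone a trained supernet and switch every quantized layer to a lower
    bit-width, keeping the latent weights as the initialization."""
    child = copy.deepcopy(supernet)
    for m in child.modules():
        if isinstance(m, QuantConv2d):
            m.bits = new_bits
    return child


# Progressive schedule: train the 4-bit supernet first, then fine-tune the
# inherited 3-bit and 2-bit supernets (training code omitted).
supernet_4bit = nn.Sequential(QuantConv2d(3, 16, kernel_size=3, padding=1, bits=4))
supernet_3bit = inherit_to_lower_bit(supernet_4bit, new_bits=3)
supernet_2bit = inherit_to_lower_bit(supernet_3bit, new_bits=2)
```

The point of inheriting from the next higher bit-width, as described in the abstract, is that each ultra-low-bit supernet starts from an already trained initialization rather than from scratch.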


Related research

11/30/2018
Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search
Recent work in network quantization has substantially reduced the time a...

09/25/2022
Bigger Faster: Two-stage Neural Architecture Search for Quantized Transformer Models
Neural architecture search (NAS) for transformers has been used to creat...

05/19/2021
BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
As the applications of deep learning models on edge devices increase at ...

12/30/2021
Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
Quantized neural networks typically require smaller memory footprints an...

08/26/2022
GHN-Q: Parameter Prediction for Unseen Quantized Convolutional Architectures via Graph Hypernetworks
Deep convolutional neural network (CNN) training via iterative optimizat...

12/20/2019
AdaBits: Neural Network Quantization with Adaptive Bit-Widths
Deep neural networks with adaptive configurations have gained increasing...

05/07/2021
Pareto-Optimal Quantized ResNet Is Mostly 4-bit
Quantization has become a popular technique to compress neural networks ...
