HQNAS: Auto CNN deployment framework for joint quantization and architecture search

10/16/2022
by Hongjiang Chen, et al.

Deep learning applications are being transferred from the cloud to the edge with the rapid development of embedded computing systems. In order to achieve higher energy efficiency within a limited resource budget, neural networks (NNs) must be carefully designed in two steps: the architecture design and the quantization policy choice. Neural Architecture Search (NAS) and quantization have been proposed separately when deploying NNs onto embedded devices. However, taking the two steps individually is time-consuming and leads to a sub-optimal final deployment. To this end, we propose a novel neural network design framework called Hardware-aware Quantized Neural Architecture Search (HQNAS), which combines NAS and quantization in a very efficient manner using weight-sharing and bit-sharing. It takes only 4 GPU hours to discover an outstanding NN policy on CIFAR10, and only a fraction of the GPU time of traditional NAS to generate a comparable model on ImageNet, with a 1.8x decrease in latency and a negligible accuracy loss of only 0.7%. The framework can also adapt to lifelong scenarios in which the neural network needs to evolve occasionally due to changes in local data, environment and user preference.
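The abstract only hints at how weight-sharing and bit-sharing let architecture and precision be searched together. The sketch below is our own illustration of one common way to realize that idea (a differentiable relaxation over candidate ops and bit-widths with a hardware-cost penalty), not the authors' code; every name here (BitSharedConv, JointSearchCell, hardware_aware_loss, the candidate bit-widths and kernel sizes, the lam coefficient) is an assumption for illustration only.

```python
# Minimal sketch (assumed design, not the HQNAS implementation): a shared weight tensor
# is fake-quantized at several candidate bit-widths ("bit-sharing"), candidate ops share
# the search input ("weight-sharing"), and learnable logits pick op and precision jointly.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # forward uses w_q, gradients flow to the shared weights


class BitSharedConv(nn.Module):
    """One shared convolution kernel evaluated at several candidate bit-widths."""

    def __init__(self, c_in, c_out, kernel_size, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(c_out, c_in, kernel_size, kernel_size) * 0.1)
        self.padding = kernel_size // 2
        self.candidate_bits = candidate_bits
        self.bit_logits = nn.Parameter(torch.zeros(len(candidate_bits)))  # learnable precision choice

    def forward(self, x):
        probs = F.softmax(self.bit_logits, dim=0)
        out = 0.0
        for p, bits in zip(probs, self.candidate_bits):
            out = out + p * F.conv2d(x, fake_quantize(self.weight, bits), padding=self.padding)
        return out

    def expected_bits(self):
        probs = F.softmax(self.bit_logits, dim=0)
        return (probs * torch.tensor(self.candidate_bits, dtype=probs.dtype)).sum()


class JointSearchCell(nn.Module):
    """Weight-sharing across candidate ops, each of which is itself bit-shared."""

    def __init__(self, channels, candidate_kernels=(3, 5)):
        super().__init__()
        self.ops = nn.ModuleList(BitSharedConv(channels, channels, k) for k in candidate_kernels)
        self.arch_logits = nn.Parameter(torch.zeros(len(self.ops)))  # learnable op choice

    def forward(self, x):
        probs = F.softmax(self.arch_logits, dim=0)
        return sum(p * op(x) for p, op in zip(probs, self.ops))


def hardware_aware_loss(logits, target, cells, lam=0.01):
    """Cross-entropy plus a proxy hardware cost that grows with the expected precision."""
    cost = sum(op.expected_bits() for cell in cells for op in cell.ops)
    return F.cross_entropy(logits, target) + lam * cost


if __name__ == "__main__":
    cell = JointSearchCell(channels=8)
    head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
    x, y = torch.randn(4, 8, 32, 32), torch.randint(0, 10, (4,))
    loss = hardware_aware_loss(head(cell(x)), y, [cell])
    loss.backward()  # updates shared weights, bit logits and architecture logits in one pass
    print(float(loss))
```

After such a search converges, the op and bit-width with the largest logits would be kept and the rest discarded; the paper's actual search procedure and latency model may differ from this proxy cost.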

Related research

05/19/2021 - BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
As the applications of deep learning models on edge devices increase at ...

06/15/2020 - APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
We present APQ for efficient deep learning inference on resource-constra...

03/20/2020 - FTT-NAS: Discovering Fault-Tolerant Neural Architecture
With the fast evolvement of embedded deep-learning computing systems, ap...

07/29/2022 - Evaluating the Practicality of Learned Image Compression
Learned image compression has achieved extraordinary rate-distortion per...

03/13/2023 - AdaptiveNet: Post-deployment Neural Architecture Adaptation for Diverse Edge Environments
Deep learning models are increasingly deployed to edge devices for real-...

03/04/2022 - Improving the Energy Efficiency and Robustness of tinyML Computer Vision using Log-Gradient Input Images
This paper studies the merits of applying log-gradient input images to c...

09/25/2022 - Bigger Faster: Two-stage Neural Architecture Search for Quantized Transformer Models
Neural architecture search (NAS) for transformers has been used to creat...
