One Proxy Device Is Enough for Hardware-Aware Neural Architecture Search

11/01/2021
by   Bingqian Lu, et al.

Convolutional neural networks (CNNs) power numerous real-world applications, such as vision-based autonomous driving and video content analysis. To run CNN inference on diverse target devices, hardware-aware neural architecture search (NAS) is crucial. A key requirement of efficient hardware-aware NAS is fast evaluation of inference latencies in order to rank different architectures. While the state of the art commonly builds a latency predictor for each target device, this is a very time-consuming process that does not scale in the presence of extremely diverse devices. In this work, we address the scalability challenge by exploiting latency monotonicity: the latency rankings of architectures on different devices are often correlated. When strong latency monotonicity exists, we can reuse architectures searched on a single proxy device for new target devices without losing optimality. In the absence of strong latency monotonicity, we propose an efficient proxy adaptation technique that significantly boosts latency monotonicity. Finally, we validate our approach with experiments on devices from different platforms across multiple mainstream search spaces, including MobileNet-V2, MobileNet-V3, NAS-Bench-201, ProxylessNAS, and FBNet. Our results highlight that, using just one proxy device, we can find almost the same Pareto-optimal architectures as existing per-device NAS while avoiding the prohibitive cost of building a latency predictor for each device. GitHub: https://github.com/Ren-Research/OneProxy
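
Latency monotonicity can be quantified by the rank correlation between the latencies of the same set of candidate architectures measured on two devices. The following is a minimal sketch of such a check using Spearman's rank correlation; the lat_proxy and lat_target arrays are hypothetical illustrative measurements, not numbers from the paper.

    # Minimal sketch: quantify latency monotonicity between a proxy
    # device and a target device via Spearman's rank correlation.
    from scipy.stats import spearmanr

    # Hypothetical latencies (ms) of the same candidate architectures
    # measured on the two devices.
    lat_proxy  = [12.1, 8.4, 15.0, 9.7, 11.3]    # proxy device
    lat_target = [20.5, 14.2, 24.8, 16.9, 19.1]  # target device

    rho, _ = spearmanr(lat_proxy, lat_target)
    print(f"SRCC between proxy and target: {rho:.3f}")

    # rho close to 1 means the two devices rank architectures similarly,
    # so architectures searched on the proxy can be reused on the target;
    # a low rho suggests the proxy needs adaptation before reuse.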

Related research

06/16/2021 · HELP: Hardware-Adaptive Efficient Latency Predictor for NAS via Meta-Learning
For deployment, neural architecture search should be hardware-aware, in ...

08/07/2020 · A Surgery of the Neural Architecture Evaluators
Neural architecture search (NAS) recently received extensive attention d...

04/27/2022 · MAPLE-Edge: A Runtime Latency Predictor for Edge Devices
Neural Architecture Search (NAS) has enabled automatic discovery of more...

05/25/2022 · MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge
Deep neural network (DNN) latency characterization is a time-consuming p...

02/27/2021 · Pareto-Frontier-aware Neural Architecture Generation for Diverse Budgets
Designing feasible and effective architectures under diverse computation...

04/09/2022 · Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
On-device ML accelerators are becoming a standard in modern mobile syste...

10/06/2020 · LETI: Latency Estimation Tool and Investigation of Neural Networks inference on Mobile GPU
A lot of deep learning applications are desired to be run on mobile devi...
