Inference Latency Prediction at the Edge

10/06/2022
by Zhuojin Li, et al.

With the growing workload of inference tasks on mobile devices, state-of-the-art neural architectures (NAs) are typically designed through Neural Architecture Search (NAS) to identify NAs with good tradeoffs between accuracy and efficiency (e.g., latency). Since measuring the latency of a huge set of candidate architectures during NAS is not scalable, approaches are needed for predicting end-to-end inference latency on mobile devices. Such predictions are challenging due to hardware heterogeneity, optimizations applied by ML frameworks, and the diversity of neural architectures. Motivated by these challenges, in this paper we first quantitatively assess the characteristics of neural architectures and mobile devices that have significant effects on inference latency. Based on this assessment, we propose a latency prediction framework that addresses these challenges through operation-wise latency predictors, built under a variety of settings and for a number of hardware devices with multi-core CPUs and GPUs. Our comprehensive evaluations show that the framework achieves high accuracy in end-to-end latency prediction. To illustrate that our approach does not require expensive data collection, we also show that accurate predictions can be achieved on real-world NAs using only a small amount of profiling data.
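The abstract describes operation-wise latency predictors whose per-operation estimates are combined into an end-to-end latency prediction. A minimal sketch of that idea, assuming a hypothetical setup in which each operation type gets its own simple linear model over a profiled feature (FLOPs here; the paper's actual features and predictor models are not specified in this abstract):

```python
# Hypothetical sketch, not the paper's exact method: fit one simple linear
# predictor per operation type from profiled samples, then estimate a
# network's end-to-end latency as the sum of per-operation predictions.

from collections import defaultdict


class OpLatencyPredictor:
    """Per-operation-type linear model: latency ~ a * flops + b (assumed feature)."""

    def __init__(self):
        self.models = {}  # op_type -> (slope a, intercept b)

    def fit(self, samples):
        # samples: list of (op_type, flops, measured_latency_ms) profiling records
        grouped = defaultdict(list)
        for op_type, flops, lat in samples:
            grouped[op_type].append((flops, lat))
        for op_type, pts in grouped.items():
            # ordinary least squares for a 1-D linear fit
            n = len(pts)
            sx = sum(x for x, _ in pts)
            sy = sum(y for _, y in pts)
            sxx = sum(x * x for x, _ in pts)
            sxy = sum(x * y for x, y in pts)
            denom = n * sxx - sx * sx
            a = (n * sxy - sx * sy) / denom if denom else 0.0
            b = (sy - a * sx) / n
            self.models[op_type] = (a, b)

    def predict_op(self, op_type, flops):
        a, b = self.models[op_type]
        return a * flops + b

    def predict_end_to_end(self, ops):
        # ops: list of (op_type, flops) describing one neural architecture
        return sum(self.predict_op(t, f) for t, f in ops)
```

Because each operation type is profiled independently, only a modest number of measurements per device is needed, which is consistent with the abstract's claim that accurate predictions are possible from small amounts of profiling data.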


Related research

07/16/2020  BRP-NAS: Prediction-based NAS using GCNs
06/04/2023  Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search
08/29/2018  Searching Toward Pareto-Optimal Device-Aware Neural Architectures
10/06/2020  LETI: Latency Estimation Tool and Investigation of Neural Networks inference on Mobile GPU
07/20/2022  EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching
11/13/2020  Reducing Inference Latency with Concurrent Architectures for Image Recognition
05/12/2023  Monitoring and Adapting ML Models on Mobile Devices
