A Note on Latency Variability of Deep Neural Networks for Mobile Inference

02/29/2020
by Luting Yang, et al.

Running deep neural network (DNN) inference on mobile devices, i.e., mobile inference, has become a growing trend, making inference less dependent on network connections and keeping private data local. Prior studies on optimizing DNNs for mobile inference typically focus on the metric of average inference latency, implicitly assuming that mobile inference exhibits little latency variability. In this note, we conduct a preliminary measurement study of the latency variability of DNNs for mobile inference. We show that inference latency variability can become quite significant in the presence of CPU resource contention. More interestingly, contrary to the common belief that the relative performance superiority of one DNN over another on one device carries over to another device and/or another level of resource contention, we highlight that a DNN model with better latency performance than another model can be outperformed by that model when resource contention becomes more severe or when running on another device. Thus, when optimizing DNN models for mobile inference, measuring only the average latency may not be adequate; instead, latency variability under various conditions should be accounted for, including but not limited to the different devices and different levels of CPU resource contention considered in this note.
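The note's exact measurement setup is not reproduced here, but its core recommendation, reporting the latency distribution rather than only the average, can be sketched as follows. The function `measure_latency`, the warmup/run counts, and the percentile choices below are illustrative assumptions, not the authors' methodology; `infer_fn` stands in for a single DNN inference call.

```python
import statistics
import time

def measure_latency(infer_fn, num_runs=100, warmup=10):
    """Report the latency distribution of repeated inference calls.

    `infer_fn` is a hypothetical stand-in for one DNN inference invocation;
    warmup runs are discarded so one-time costs (model load, caches) do not
    skew the measured distribution.
    """
    for _ in range(warmup):
        infer_fn()
    latencies = []
    for _ in range(num_runs):
        start = time.perf_counter()
        infer_fn()
        latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds
    latencies.sort()
    return {
        "mean_ms": statistics.mean(latencies),
        "stdev_ms": statistics.stdev(latencies),
        "p50_ms": latencies[len(latencies) // 2],
        "p99_ms": latencies[int(len(latencies) * 0.99) - 1],
    }

# Example with a dummy CPU-bound workload standing in for inference.
stats = measure_latency(lambda: sum(i * i for i in range(10000)))
```

Repeating such a measurement while a background process contends for CPU cores, and comparing the tail (p99) rather than the mean across conditions, is one way to surface the variability effects the note describes.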


