ModiPick: SLA-aware Accuracy Optimization For Mobile Deep Inference

09/04/2019
by   Samuel S. Ogden, et al.

Mobile applications are increasingly leveraging complex deep learning models to deliver features, e.g., image recognition, that require high prediction accuracy. Such models can be both computation- and memory-intensive, even for newer mobile devices, and are therefore commonly hosted on powerful remote servers. However, current cloud-based inference services employ a static model selection approach that can be suboptimal for satisfying application SLAs (service level agreements), as they fail to account for the inherently dynamic mobile environment. We introduce a cloud-based technique called ModiPick that dynamically selects the most appropriate model for each inference request and adapts its selection to different SLAs and to execution time budgets that vary with mobile network conditions. The key idea of ModiPick is to make inference speed and accuracy trade-offs at runtime over a pool of managed deep learning models. As such, ModiPick masks unpredictable inference time budgets and therefore meets SLA targets, while improving accuracy within mobile network constraints. We evaluate ModiPick through experiments on prototype systems and through simulations. We show that ModiPick achieves inference accuracy comparable to a greedy approach while improving SLA adherence by up to 88.5%.
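The core runtime decision, trading accuracy against the time remaining before the SLA deadline for each request, can be sketched roughly as follows. This is a minimal illustration of the idea described in the abstract, not ModiPick's actual selection algorithm; the model names, accuracy and latency figures, and the `response_margin_ms` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelProfile:
    """A candidate model with its profiled accuracy and server-side latency."""
    name: str
    accuracy: float          # e.g., validation top-1 accuracy
    mean_latency_ms: float   # profiled average inference time on the server


def pick_model(pool: List[ModelProfile],
               sla_ms: float,
               elapsed_network_ms: float,
               response_margin_ms: float = 10.0) -> ModelProfile:
    """Pick the most accurate model whose expected inference time fits
    the remaining per-request budget (SLA minus time already spent on
    the network, minus a margin reserved for sending the response)."""
    budget = sla_ms - elapsed_network_ms - response_margin_ms
    feasible = [m for m in pool if m.mean_latency_ms <= budget]
    if feasible:
        return max(feasible, key=lambda m: m.accuracy)
    # No model fits the budget: fall back to the fastest model
    # to keep the SLA violation as small as possible.
    return min(pool, key=lambda m: m.mean_latency_ms)


# Example usage with hypothetical model profiles.
pool = [
    ModelProfile("mobilenet_v1", accuracy=0.70, mean_latency_ms=15.0),
    ModelProfile("inception_v3", accuracy=0.78, mean_latency_ms=60.0),
    ModelProfile("nasnet_large", accuracy=0.82, mean_latency_ms=180.0),
]
choice = pick_model(pool, sla_ms=300.0, elapsed_network_ms=120.0)
print(choice.name)
```

A request arriving over a slow network leaves a smaller budget and is routed to a faster, less accurate model, while a request with ample remaining time gets the most accurate model in the pool, which is how per-request adaptation masks variable mobile network conditions.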

