CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge

07/03/2020
by Yihao Fang, et al.

The success of deep neural networks (DNNs) in machine perception applications such as image classification and speech recognition comes at the cost of high computation and storage complexity. Inference with uncompressed large-scale DNN models can only run in the cloud, incurring extra communication latency back and forth between the cloud and end devices, while compressed DNN models achieve real-time inference on end devices at the price of lower predictive accuracy. To get the best of both worlds (latency and accuracy), we propose CacheNet, a model caching framework. CacheNet caches low-complexity models on end devices and high-complexity (or full) models on edge or cloud servers. By exploiting temporal locality in streaming data, a high cache hit rate, and consequently shorter latency, can be achieved with no or only marginal decrease in prediction accuracy. Experiments on CIFAR-10 and FVG have shown CacheNet is 58-217% faster than baseline approaches that run inference tasks on end devices or edge servers alone.
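The abstract does not spell out CacheNet's miss-detection mechanism, but the cache-hit/cache-miss idea can be illustrated with a minimal sketch. The example below assumes a simple confidence-threshold policy; the names `cached_model`, `query_edge_server`, and `CONFIDENCE_THRESHOLD` are hypothetical and not from the paper.

```python
import numpy as np

# Hypothetical cut-off for accepting the on-device prediction as a "cache hit".
CONFIDENCE_THRESHOLD = 0.9

def infer(frame, cached_model, query_edge_server):
    """Classify one frame of a stream.

    cached_model: low-complexity model resident on the end device.
    query_edge_server: callable that runs the full model remotely.
    """
    probs = cached_model(frame)               # cheap on-device inference
    if np.max(probs) >= CONFIDENCE_THRESHOLD:
        return int(np.argmax(probs))          # cache hit: no network round trip
    return query_edge_server(frame)           # cache miss: pay the edge/cloud latency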

Related research

09/18/2022
Improving the Performance of DNN-based Software Services using Automated Layer Caching
Deep Neural Networks (DNNs) have become an essential component in many a...

08/30/2021
Auto-Split: A General Framework of Collaborative Edge-Cloud AI
In many industry scale applications, large and resource consuming machin...

03/27/2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Text-to-Speech (TTS) services that run on edge devices have many advanta...

12/25/2018
JALAD: Joint Accuracy- and Latency-Aware Deep Structure Decoupling for Edge-Cloud Execution
Recent years have witnessed a rapid growth of deep-network based service...

11/24/2018
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Deep neural networks (DNNs) have become core computation components with...

01/18/2021
Accelerating Deep Learning Inference via Learned Caches
Deep Neural Networks (DNNs) are witnessing increased adoption in multipl...

08/24/2022
CheapET-3: Cost-Efficient Use of Remote DNN Models
On complex problems, state of the art prediction accuracy of Deep Neural...
