Adaptive Scheduling for Edge-Assisted DNN Serving

04/19/2023
by   Jian He, et al.
0

Deep neural networks (DNNs) have been widely used in various video analytic tasks. These tasks demand real-time responses. Due to the limited processing power on mobile devices, a common way to support such real-time analytics is to offload the processing to an edge server. This paper examines how to speed up the edge server DNN processing for multiple clients. In particular, we observe batching multiple DNN requests significantly speeds up the processing time. Based on this observation, we first design a novel scheduling algorithm to exploit the batching benefits of all requests that run the same DNN. This is compelling since there are only a handful of DNNs and many requests tend to use the same DNN. Our algorithms are general and can support different objectives, such as minimizing the completion time or maximizing the on-time ratio. We then extend our algorithm to handle requests that use different DNNs with or without shared layers. Finally, we develop a collaborative approach to further improve performance by adaptively processing some of the requests or portions of the requests locally at the clients. This is especially useful when the network and/or server is congested. Our implementation shows the effectiveness of our approach under different request distributions (e.g., Poisson, Pareto, and Constant inter-arrivals).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/21/2023

Adaptive DNN Surgery for Selfish Inference Acceleration with On-demand Edge Resource

Deep Neural Networks (DNNs) have significantly improved the accuracy of ...
research
12/23/2015

Mixed-Criticality Scheduling with I/O

This paper addresses the problem of scheduling tasks with different crit...
research
05/05/2021

DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

The ubiquity of smartphone cameras and IoT cameras, together with the re...
research
02/08/2018

Adaptive online scheduling of tasks with anytime property on heterogeneous resources

An acceptable response time of a server is an important aspect in many c...
research
08/31/2022

Orloj: Predictably Serving Unpredictable DNNs

Existing DNN serving solutions can provide tight latency SLOs while main...
research
09/27/2022

Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs

With deep neural networks (DNNs) emerging as the backbone in a multitude...
research
02/07/2020

Accelerating Deep Learning Inference via Freezing

Over the last few years, Deep Neural Networks (DNNs) have become ubiquit...

Please sign up or login with your details

Forgot password? Click here to reset