BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

05/01/2023
by Ziyang Zhang, et al.

As deep neural networks (DNNs) are applied to a wide range of intelligent edge applications, edge inference platforms must deliver both high throughput and low latency. Serving multiple DNN models on such platforms poses new challenges for scheduler design. First, each request may carry a different service level objective (SLO), which must be met to preserve quality of service (QoS). Second, the platform should efficiently schedule multiple heterogeneous DNN models to improve system utilization. To meet these two goals, this paper proposes BCEdge, a novel learning-based scheduling framework that enables adaptive batching and concurrent execution of DNN inference services on edge platforms. We define a utility function to evaluate the trade-off between throughput and latency. The scheduler in BCEdge leverages maximum entropy-based deep reinforcement learning (DRL) to maximize utility by automatically co-optimizing 1) the batch size and 2) the number of concurrently executing models. Our prototype, implemented on several edge platforms, shows that BCEdge improves utility by up to 37.6% on average compared to state-of-the-art solutions while satisfying SLOs.
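The abstract describes a utility function that trades throughput against latency under an SLO. The paper's exact formulation is not given here, so the following is a minimal hypothetical sketch assuming a simple form: reward achieved throughput, and penalize latency only once it exceeds the request's SLO. The function name, parameters, and penalty shape are all illustrative assumptions, not BCEdge's actual definition.

```python
def utility(batch_size: int, concurrency: int,
            latency_ms: float, slo_ms: float,
            alpha: float = 1.0, beta: float = 0.01) -> float:
    """Hypothetical SLO-aware utility for batched, concurrent inference.

    Throughput is approximated as requests served per millisecond
    (batch_size * concurrency / latency_ms); latency beyond the SLO
    incurs a quadratic penalty, so the scheduler is pushed toward the
    largest batch/concurrency setting that still meets the SLO.
    """
    throughput = batch_size * concurrency / latency_ms
    slo_penalty = beta * max(0.0, latency_ms - slo_ms) ** 2
    return alpha * throughput - slo_penalty
```

A DRL agent such as the one sketched in the abstract would treat the pair (batch size, concurrency) as its action and this utility as its reward signal: larger batches raise throughput but also raise latency, so once latency crosses the SLO the penalty dominates and the agent is steered back toward SLO-compliant configurations.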


