MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge

06/22/2023
by Sokratis Nikolaidis, et al.

Cascade systems consist of a two-model sequence: a lightweight model processes every sample, and a heavier, more accurate model conditionally refines the harder samples to improve accuracy. Placing the light model on the device and the heavy model on a server makes model cascades a widely used approach to distributed inference. With the rapid spread of intelligent indoor environments such as smart homes, a new Multi-Device Cascade setting is emerging. In this setting, multiple diverse devices simultaneously share a heavy model hosted on the same server, which is typically located within or close to the consumer environment. This work presents MultiTASC, a multi-tenancy-aware scheduler that adaptively controls the devices' forwarding decision functions to maximize system throughput while sustaining high accuracy and low latency. By explicitly accounting for device heterogeneity, our scheduler improves the latency service-level objective (SLO) satisfaction rate by 20-25 percentage points (pp) over state-of-the-art cascade methods in highly heterogeneous setups. It also serves over 40 devices, showcasing its scalability.
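The abstract does not specify the form of the forwarding decision function or the scheduler's update rule. As an illustration only, the sketch below assumes a common cascade convention: a device forwards a sample to the server when the light model's top-1 softmax confidence falls below a per-device threshold. It pairs this with a hypothetical feedback rule that lowers each device's threshold when the shared server misses its latency SLO and raises it when there is headroom. The names `Device`, `should_forward`, `reschedule`, and the per-device `step` are all hypothetical. This simple rule is not MultiTASC's heterogeneity-aware policy.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def top1_confidence(logits: Sequence[float]) -> float:
    """Max softmax probability of the light model's output (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    # The largest logit maps to exp(0) = 1, so the top-1 probability is 1 / sum.
    return 1.0 / sum(exps)


@dataclass
class Device:
    name: str
    threshold: float  # forward a sample to the heavy model if confidence < threshold
    step: float = 0.05  # hypothetical per-device adjustment rate


def should_forward(logits: Sequence[float], device: Device) -> bool:
    """Assumed forwarding decision: low-confidence ("hard") samples go to the server."""
    return top1_confidence(logits) < device.threshold


def reschedule(devices: List[Device], observed_latency_ms: float, slo_ms: float) -> None:
    """Hypothetical feedback rule, not the paper's scheduler.

    If the shared server misses the latency SLO, lower every threshold so fewer
    samples are forwarded. Otherwise raise them to forward more and regain accuracy.
    Thresholds are clamped to [0, 1].
    """
    direction = -1.0 if observed_latency_ms > slo_ms else 1.0
    for d in devices:
        d.threshold = min(1.0, max(0.0, d.threshold + direction * d.step))


if __name__ == "__main__":
    devices = [Device("phone", 0.8), Device("smart-speaker", 0.6, step=0.1)]
    print(should_forward([2.0, 1.9, 0.1], devices[0]))  # ambiguous sample
    reschedule(devices, observed_latency_ms=120.0, slo_ms=100.0)
    print([(d.name, round(d.threshold, 2)) for d in devices])
```

For example, if the server's observed latency is 120 ms against a 100 ms SLO, `reschedule` lowers the phone's threshold from 0.8 to 0.75, so fewer borderline samples are sent upstream. Lowering thresholds trades accuracy for throughput. The per-device `step` is one crude knob through which a scheduler could treat heterogeneous devices differently.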
