Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU

11/28/2021
by Fuxun Yu, et al.

With the rapid development of deep neural networks (DNNs), many real-world applications now run multiple models together to carry out compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles. Such multi-tenant DNN inference greatly increases computational complexity and calls for coordinated support across graph-level operator scheduling, runtime-level resource awareness, and the hardware scheduler. However, current scheduling support for multi-tenant inference remains underdeveloped. In this work, we propose a resource-aware scheduling framework for efficient multi-tenant DNN inference on GPU, which automatically coordinates DNN computing across different execution levels. Leveraging a unified scheduling intermediate representation and an automated ML-based search algorithm, the framework generates optimized schedules that adjust model concurrency and interleave DNN model operators, maintaining balanced resource utilization throughout the inference process and thereby improving runtime efficiency. Experiments show that we consistently achieve a 1.3-1.7x speed-up over regular DNN runtime libraries (e.g., cuDNN, TVM) and dedicated concurrent scheduling methods (e.g., NVIDIA Multi-Stream).
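
The idea of interleaving operators from several models so that combined resource use stays balanced can be illustrated with a small sketch. This is not the framework described above: it swaps the ML-based search for a plain greedy packing heuristic, and the `Op` type, the `util` estimates, the `capacity` parameter, and the function name `greedy_interleave` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    model: str   # name of the model this operator belongs to
    name: str    # operator label
    util: float  # ASSUMED fraction of GPU resources the operator occupies (0..1)


def greedy_interleave(models, capacity=1.0):
    """Group operators from several models into concurrent stages.

    Each stage takes the next pending operator of as many models as fit
    under `capacity`, largest first, while preserving per-model order.
    A greedy stand-in for illustration, not the paper's search algorithm.
    """
    cursors = {m: 0 for m in models}
    stages = []
    while any(cursors[m] < len(ops) for m, ops in models.items()):
        # Next pending operator of every unfinished model, largest first.
        pending = [ops[cursors[m]] for m, ops in models.items()
                   if cursors[m] < len(ops)]
        pending.sort(key=lambda op: op.util, reverse=True)
        stage, used = [], 0.0
        for op in pending:
            # Always admit the first operator so the loop makes progress.
            if not stage or used + op.util <= capacity:
                stage.append(op)
                used += op.util
                cursors[op.model] += 1
        stages.append(stage)
    return stages


# Hypothetical workload: three co-running models with made-up utilizations.
models = {
    "cls": [Op("cls", "conv1", 0.6), Op("cls", "fc", 0.2)],
    "det": [Op("det", "conv1", 0.7), Op("det", "head", 0.3)],
    "seg": [Op("seg", "conv1", 0.3), Op("seg", "up", 0.5)],
}
stages = greedy_interleave(models)
for i, stage in enumerate(stages):
    print(i, [f"{op.model}.{op.name}" for op in stage])
```

With these made-up numbers the six operators are packed into three concurrent stages instead of six sequential launches. A real scheduler would additionally need measured per-operator resource profiles and a mechanism for launching each stage concurrently, neither of which this sketch models.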

research
07/10/2023

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Many applications such as autonomous driving and augmented reality, requ...
research
04/10/2023

RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs

Deep neural networks (DNNs) have substantial computational and memory re...
research
04/23/2023

GACER: Granularity-Aware ConcurrEncy Regulation for Multi-Tenant Deep Learning

As deep learning continues to advance and is applied to increasingly com...
research
02/03/2023

DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System

As deep neural networks (DNNs) prove their importance and feasibility, m...
research
08/26/2023

Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization

Memory-aware network scheduling is becoming increasingly important for d...
research
06/07/2022

Exploration of Systolic-Vector Architecture with Resource Scheduling for Dynamic ML Workloads

As artificial intelligence (AI) and machine learning (ML) technologies d...
research
09/28/2020

Accelerating Multi-Model Inference by Merging DNNs of Different Weights

Standardized DNN models that have been proved to perform well on machine...
