A Survey of Multi-Tenant Deep Learning Inference on GPU

03/17/2022
by Fuxun Yu, et al.

Deep Learning (DL) models have achieved superior performance. Meanwhile, computing hardware such as NVIDIA GPUs has demonstrated strong scaling trends, with throughput and memory bandwidth doubling with each generation. Given this strong GPU scaling, multi-tenant DL inference, which co-locates multiple DL models on the same GPU, has become widely deployed to improve resource utilization, increase serving throughput, and reduce energy cost. However, achieving efficient multi-tenant DL inference is challenging and requires thorough full-stack system optimization. This survey aims to summarize and categorize the emerging challenges and optimization opportunities for multi-tenant DL inference on GPU. By surveying the entire optimization stack, summarizing multi-tenant computing innovations, and elaborating on recent technological advances, we hope this survey will shed light on new optimization perspectives and motivate novel work in future large-scale DL system optimization.
