Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

09/18/2021
by Cheng Tan, et al.

Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However, discovering the most efficient GPU partitions is challenging. The underlying problem is NP-hard; moreover, it is a new abstract problem, which we define as the Reconfigurable Machine Scheduling Problem (RMS). This paper studies serving DNNs with MIG, a new case of RMS. We further propose a solution, MIG-serving. MIG-serving is an algorithm pipeline that blends a variety of newly designed algorithms and customized classic algorithms, including a heuristic greedy algorithm, Genetic Algorithm (GA), and Monte Carlo Tree Search algorithm (MCTS). We implement MIG-serving on Kubernetes. Our experiments show that compared to using A100 as-is, MIG-serving can save up to 40

research
02/27/2022

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency ...
research
01/18/2022

An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Ensembles of Deep Neural Networks (DNNs) has achieved qualitative predic...
research
10/09/2020

TurboTransformers: An Efficient GPU Serving System For Transformer Models

The transformer is the most critical algorithm innovation of the Nature ...
research
09/28/2020

Accelerating Multi-Model Inference by Merging DNNs of Different Weights

Standardized DNN models that have been proved to perform well on machine...
research
07/21/2023

Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

Deploying deep learning models in cloud clusters provides efficient and ...
research
07/24/2018

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

There is a trend towards using very large deep neural networks (DNN) to ...
