Accelerating Multi-Model Inference by Merging DNNs of Different Weights

09/28/2020
by   Joo Seong Jeong, et al.
0

Standardized DNN models that have been proved to perform well on machine learning tasks are widely used and often adopted as-is to solve downstream tasks, forming the transfer learning paradigm. However, when serving multiple instances of such DNN models from a cluster of GPU servers, existing techniques to improve GPU utilization such as batching are inapplicable because models often do not share weights due to fine-tuning. We propose NetFuse, a technique of merging multiple DNN models that share the same architecture but have different weights and different inputs. NetFuse is made possible by replacing operations with more general counterparts that allow a set of weights to be associated with only a certain set of inputs. Experiments on ResNet-50, ResNeXt-50, BERT, and XLNet show that NetFuse can speed up DNN inference time up to 3.6x on a NVIDIA V100 GPU, and up to 3.0x on a TITAN Xp GPU when merging 32 model instances, while only using up a small additional amount of GPU memory.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

08/08/2020

Spatial Sharing of GPU for Autotuning DNN models

GPUs are used for training, inference, and tuning the machine learning m...
09/01/2021

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning

As machine learning techniques are applied to a widening range of applic...
09/18/2021

Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem

Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs...
01/19/2022

GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge

Video analytics pipelines have steadily shifted to edge deployments to r...
11/28/2021

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU

With the fast development of deep neural networks (DNNs), many real-worl...
07/24/2018

Supporting Very Large Models using Automatic Dataflow Graph Partitioning

There is a trend towards using very large deep neural networks (DNN) to ...
06/07/2021

GAN Cocktail: mixing GANs without dataset access

Today's generative models are capable of synthesizing high-fidelity imag...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.