An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

01/18/2022
by pierrick-pochelu, et al.

Ensembles of Deep Neural Networks (DNNs) have achieved high-quality predictions, but they are compute- and memory-intensive. Therefore, there is growing demand to make them answer a heavy workload of requests with the available computational resources. Unlike recent initiatives on inference servers and inference frameworks, which focus on predictions from single DNNs, we propose a new software layer to serve ensembles of DNNs flexibly and efficiently. Our inference system is designed with several technical innovations. First, we propose a novel procedure to find a good allocation matrix between devices (CPUs or GPUs) and DNN instances. It successively runs a worst-fit algorithm to allocate DNNs into the devices' memory and a greedy algorithm to optimize the allocation settings and speed up the ensemble. Second, we design the inference system around multiple processes running asynchronously: batching, prediction, and the combination rule, with an efficient internal communication scheme to avoid overhead. Experiments show the flexibility and efficiency of the system under extreme scenarios: it successfully serves an ensemble of 12 heavy DNNs on 4 GPUs and, conversely, a single multi-threaded DNN on 16 GPUs. It also outperforms the simple baseline of optimizing the batch size of each DNN, with a speedup of up to 2.7X on an image classification task.
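The abstract only summarizes the two-stage allocation procedure, so the following is a minimal Python sketch of how such a procedure could look, assuming a simple per-DNN memory footprint and a user-supplied `throughput(assignment)` benchmark. All names here (`Device`, `worst_fit_allocate`, `greedy_tune`) are illustrative assumptions, not the authors' API.

```python
# Hypothetical sketch: worst-fit bin packing of DNN instances onto device
# memory, then a greedy pass tuning one setting (batch size) per instance.
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    free_mem: float                      # remaining memory in GB (assumed model)
    dnns: list = field(default_factory=list)

def worst_fit_allocate(dnn_mem, devices):
    """Place each DNN (largest first) on the device with the most free memory."""
    for name, mem in sorted(dnn_mem.items(), key=lambda kv: -kv[1]):
        target = max(devices, key=lambda d: d.free_mem)
        if target.free_mem < mem:
            raise MemoryError(f"{name} ({mem} GB) fits on no device")
        target.free_mem -= mem
        target.dnns.append(name)
    return devices

def greedy_tune(devices, throughput, batch_sizes=(1, 2, 4, 8, 16, 32)):
    """Greedily pick, per DNN, the batch size maximizing measured ensemble
    throughput; throughput(assignment) is a user-supplied benchmark callback."""
    assignment = {name: batch_sizes[0] for d in devices for name in d.dnns}
    for name in list(assignment):
        assignment[name] = max(
            batch_sizes, key=lambda b: throughput({**assignment, name: b}))
    return assignment

# Example: three DNNs packed onto two 16 GB GPUs.
devices = worst_fit_allocate(
    {"resnet50": 4.0, "densenet": 5.5, "vgg16": 6.0},
    [Device("gpu0", 16.0), Device("gpu1", 16.0)])
```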
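Likewise, here is a minimal sketch of the asynchronous process layout the abstract describes: a batching stage, one predictor per DNN instance, and a combiner applying the ensemble rule (averaging here). Multiprocessing queues stand in for the paper's internal communication scheme, and the random-score predictor is a placeholder for a real DNN pinned to a device; none of this is the authors' implementation.

```python
import multiprocessing as mp
import numpy as np

def batcher(in_q, pred_qs, batch_size=8):
    """Group incoming requests and broadcast each batch to every predictor."""
    batch = []
    while True:
        item = in_q.get()
        if item is None:                          # shutdown signal
            if batch:
                for q in pred_qs: q.put(np.stack(batch))
            for q in pred_qs: q.put(None)
            return
        batch.append(item)
        if len(batch) == batch_size:
            for q in pred_qs: q.put(np.stack(batch))
            batch = []

def predictor(pred_q, out_q, seed):
    """Stand-in for one DNN instance; emits fake class scores per batch."""
    rng = np.random.default_rng(seed)
    while True:
        x = pred_q.get()
        if x is None:
            out_q.put(None)
            return
        out_q.put(rng.random((len(x), 10)))

def combiner(out_qs):
    """Combination rule: average the members' predictions for each batch."""
    while True:
        preds = [q.get() for q in out_qs]
        if any(p is None for p in preds):
            return
        print(np.mean(preds, axis=0).argmax(axis=1))

if __name__ == "__main__":
    in_q = mp.Queue()
    pred_qs = [mp.Queue() for _ in range(3)]      # 3-member ensemble
    out_qs = [mp.Queue() for _ in range(3)]
    procs = [mp.Process(target=batcher, args=(in_q, pred_qs))]
    procs += [mp.Process(target=predictor, args=(pq, oq, i))
              for i, (pq, oq) in enumerate(zip(pred_qs, out_qs))]
    procs += [mp.Process(target=combiner, args=(out_qs,))]
    for p in procs: p.start()
    for _ in range(16):
        in_q.put(np.zeros(4))                     # dummy 4-feature requests
    in_q.put(None)
    for p in procs: p.join()
```

Because the three stages run as separate processes, batching the next request can overlap with prediction on the current batch, which is the overhead-hiding effect the abstract attributes to its asynchronous design.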
