Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications

02/12/2019
by Peifeng Yu, et al.

GPU computing is becoming increasingly popular with the proliferation of deep learning (DL) applications. However, unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. Consequently, implementing common policies such as time sharing and preemption is expensive. Worse, when a DL application cannot fully use a GPU's resources, the GPU cannot be efficiently shared among multiple applications, leading to GPU underutilization. We present Salus, which enables two GPU sharing primitives, fast job switching and memory sharing, to achieve fine-grained GPU sharing among multiple DL applications. Salus implements an efficient, consolidated execution service that exposes the GPU to different DL applications and enforces fine-grained sharing by performing iteration scheduling and addressing the associated memory management issues. We show that these primitives can then be used to implement flexible sharing policies such as fairness, prioritization, and packing for various use cases. Our integration of Salus with TensorFlow and evaluation on popular DL jobs show that Salus can improve the average completion time of DL training jobs by 3.19×, GPU utilization for hyper-parameter tuning by 2.38×, and GPU utilization of DL inference applications by 42× over not sharing the GPU and 7× over NVIDIA MPS, with small overhead.
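To make the idea of iteration scheduling under a fairness policy more concrete, the following is a minimal toy sketch, not Salus's actual implementation. It assumes that jobs are switched only at training-iteration boundaries and that "fairness" means always running the job that has consumed the least GPU time so far. The names Job, iter_time, iters_left, and fair_iteration_schedule are hypothetical and are introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical DL training job; all fields are illustrative assumptions."""
    name: str
    iter_time: float       # assumed seconds of GPU time per iteration
    iters_left: int        # iterations still to run
    gpu_time: float = 0.0  # GPU time consumed so far


def fair_iteration_schedule(jobs):
    """Simulate time sharing at iteration granularity.

    At each step, run exactly one iteration of the active job that has
    used the least GPU time so far (ties go to the earlier job in the list).
    Returns the execution order and each job's completion time.
    """
    order = []
    finish = {}
    clock = 0.0
    active = [j for j in jobs if j.iters_left > 0]
    while active:
        job = min(active, key=lambda j: j.gpu_time)
        # One iteration is the smallest unit of work; no mid-iteration preemption.
        job.gpu_time += job.iter_time
        job.iters_left -= 1
        clock += job.iter_time
        order.append(job.name)
        if job.iters_left == 0:
            finish[job.name] = clock
            active.remove(job)
    return order, finish


if __name__ == "__main__":
    jobs = [Job("A", iter_time=1.0, iters_left=2),
            Job("B", iter_time=2.0, iters_left=1)]
    order, finish = fair_iteration_schedule(jobs)
    print(order, finish)
```

A prioritization policy could be sketched the same way by changing only the key used to pick the next job. The sketch deliberately ignores memory sharing and switching overhead, both of which the abstract identifies as central concerns.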

research
04/04/2023

DLRover: An Elastic Deep Training Extension with Auto Job Resource Recommendation

The cloud is still a popular platform for distributed deep learning (DL)...
research
09/01/2023

FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference

Serverless computing (FaaS) has been extensively utilized for deep learn...
research
06/20/2023

Fine-grained Policy-driven I/O Sharing for Burst Buffers

A burst buffer is a common method to bridge the performance gap between ...
research
05/12/2013

Practical Fine-grained Privilege Separation in Multithreaded Applications

An inherent security limitation with the classic multithreaded programmi...
research
12/17/2021

Exploring the Impact of Virtualization on the Usability of the Deep Learning Applications

Deep Learning-based (DL) applications are becoming increasingly popular ...
research
11/24/2019

FusionStitching: Boosting Execution Efficiency of Memory Intensive Computations for DL Workloads

Performance optimization is the art of continuous seeking a harmonious m...
research
05/22/2023

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels

Machine Learning (ML) models contain highly-parallel computations, such ...
