VELTAIR: Towards High-Performance Multi-tenant Deep Learning Services via Adaptive Compilation and Scheduling

01/17/2022
by   Zihan Liu, et al.

Deep learning (DL) models have achieved great success in many application domains. As such, many industrial companies such as Google and Facebook have acknowledged the importance of multi-tenant DL services. Although multi-tenant services have been studied for conventional workloads, they have not been deeply studied for deep learning services, especially on general-purpose hardware. In this work, we systematically analyze the opportunities and challenges of providing multi-tenant deep learning services on the general-purpose CPU architecture from the aspects of scheduling granularity and code generation. We propose an adaptive granularity scheduling scheme to both guarantee resource usage efficiency and reduce the scheduling conflict rate. We also propose an adaptive compilation strategy, by which we can dynamically and intelligently pick a program with proper exclusive and shared resource usage to reduce the overall interference-induced performance loss. Compared to existing work, our design can serve more requests under the same QoS target in various scenarios (e.g., +71 respectively), and reduce the averaged query latency by 50

research
09/03/2021

Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters

Modern GPU datacenters are critical for delivering Deep Learning (DL) mo...
research
05/24/2022

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision

Deep learning (DL) shows its prosperity in a wide variety of fields. The...
research
11/24/2018

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

The application of deep learning techniques resulted in remarkable impro...
research
05/17/2018

Dependability in a Multi-tenant Multi-framework Deep Learning as-a-Service Platform

Deep learning (DL), a form of machine learning, is becoming increasingly...
research
04/30/2021

QoS-Aware Placement of Deep Learning Services on the Edge with Multiple Service Implementations

Mobile edge computing pushes computationally-intensive services closer t...
research
11/26/2019

Intelligent Resource Scheduling for Co-located Latency-critical Services: A Multi-Model Collaborative Learning Approach

Latency-critical services have been widely deployed in cloud environment...
