Building Heterogeneous Cloud System for Machine Learning Inference

10/12/2022
by Baolin Li, et al.

Online inference is becoming a key service product for many businesses, deployed on cloud platforms to meet customer demands. Despite their revenue-generation capability, these services must operate under tight Quality-of-Service (QoS) and cost budget constraints. This paper introduces KAIROS, a novel runtime framework that maximizes query throughput while meeting a QoS target and a cost budget. KAIROS designs and implements novel techniques to build a pool of heterogeneous compute hardware without online exploration overhead and to distribute inference queries optimally at runtime. Our evaluation using industry-grade deep learning (DL) models shows that KAIROS yields up to 2X the throughput of an optimal homogeneous solution and outperforms state-of-the-art schemes by up to 70%, despite advantageous implementations of the competing schemes to ignore their exploration overhead.

research
07/23/2022

RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances

Deep learning model inference is a key service in many businesses and sc...
research
04/09/2021

Harnessing the Potential of Function-Reuse in Multimedia Cloud Systems

Cloud-based computing systems can get oversubscribed due to the budget c...
research
05/02/2019

Leveraging Deep Learning to Improve the Performance Predictability of Cloud Microservices

Performance unpredictability is a major roadblock towards cloud adoption...
research
04/24/2018

Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging

Performance unpredictability in cloud services leads to poor user experi...
research
05/10/2022

Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

With the advent of ubiquitous deployment of smart devices and the Intern...
research
01/01/2021

Sage: Using Unsupervised Learning for Scalable Performance Debugging in Microservices

Cloud applications are increasingly shifting from large monolithic servi...
research
08/06/2021

FloMore: Meeting bandwidth requirements of flows

Wide-area cloud provider networks must support the bandwidth requirement...
