Reconciling High Accuracy, Cost-Efficiency, and Low Latency of Inference Serving Systems

04/21/2023
by Mehran Salmani, et al.
The use of machine learning (ML) inference for various applications is growing drastically. ML inference services engage with users directly, requiring fast and accurate responses. Moreover, these services face dynamic request workloads, requiring their computing resources to be adjusted accordingly. Failing to right-size computing resources results in either violations of latency service level objectives (SLOs) or wasted computing resources. Adapting to dynamic workloads while balancing all three pillars of accuracy, latency, and resource cost is challenging. In response to these challenges, we propose InfAdapter, which proactively selects a set of ML model variants together with their resource allocations to meet the latency SLO while maximizing an objective function composed of accuracy and cost. InfAdapter decreases SLO violations and cost by up to 65% compared to a popular industry autoscaler (the Kubernetes Vertical Pod Autoscaler).
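The selection problem the abstract describes can be illustrated with a minimal sketch: given a set of model variants, each profiled with an accuracy, a resource cost, a latency, and a sustainable throughput, pick the subset that meets the latency SLO and the predicted demand while maximizing a weighted accuracy-minus-cost objective. All names and numbers below are hypothetical illustrations, not InfAdapter's actual profiles or algorithm (the paper's solver may differ; this brute-force search is only for exposition):

```python
from dataclasses import dataclass
from itertools import chain, combinations

@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float      # top-1 accuracy of this model variant
    cost: float          # resource cost of its allocation (e.g. CPU cores)
    latency_ms: float    # tail latency under that allocation
    capacity_rps: float  # throughput it sustains within that latency

def select_variants(variants, demand_rps, slo_ms, alpha=1.0, beta=0.1):
    """Brute-force the variant subset that meets the latency SLO and the
    predicted demand while maximizing alpha * accuracy - beta * cost."""
    best, best_score = None, float("-inf")
    subsets = chain.from_iterable(
        combinations(variants, r) for r in range(1, len(variants) + 1))
    for subset in subsets:
        if any(v.latency_ms > slo_ms for v in subset):
            continue  # every chosen variant must meet the SLO
        total_cap = sum(v.capacity_rps for v in subset)
        if total_cap < demand_rps:
            continue  # together they must absorb the predicted workload
        # capacity-weighted average accuracy across the variant set
        avg_acc = sum(v.accuracy * v.capacity_rps for v in subset) / total_cap
        score = alpha * avg_acc - beta * sum(v.cost for v in subset)
        if score > best_score:
            best, best_score = subset, score
    return best

# Hypothetical variant profiles for illustration only.
variants = [
    Variant("resnet18", 0.70, 2.0, 40.0, 120.0),
    Variant("resnet50", 0.76, 4.0, 75.0, 80.0),
    Variant("resnet152", 0.78, 8.0, 160.0, 50.0),
]
chosen = select_variants(variants, demand_rps=150.0, slo_ms=100.0)
print([v.name for v in chosen])  # → ['resnet18', 'resnet50']
```

With these numbers, the largest variant is excluded by the 100 ms SLO and no single remaining variant covers 150 requests/s, so the sketch serves a mix of two variants, which is exactly the multi-variant trade-off the abstract argues for.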

Related research

- 06/06/2023, FaaSwap: SLO-Aware, GPU-Efficient Serverless Inference via Model Swapping. The dynamic request patterns of machine learning (ML) inference workload...
- 06/09/2021, Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud. With a growing demand for adopting ML models for a variety of application...
- 04/18/2022, Dynamic Network Adaptation at Inference. Machine learning (ML) inference is a real-time workload that must comply...
- 04/02/2019, BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services. Pre-trained deep learning models are increasingly being used to offer a ...
- 08/24/2023, IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency. Efficiently optimizing multi-model inference pipelines for fast, accurat...
- 05/30/2019, INFaaS: Managed & Model-less Inference Serving. The number of applications relying on inference from machine learning mo...
- 10/18/2019, Machine Learning Systems for Highly-Distributed and Rapidly-Growing Data. The usability and practicality of any machine learning (ML) applications...
