BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services

04/02/2019
by Anirban Bhattacharjee, et al.
Pre-trained deep learning models are increasingly used to offer a variety of compute-intensive predictive analytics services, such as fitness tracking, speech recognition, and image recognition. The stateless and highly parallelizable nature of deep learning models makes them well suited to the serverless computing paradigm. However, making effective resource management decisions for these services is hard because workloads are dynamic and the available resource configurations are diverse, each with its own deployment and management costs. To address these challenges, we present Barista, a distributed and scalable deep-learning prediction serving system, and make the following contributions. First, we present a fast and effective methodology for forecasting workloads by identifying various trends. Second, we formulate an optimization problem to minimize the total cost incurred while ensuring bounded prediction latency with reasonable accuracy. Third, we propose an efficient heuristic to identify suitable compute resource configurations. Fourth, we propose an intelligent agent that allocates and manages compute resources through horizontal and vertical scaling to maintain the required prediction latency. Finally, using representative real-world workloads for an urban transportation service, we demonstrate and validate the capabilities of Barista.
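To make the cost-versus-latency trade-off in the second and third contributions concrete, here is a minimal Python sketch. It is not Barista's formulation or heuristic, which the abstract does not spell out. The `Config` fields, the example prices, latencies, and throughputs, and the `cheapest_plan` function are all hypothetical and chosen only for illustration. The sketch assumes that each configuration has a fixed per-request latency and a fixed per-instance throughput, and that load is spread evenly across identical instances.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical compute configuration (all values are illustrative)."""
    name: str
    hourly_cost: float      # cost of one instance per hour
    latency_ms: float       # assumed per-request prediction latency
    throughput_rps: float   # assumed requests per second one instance sustains


def cheapest_plan(
    configs: Iterable[Config],
    forecast_rps: float,
    latency_bound_ms: float,
) -> Optional[Tuple[Config, int, float]]:
    """Return (config, instance_count, total_hourly_cost) with the lowest
    cost among configurations that meet the latency bound, or None if no
    configuration is feasible."""
    best = None
    for cfg in configs:
        # Skip configurations that cannot meet the latency bound at all.
        if cfg.latency_ms > latency_bound_ms:
            continue
        # Horizontal scaling: enough instances to absorb the forecast load.
        count = max(1, math.ceil(forecast_rps / cfg.throughput_rps))
        total = count * cfg.hourly_cost
        if best is None or total < best[2]:
            best = (cfg, count, total)
    return best


# Illustrative, made-up configurations.
CONFIGS = [
    Config("small", hourly_cost=0.10, latency_ms=180.0, throughput_rps=5.0),
    Config("medium", hourly_cost=0.20, latency_ms=90.0, throughput_rps=12.0),
    Config("large", hourly_cost=0.40, latency_ms=40.0, throughput_rps=30.0),
]

plan = cheapest_plan(CONFIGS, forecast_rps=50.0, latency_bound_ms=100.0)
```

With these made-up numbers, the small configuration is rejected because it exceeds the 100 ms bound, and two large instances turn out cheaper than five medium ones. Choosing between instance sizes corresponds loosely to vertical scaling, and choosing the instance count to horizontal scaling.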
