PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

10/14/2018
by Yunseong Lee, et al.

Machine learning models are often composed of pipelines of transformations. While this design allows individual model components to be executed efficiently at training time, prediction serving has different requirements, such as low latency, high throughput, and graceful performance degradation under heavy load. Current prediction serving systems treat models as black boxes, so prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system that introduces a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL improves performance along several dimensions: compared to state-of-the-art approaches, PRETZEL on average reduces 99th percentile latency by 5.5x, reduces memory footprint by 25x, and increases throughput by 4.7x.
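To make the black box versus white box distinction concrete, here is a minimal Python sketch. It is illustrative only and is not PRETZEL's implementation or API: the names `SharedStageRegistry`, `build_pipeline`, and the toy stages are all hypothetical. It assumes that one possible multi-model optimization is to let pipelines with identical stages share a single instance of each, which only works when the serving system can see inside the pipeline.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical spec: a stage name plus a factory that builds the stage.
StageSpec = Tuple[str, Callable[[], Callable]]


class SharedStageRegistry:
    """Holds one instance per stage name so several pipelines can reuse it."""

    def __init__(self) -> None:
        self._stages: Dict[str, Callable] = {}

    def get(self, name: str, factory: Callable[[], Callable]) -> Callable:
        # Build the stage only the first time its name is requested.
        if name not in self._stages:
            self._stages[name] = factory()
        return self._stages[name]

    def __len__(self) -> int:
        return len(self._stages)


def make_tokenizer() -> Callable[[str], List[str]]:
    return lambda text: text.lower().split()


def make_counter(vocab: List[str]) -> Callable[[List[str]], List[int]]:
    return lambda tokens: [tokens.count(word) for word in vocab]


def make_linear(weights: List[float]) -> Callable[[List[int]], float]:
    return lambda feats: sum(w * x for w, x in zip(weights, feats))


def build_pipeline(registry: SharedStageRegistry,
                   specs: List[StageSpec]) -> Callable:
    """Resolve each stage through the registry, then chain them in order."""
    stages = [registry.get(name, factory) for name, factory in specs]

    def predict(x):
        for stage in stages:
            x = stage(x)
        return x

    return predict


# Two toy models share featurization and differ only in the final stage.
registry = SharedStageRegistry()
vocab = ["good", "bad"]
common: List[StageSpec] = [
    ("tokenize", make_tokenizer),
    ("count:good,bad", lambda: make_counter(vocab)),
]
model_a = build_pipeline(
    registry, common + [("linear:a", lambda: make_linear([1.0, -1.0]))])
model_b = build_pipeline(
    registry, common + [("linear:b", lambda: make_linear([0.5, -2.0]))])

print(model_a("Good good bad"), model_b("Good good bad"), len(registry))
```

In this toy setup the two pipelines hold four stage instances in total rather than six, because the tokenizer and counter are created once and reused. A black box server that receives each model as an opaque artifact has no way to detect that overlap.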


research
07/11/2020

Optimizing Prediction Serving on Low-Latency Serverless Dataflow

Prediction serving systems are designed to provide large volumes of low-...
research
12/22/2021

SOLIS – The MLOps journey from data acquisition to actionable insights

Machine Learning operations is unarguably a very important and also one ...
research
12/05/2018

InferLine: ML Inference Pipeline Composition Framework

The dominant cost in production machine learning workloads is not traini...
research
05/02/2019

Parity Models: A General Framework for Coding-Based Resilience in ML Inference

Machine learning models are becoming the primary workhorses for many app...
research
05/30/2019

INFaaS: Managed & Model-less Inference Serving

The number of applications relying on inference from machine learning mo...
research
11/27/2018

DLHub: Model and Data Serving for Science

While the Machine Learning (ML) landscape is evolving rapidly, there has...
research
06/03/2020

Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Machine learning inference is becoming a core building block for interac...
