Optimizing Prediction Serving on Low-Latency Serverless Dataflow

07/11/2020
by Vikram Sreekanti, et al.

Prediction serving systems are designed to provide large volumes of low-latency inferences from machine learning models. These systems mix data processing with computationally intensive model inference, and they benefit from multiple heterogeneous processors and distributed computing resources. In this paper, we argue that a familiar dataflow API is well-suited to this latency-sensitive task and amenable to optimization, even with unmodified black-box ML models. We present the design of Cloudflow, a system that provides this API and realizes it on an autoscaling serverless backend. Cloudflow transparently implements performance-critical optimizations, including operator fusion and competitive execution. Our evaluation shows that Cloudflow's optimizations yield significant performance improvements on synthetic workloads and that Cloudflow outperforms state-of-the-art prediction serving systems by as much as 2x on real-world prediction pipelines, meeting the latency goals of demanding applications like real-time video analysis.
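The abstract names two optimizations without defining them. As a rough, generic illustration (not Cloudflow's actual API), operator fusion can be read as collapsing a chain of dataflow operators into a single function so intermediate data never leaves the worker, and competitive execution as running several replicas of a high-variance operator on the same input and keeping whichever result arrives first. The sketch uses plain Python threads; `fuse`, `compete`, and the toy `preprocess`/`model`/`postprocess` stages are hypothetical names chosen for illustration only.

```python
import concurrent.futures as cf
import random
import time
from typing import Any, Callable, List

Op = Callable[[Any], Any]

def fuse(ops: List[Op]) -> Op:
    """Operator fusion (illustrative): compose a chain of operators into one
    callable, so intermediate results stay in-process between stages."""
    def fused(x: Any) -> Any:
        for op in ops:
            x = op(x)
        return x
    return fused

def compete(op: Op, x: Any, replicas: int = 3) -> Any:
    """Competitive execution (illustrative): run several replicas of the same
    operator on the same input and return the first one to finish."""
    pool = cf.ThreadPoolExecutor(max_workers=replicas)
    futures = [pool.submit(op, x) for _ in range(replicas)]
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    pool.shutdown(wait=False)  # do not block on the slower replicas
    return next(iter(done)).result()

# Hypothetical toy pipeline stages.
def preprocess(pixels: List[int]) -> List[float]:
    return [v / 255.0 for v in pixels]

def model(features: List[float]) -> float:
    # Stand-in for a black-box model whose latency varies from call to call.
    time.sleep(random.uniform(0.01, 0.05))
    return sum(features)

def postprocess(score: float) -> str:
    return "positive" if score > 1.0 else "negative"

pipeline = fuse([preprocess, model, postprocess])
print(compete(pipeline, [255, 128, 0]))  # → positive
```

The two ideas pull in different directions: fusion gives up some stage-level parallelism in exchange for fewer hops between workers, while competitive execution spends extra compute to cut the tail latency caused by slow replicas.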
