Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

06/03/2020
by Arpan Gujarati, et al.

Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable; on the contrary, we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100 ms latency targets for 99.997% of requests. We further show that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
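The deterministic-latency observation is what makes tight SLOs attainable: if a request's execution time is known before it runs, a serving system can predict its completion time and reject work that would miss its deadline instead of letting it inflate the tail. The Python sketch below illustrates this idea only; `PredictableScheduler`, `try_admit`, and the per-model latency table are hypothetical names for illustration, not Clockwork's actual API.

```python
import time
from collections import deque

class PredictableScheduler:
    """Illustrative sketch (not Clockwork's API): SLO-aware admission
    control that relies on per-model inference latency being
    effectively deterministic, the paper's core observation."""

    def __init__(self, exec_time_s):
        # Measured per-model inference latency, in seconds.
        self.exec_time_s = exec_time_s
        self.queue = deque()    # pending (model, absolute deadline) pairs
        self.busy_until = 0.0   # predicted time at which the worker frees up

    def try_admit(self, model, slo_s):
        """Admit a request only if its predicted completion time meets the
        SLO; otherwise reject it immediately rather than miss the deadline."""
        now = time.monotonic()
        start = max(now, self.busy_until)           # wait behind queued work
        finish = start + self.exec_time_s[model]    # predictable, so computable
        if finish > now + slo_s:
            return False                            # would violate the SLO: fail fast
        self.queue.append((model, now + slo_s))
        self.busy_until = finish
        return True

# Usage: with a 10 ms inference profile (hypothetical figure), a request
# carrying a 100 ms SLO is admitted while the worker is idle.
scheduler = PredictableScheduler({"resnet50": 0.010})
print(scheduler.try_admit("resnet50", slo_s=0.100))  # True
```

Rejecting early rather than queueing blindly is the design choice that turns predictable per-request latency into a predictable tail.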

Related research

08/31/2022 · Orloj: Predictably Serving Unpredictable DNNs
Existing DNN serving solutions can provide tight latency SLOs while main...

01/18/2021 · Accelerating Deep Learning Inference via Learned Caches
Deep Neural Networks (DNNs) are witnessing increased adoption in multipl...

07/11/2020 · Optimizing Prediction Serving on Low-Latency Serverless Dataflow
Prediction serving systems are designed to provide large volumes of low-...

04/18/2022 · Dynamic Network Adaptation at Inference
Machine learning (ML) inference is a real-time workload that must comply...

05/02/2019 · Parity Models: A General Framework for Coding-Based Resilience in ML Inference
Machine learning models are becoming the primary workhorses for many app...

10/14/2018 · PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
Machine Learning models are often composed of pipelines of transformatio...

08/14/2023 · Symphony: Optimized Model Serving using Centralized Orchestration
The orchestration of deep neural network (DNN) model inference on GPU cl...
