Efficient Multi-stage Inference on Tabular Data

03/21/2023
by Daniel S Johnson, et al.

Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, the separation adds network latency and entails additional CPU overhead. Hence, we simplify inference algorithms and embed them into the product code to reduce network communication. For public datasets and a high-performance real-time platform that deals with tabular data, we show that over half of the inputs are often amenable to such optimization, while the remainder can be handled by the original model. By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between the application front-end and the ML back-end by about 50% on a platform that serves millions of real-time decisions per second.
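
To make the two-stage idea concrete, the following is a minimal sketch in Python of how product code might combine an embedded simplified model with an RPC fallback to the original model. The names (TwoStagePredictor, embedded_model, rpc_full_model, confidence_threshold) are hypothetical and not the paper's actual API; routing on a confidence threshold is one plausible way to decide which inputs are amenable to local handling.

```python
# Hypothetical sketch of multi-stage inference: a cheap model embedded in the
# product process answers easy inputs locally; uncertain inputs fall back to
# the full model served over RPC. Names and the routing rule are illustrative.

from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class TwoStagePredictor:
    # Simplified model compiled into the product binary (e.g., a shallow tree).
    # Returns (predicted_label, confidence in [0, 1]).
    embedded_model: Callable[[Sequence[float]], Tuple[int, float]]
    # Remote call to the original model hosted by the ML back-end.
    rpc_full_model: Callable[[Sequence[float]], int]
    # Inputs whose embedded-model confidence clears this bar skip the RPC.
    confidence_threshold: float = 0.9

    def predict(self, features: Sequence[float]) -> int:
        label, confidence = self.embedded_model(features)
        if confidence >= self.confidence_threshold:
            return label                      # handled in-process: no network hop
        return self.rpc_full_model(features)  # remainder goes to the original model


# Usage with toy stand-in callables:
predictor = TwoStagePredictor(
    embedded_model=lambda x: (int(sum(x) > 0), 0.95),  # stand-in simplified model
    rpc_full_model=lambda x: 1,                        # stand-in RPC stub
)
print(predictor.predict([0.2, -0.1, 0.7]))  # resolved locally in this toy case
```

In this sketch, the fraction of traffic that clears the threshold corresponds to the "over half of the inputs" handled without a network round trip, which is where the latency and CPU savings come from.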
