Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

03/14/2022
by Liu Ke, et al.

Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity savings. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure: offline profiling and online serving. The first stage searches the large, under-explored task scheduling space with a gradient-based search algorithm, achieving up to 9.0x latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% [...] state-of-the-art greedy scheduler.
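
To make the cluster-provisioning problem concrete, the following is a minimal sketch of a generic greedy baseline of the kind the abstract compares against. It is not the Hercules scheduler and not the specific greedy scheduler evaluated in the paper. The names `ServerType` and `greedy_provision`, the choice of throughput per watt as the ranking key, and the idea of a per-workload latency-bounded throughput table produced by offline profiling are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ServerType:
    """Hypothetical description of one server architecture in the cluster."""
    name: str
    available: int      # number of machines of this type
    power_watts: float  # provisioned power per machine
    qps: dict           # workload name -> latency-bounded throughput (queries/s)


def greedy_provision(servers, loads):
    """Assign servers to workloads one workload at a time.

    For each workload, server types are ranked by throughput per watt
    (an assumed heuristic) and allocated until the offered load is covered.
    Returns {workload: {server_name: machine_count}}.
    """
    remaining = {s.name: s.available for s in servers}
    plan = {}
    for workload, load in loads.items():
        # Most power-efficient server type for this workload first.
        ranked = sorted(
            (s for s in servers if s.qps.get(workload, 0) > 0),
            key=lambda s: s.qps[workload] / s.power_watts,
            reverse=True,
        )
        need = load
        alloc = {}
        for s in ranked:
            if need <= 0:
                break
            # Take only as many machines as needed, capped by what is left.
            count = min(remaining[s.name], math.ceil(need / s.qps[workload]))
            if count > 0:
                alloc[s.name] = count
                remaining[s.name] -= count
                need -= count * s.qps[workload]
        if need > 0:
            raise ValueError(f"insufficient capacity for {workload}")
        plan[workload] = alloc
    return plan
```

Because a greedy pass commits machines to each workload in turn, an early workload can consume the server type that a later workload would have used far more efficiently. That limitation is the kind of inefficiency a heterogeneity-aware provisioning stage is meant to reduce.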


research
01/08/2020

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

Neural personalized recommendation is the corner-stone of a wide collect...
research
12/02/2022

DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation

Deep learning-based personalized recommendation systems are widely used ...
research
11/04/2020

Understanding Capacity-Driven Scale-Out Neural Recommendation Inference

Deep learning recommendation models have grown to the terabyte scale. Tr...
research
02/23/2023

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

While providing low latency is a fundamental requirement in deploying re...
research
05/18/2021

RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance

Deep learning recommendation systems must provide high quality, personal...
research
06/03/2021

JIZHI: A Fast and Cost-Effective Model-As-A-Service System for Web-Scale Online Inference at Baidu

In modern internet industries, deep learning based recommender systems h...
research
04/17/2018

Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems

Heterogeneity has grown in popularity both at the core and server level ...
