Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

02/23/2023
by   Yujeong Choi, et al.

While providing low latency is a fundamental requirement for deploying recommendation services, achieving high resource utilization is also crucial for cost-effectively maintaining the datacenter. Co-locating multiple workers of a model is an effective way to maximize query-level parallelism and server throughput, but the interference that concurrent workers cause at shared resources can prevent queries from meeting their SLA. Hera utilizes the heterogeneous memory requirements of multi-tenant recommendation models to intelligently determine a productive set of co-located models and their resource allocation, providing fast response time while achieving high throughput. We show that Hera achieves an average 37.3% improvement in machine utilization, enabling a 26% improvement upon the baseline recommendation inference server.
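To make the co-location idea concrete, the following is a minimal, hypothetical sketch of memory-heterogeneity-aware model selection: given models with differing memory footprints, greedily admit the lightest models first until a shared memory budget is exhausted. The function name, the greedy policy, and the example model figures are illustrative assumptions, not Hera's actual algorithm or data.

```python
# Hypothetical sketch (not Hera's algorithm): choose a set of recommendation
# models to co-locate so that their combined memory footprint fits within a
# shared memory budget, favoring memory-light models to maximize the number
# of concurrent workers and hence query-level parallelism.

def pick_colocated_set(models, mem_budget_gb):
    """Greedily co-locate memory-light models first under a shared budget.

    models: list of (name, mem_gb, throughput_qps) tuples.
    Returns (chosen model names, aggregate throughput in QPS).
    """
    chosen, used_gb, agg_qps = [], 0.0, 0.0
    # Sort by memory footprint so the cheapest models are considered first.
    for name, mem_gb, qps in sorted(models, key=lambda m: m[1]):
        if used_gb + mem_gb <= mem_budget_gb:
            chosen.append(name)
            used_gb += mem_gb
            agg_qps += qps
    return chosen, agg_qps

# Illustrative models with made-up footprints and throughputs.
models = [
    ("dlrm-small", 4.0, 1200.0),
    ("dlrm-wide", 24.0, 900.0),
    ("dlrm-deep", 12.0, 700.0),
]
print(pick_colocated_set(models, mem_budget_gb=32.0))
# → (['dlrm-small', 'dlrm-deep'], 1900.0)
```

A real interference-aware scheduler would also model contention at shared resources (memory bandwidth, caches) rather than memory capacity alone; this sketch only captures the capacity-packing dimension.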


Related research

03/14/2022
Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation
Personalized recommendation is an important class of deep-learning appli...

04/23/2023
GACER: Granularity-Aware ConcurrEncy Regulation for Multi-Tenant Deep Learning
As deep learning continues to advance and is applied to increasingly com...

12/02/2022
DisaggRec: Architecting Disaggregated Systems for Large-Scale Personalized Recommendation
Deep learning-based personalized recommendation systems are widely used ...

06/22/2023
MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge
Cascade systems comprise a two-model sequence, with a lightweight model ...

06/02/2023
ODIN: Overcoming Dynamic Interference in iNference pipelines
As an increasing number of businesses becomes powered by machine-learnin...

04/17/2018
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Heterogeneity has grown in popularity both at the core and server level ...

05/30/2019
INFaaS: Managed & Model-less Inference Serving
The number of applications relying on inference from machine learning mo...
