Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

10/17/2022
by Joey Wang, et al.

In this talk, we introduce Merlin HugeCTR, an open-source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, and enables model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework, to leverage the computational capabilities of GPUs for high-speed recommendation model inference. Using this HPS, Merlin HugeCTR users can achieve a 5-62x speedup (batch size dependent) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.
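
The idea of pairing a small, fast embedding cache with slower storage tiers can be illustrated with a toy sketch. The code below is not HugeCTR's API or implementation: the class name TieredEmbeddingStore, its methods, the LRU eviction policy, and the use of plain Python dictionaries for each tier are all assumptions made for illustration. It only shows the general lookup pattern: check the fastest tier first, fall back to slower tiers on a miss, and promote whatever was found into the cache.

```python
from collections import OrderedDict
from typing import Dict, List

Vector = List[float]


class TieredEmbeddingStore:
    """Hypothetical three-tier lookup: LRU cache -> memory table -> backing store.

    In a real system the cache would live in GPU memory and the lower tiers
    in host memory and persistent storage; here every tier is a Python dict.
    """

    def __init__(self, cache_capacity: int, memory: Dict[int, Vector],
                 backing: Dict[int, Vector], default: Vector) -> None:
        self.cache_capacity = cache_capacity
        self.cache: "OrderedDict[int, Vector]" = OrderedDict()
        self.memory = memory
        self.backing = backing
        self.default = default  # returned for keys found in no tier
        self.hits = {"cache": 0, "memory": 0, "backing": 0, "miss": 0}

    def _promote(self, key: int, vector: Vector) -> None:
        # Insert as most recently used, then evict the least recently used
        # entry if the cache grew past its capacity.
        self.cache[key] = vector
        self.cache.move_to_end(key)
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def lookup(self, key: int) -> Vector:
        if key in self.cache:
            self.cache.move_to_end(key)  # refresh recency on a cache hit
            self.hits["cache"] += 1
            return self.cache[key]
        # Fall back through the slower tiers in order.
        for level, table in (("memory", self.memory), ("backing", self.backing)):
            if key in table:
                self.hits[level] += 1
                self._promote(key, table[key])
                return table[key]
        self.hits["miss"] += 1
        return self.default

    def lookup_batch(self, keys: List[int]) -> List[Vector]:
        return [self.lookup(k) for k in keys]


if __name__ == "__main__":
    store = TieredEmbeddingStore(
        cache_capacity=2,
        memory={1: [0.1], 2: [0.2]},
        backing={3: [0.3]},
        default=[0.0],
    )
    print(store.lookup_batch([1, 1, 3, 2, 1, 9]))
    print(store.hits)
```

The per-tier hit counters make it easy to see how cache capacity and key popularity affect how often a lookup has to reach the slower tiers, which is the quantity a design of this kind tries to minimize.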

