Heterogeneous Acceleration Pipeline for Recommendation System Training

04/11/2022
by Muhammad Adnan, et al.

Recommendation systems are unique in that they exhibit a conflation of compute and memory intensity, owing to their deep learning components and massive embedding tables. Training these models typically involves a hybrid CPU-GPU mode, where the GPUs accelerate the deep learning portion and the CPUs store and process the memory-intensive embedding tables. The hybrid mode incurs substantial CPU-to-GPU transfer time and relies on main-memory bandwidth to feed embeddings to the GPUs for deep learning acceleration. Alternatively, we can store the entire embedding tables across GPUs to avoid the transfer time and utilize the GPUs' High Bandwidth Memory (HBM). This approach, however, requires GPU-to-GPU backend communication and forces the number of GPUs to scale with the size of the embedding tables. To overcome these concerns, this paper offers a heterogeneous acceleration pipeline called Hotline. Hotline leverages the insight that only a small number of embedding entries are accessed frequently and can easily fit in a single GPU's HBM. Hotline implements a data-aware and model-aware scheduling pipeline that utilizes (1) CPU main memory for infrequently accessed embeddings and (2) the GPUs' local memory for frequently accessed embeddings. Hotline improves training throughput by dynamically stitching the execution of popular and non-popular inputs through a novel hardware accelerator and feeding them to the GPUs. Results on real-world datasets and recommender models show that Hotline reduces the average training time by 3x and 1.8x compared to an Intel-optimized CPU-GPU DLRM baseline and a HugeCTR-optimized GPU-only baseline, respectively. Hotline increases overall training throughput to 35.7 epochs/hour, compared to 5.3 epochs/hour for the Intel-optimized DLRM baseline.
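To make the hot/cold placement concrete, below is a minimal PyTorch sketch, not Hotline's actual implementation (which uses a dedicated hardware accelerator to gather cold embeddings and pipeline minibatches). It partitions a single embedding table into a frequently accessed shard resident in GPU HBM and an infrequently accessed shard in CPU main memory, and routes each lookup to the shard that owns the row. The names HotColdEmbedding and hot_ids are illustrative assumptions, not identifiers from the paper.

import torch

class HotColdEmbedding(torch.nn.Module):
    """Toy hot/cold split of one embedding table (illustrative, not Hotline)."""
    def __init__(self, num_rows, dim, hot_ids, device="cuda"):
        super().__init__()
        # hot_ids: LongTensor of row ids profiled as frequently accessed.
        is_hot = torch.zeros(num_rows, dtype=torch.bool)
        is_hot[hot_ids] = True
        # remap[i] = row i's position inside its own shard.
        remap = torch.empty(num_rows, dtype=torch.long)
        remap[is_hot] = torch.arange(int(is_hot.sum()))
        remap[~is_hot] = torch.arange(int((~is_hot).sum()))
        self.register_buffer("is_hot", is_hot)
        self.register_buffer("remap", remap)
        self.hot = torch.nn.Embedding(int(is_hot.sum()), dim).to(device)         # GPU HBM
        self.cold = torch.nn.Embedding(num_rows - self.hot.num_embeddings, dim)  # CPU DRAM

    def forward(self, ids):
        # ids: 1-D LongTensor on CPU, e.g. straight from the data loader.
        hot_mask = self.is_hot[ids]
        dev = self.hot.weight.device
        out = torch.empty(ids.numel(), self.hot.embedding_dim, device=dev)
        # Hot rows: the lookup happens entirely in GPU memory.
        out[hot_mask.to(dev)] = self.hot(self.remap[ids[hot_mask]].to(dev))
        # Cold rows: the lookup happens in CPU memory; only the gathered
        # result crosses the CPU-to-GPU link.
        out[(~hot_mask).to(dev)] = self.cold(self.remap[ids[~hot_mask]]).to(dev)
        return out

In a real pipeline, hot_ids would come from profiling index frequencies over a sample of minibatches (for example, the top-k most frequent rows). Hotline additionally overlaps the cold gathers with GPU compute via its accelerator and scheduler; the sketch shows only the placement and routing logic.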



Related research

03/01/2021
High-Performance Training by Exploiting Hot-Embeddings in Recommendation Systems
Recommendation models are commonly used learning models that suggest rel...

11/11/2020
Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
The use of GPUs has proliferated for machine learning workflows and is n...

01/05/2022
Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising
Click-Through Rate (CTR) prediction is a crucial component in the online...

12/23/2020
Hardware-accelerated Simulation-based Inference of Stochastic Epidemiology Models for COVID-19
Epidemiology models are central in understanding and controlling large s...

08/13/2023
InTune: Reinforcement Learning-based Data Pipeline Optimization for Deep Recommendation Models
Deep learning-based recommender models (DLRMs) have become an essential ...

02/27/2021
Acceleration of probabilistic reasoning through custom processor architecture
Probabilistic reasoning is an essential tool for robust decision-making ...

08/12/2021
PatrickStar: Parallel Training of Pre-trained Models via a Chunk-based Memory Management
The pre-trained model (PTM) is revolutionizing Artificial intelligence (...
