RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure

11/09/2022
by   Mark Zhao, et al.

We present RecD (Recommendation Deduplication), a suite of end-to-end infrastructure optimizations across the Deep Learning Recommendation Model (DLRM) training pipeline. RecD addresses the immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets. Feature duplication arises because DLRM datasets are generated from logged user interactions: each user session can generate multiple training samples, yet many features' values do not change across these samples. We demonstrate how RecD exploits this property, end-to-end, across a deployed training pipeline. RecD optimizes data generation pipelines to decrease dataset storage and preprocessing resource demands and to maximize duplication within a training batch. RecD introduces a new tensor format, InverseKeyedJaggedTensors (IKJTs), to deduplicate feature values in each batch. We show how DLRM model architectures can leverage IKJTs to drastically increase training throughput. RecD improves training throughput, preprocessing throughput, and storage efficiency by up to 2.48x, 1.79x, and 3.71x, respectively, in an industry-scale DLRM training system.
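The core deduplication idea behind IKJTs can be illustrated with a minimal sketch: instead of storing a feature's value once per sample, a batch stores each unique value once plus an inverse index mapping every sample back to its value, so downstream work (e.g., embedding lookups) runs once per unique value. The function and names below are hypothetical illustrations of the idea, not the RecD or TorchRec API.

```python
def dedup_feature(values):
    """Hypothetical sketch of batch-level feature deduplication.

    Returns (unique_values, inverse) such that
    [unique_values[i] for i in inverse] reconstructs the original list.
    """
    index = {}           # value -> position in unique_values
    unique_values = []
    inverse = []
    for v in values:
        if v not in index:
            index[v] = len(unique_values)
            unique_values.append(v)
        inverse.append(index[v])
    return unique_values, inverse

# A batch of 8 samples from a few sessions sharing only 3 distinct values:
batch = [7, 7, 7, 3, 3, 9, 9, 9]
uniq, inv = dedup_feature(batch)

# Expensive per-value work now runs 3 times instead of 8; per-sample
# results are recovered by gathering through the inverse index.
assert [uniq[i] for i in inv] == batch
```

In the deployed system, the paper further maximizes the payoff of this transformation by arranging data generation so that samples from the same session land in the same training batch, increasing the ratio of total to unique values per batch.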

