FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models

01/08/2023
by Geet Sethi, et al.

Sequence-based deep learning recommendation models (DLRMs) are an emerging class of DLRMs showing great improvements over their prior sum-pooling-based counterparts at capturing users' long-term interests. These improvements come at immense system cost, however, with sequence-based DLRMs requiring substantial amounts of data to be dynamically materialized and communicated by each accelerator during a single iteration. To address this rapidly growing bottleneck, we present FlexShard, a new tiered sequence embedding table sharding algorithm that operates at a per-row granularity by exploiting the insight that not every row is equal. Through precise replication of embedding rows based on their underlying probability distribution, along with the introduction of a new sharding strategy adapted to the heterogeneous, skewed performance of real-world cluster network topologies, FlexShard is able to significantly reduce communication demand while using no additional memory compared to the prior state-of-the-art. When evaluated on production-scale sequence DLRMs, FlexShard was able to reduce overall global all-to-all communication traffic by over 85%, with communication latency improvements of almost 6x over the prior state-of-the-art approach.
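The idea that "not every row is equal" can be illustrated with a small toy model. The sketch below is not FlexShard's algorithm. It assumes a Zipf-like access distribution, a hypothetical replication budget counted in rows, and uniform placement of the remaining rows across devices, and it ignores both network topology and the abstract's constraint of using no additional memory. All function names and parameters are invented for illustration. It only shows why replicating the most frequently accessed rows on every device can cut the share of lookups that must cross the network.

```python
def zipf_probs(num_rows, alpha=1.2):
    """Assumed access distribution: row at rank r has weight 1 / r**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_rows + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def choose_replicated_rows(probs, budget_rows):
    """Pick the budget_rows most frequently accessed rows for replication."""
    ranked = sorted(range(len(probs)), key=lambda r: probs[r], reverse=True)
    return set(ranked[:budget_rows])


def remote_lookup_fraction(probs, replicated, num_devices):
    """Expected fraction of lookups served by another device.

    Replicated rows are always local. Any other row is assumed to live on
    exactly one device chosen uniformly, so it is remote with probability
    1 - 1/num_devices.
    """
    cold_mass = sum(p for r, p in enumerate(probs) if r not in replicated)
    return cold_mass * (1.0 - 1.0 / num_devices)


probs = zipf_probs(10_000)
baseline = remote_lookup_fraction(probs, set(), num_devices=8)
replicated = choose_replicated_rows(probs, budget_rows=100)
sharded = remote_lookup_fraction(probs, replicated, num_devices=8)
print(f"remote fraction without replication: {baseline:.3f}")
print(f"remote fraction with 100 hot rows replicated: {sharded:.3f}")
```

In this toy setting the saving depends entirely on how skewed the assumed distribution is; the figures reported in the abstract come from production-scale models and cannot be reproduced from this sketch.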


