Mem-Rec: Memory Efficient Recommendation System using Alternative Representation

05/12/2023
by Gopi Krishna Jha, et al.

Deep learning-based recommendation systems (e.g., DLRMs) are widely used to provide high-quality personalized recommendations. Training data for modern recommendation systems commonly includes categorical features that take on tens of millions of distinct values. These categorical tokens are typically assigned learned vector representations that are stored in large embedding tables, on the order of hundreds of GB. Storing and accessing these tables represents a substantial burden in commercial deployments. Our work proposes MEM-REC, a novel alternative representation approach for embedding tables. MEM-REC leverages Bloom filters and hashing methods to encode categorical features using two cache-friendly embedding tables. The first table (token embedding) contains raw embeddings (i.e., learned vector representations). The second table (weight embedding), which is much smaller, contains weights that scale these raw embeddings to give each data point better discriminative capability. We provide a detailed architecture, design, and analysis of MEM-REC, addressing trade-offs in accuracy and computation requirements in comparison with state-of-the-art techniques. We show that MEM-REC not only maintains recommendation quality and significantly reduces the memory footprint of commercial-scale recommendation models, but also improves embedding latency. In particular, based on our results, MEM-REC compresses the MLPerf CriteoTB benchmark DLRM model size by 2900x and performs embedding operations up to 3.4x faster, while achieving the same AUC as the full uncompressed model.
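The two-table idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoTableEmbedding`, the seeded `blake2b` hashing, the sum over hashed rows, and the use of one scalar weight per row of the weight table are all assumptions made for illustration, since the abstract does not specify these details. The sketch only shows how a category ID can be mapped Bloom-filter style to a few rows of a small token table and then scaled by an entry from an even smaller weight table.

```python
import hashlib
import random


def seeded_hash(token_id: int, seed: int, buckets: int) -> int:
    """Deterministically hash an integer ID into the range [0, buckets)."""
    digest = hashlib.blake2b(f"{seed}:{token_id}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % buckets


class TwoTableEmbedding:
    """Hypothetical two-table hashed embedding (illustrative assumption only)."""

    def __init__(self, token_rows, weight_rows, dim, num_hashes=3, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.num_hashes = num_hashes
        # Token table: raw embedding vectors shared by many category IDs.
        self.token_table = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(token_rows)
        ]
        # Weight table: assumed here to hold one scalar scale factor per row.
        self.weight_table = [1.0 + rng.gauss(0.0, 0.1) for _ in range(weight_rows)]

    def param_count(self) -> int:
        """Number of stored parameters across both tables."""
        return len(self.token_table) * self.dim + len(self.weight_table)

    def lookup(self, token_id: int) -> list:
        # Like a Bloom filter, several hash functions select several rows.
        rows = [
            seeded_hash(token_id, s, len(self.token_table))
            for s in range(self.num_hashes)
        ]
        raw = [sum(self.token_table[r][d] for r in rows) for d in range(self.dim)]
        # A separate hash selects the weight that scales the raw embedding.
        w_idx = seeded_hash(token_id, self.num_hashes, len(self.weight_table))
        w = self.weight_table[w_idx]
        return [w * x for x in raw]


emb = TwoTableEmbedding(token_rows=1024, weight_rows=64, dim=8)
vec = emb.lookup(12_345_678)  # the ID can far exceed the number of table rows
```

In this sketch the memory cost depends only on the table sizes (here 1024 × 8 + 64 stored values), not on the number of distinct category values, which is the general reason hashing-based schemes can shrink embedding storage. Distinct IDs may collide on some rows; the weight table is one way to help separate them.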


research
09/04/2019

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Modern deep learning-based recommendation systems exploit hundreds to th...
research
03/28/2022

Learning to Collide: Recommendation System Model Compression with Learned Hash Functions

A key characteristic of deep recommendation models is the immense memory...
research
02/24/2021

Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems

Deep learning-based models are utilized to achieve state-of-the-art perf...
research
03/18/2022

Learning Compressed Embeddings for On-Device Inference

In deep learning, embeddings are widely used to represent categorical en...
research
07/21/2022

The trade-offs of model size in large recommendation models : A 10000 × compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)

Embedding tables dominate industrial-scale recommendation model sizes, u...
research
02/18/2022

iMARS: An In-Memory-Computing Architecture for Recommendation Systems

Recommendation systems (RecSys) suggest items to users by predicting the...
research
10/12/2022

Clustering Embedding Tables, Without First Learning Them

To work with categorical features, machine learning systems employ embed...
