The trade-offs of model size in large recommendation models: A 10000× compressed Criteo-TB DLRM model (100 GB of parameters to a mere 10 MB)

07/21/2022
by Aditya Desai, et al.

Embedding tables dominate industrial-scale recommendation model sizes, using up to terabytes of memory. A popular, and the largest publicly available, machine learning MLPerf benchmark on recommendation data is a Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data. It contains 100 GB of embedding memory (25+ billion parameters). Due to their sheer size and the associated volume of data, DLRMs face difficulties in training and inference deployment, along with memory bottlenecks caused by large embedding tables. This paper analyzes and extensively evaluates a generic parameter sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory required to achieve a (1 ± ε) approximation to the embedding table. Our bounds indicate that exponentially fewer parameters suffice for good accuracy. To this end, we demonstrate that a PSS DLRM reaches 10000× compression on Criteo-TB without losing quality. Such compression, however, comes with a caveat: it requires 4.5× more iterations to reach the same saturation quality. We argue that this trade-off needs more investigation, as it may be significantly favorable. Leveraging the small size of the compressed model, we show a 4.3× improvement in training latency, leading to similar overall training times. Thus, in the trade-off between the system advantages of a small DLRM model and slower convergence, we show that the scales are tipped toward the smaller model, yielding faster inference, easier deployment, and similar training times.
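To make the parameter-sharing idea concrete, here is a minimal sketch of a hashing-based shared-parameter embedding: every entry of a huge virtual embedding table is mapped by a hash function into a small shared weight array, so memory scales with the shared array rather than the vocabulary. This is an illustrative construction of generic parameter sharing, not the paper's exact PSS implementation; all names and the hash scheme here are assumptions for illustration.

```python
import numpy as np

class HashedEmbedding:
    """Illustrative parameter-shared embedding (not the paper's exact PSS):
    each (row, col) entry of a virtual vocab_size x dim table is fetched
    from a small shared parameter array via a cheap hash."""

    def __init__(self, vocab_size, dim, shared_size, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.m = shared_size
        # The only learnable memory: e.g. megabytes instead of 100 GB.
        self.weights = rng.standard_normal(shared_size).astype(np.float32)
        # Random hash parameters (universal-hash style, illustrative only).
        self.a = int(rng.integers(1, 2**31 - 1))
        self.b = int(rng.integers(0, 2**31 - 1))

    def _index(self, row, cols):
        # Map virtual-table positions into the shared array.
        flat = row * self.dim + cols
        return (self.a * flat + self.b) % self.m

    def lookup(self, row):
        # Materialize one embedding vector on the fly.
        cols = np.arange(self.dim)
        return self.weights[self._index(row, cols)]

# A virtual 1M x 64 table (64M parameters) backed by only 2.5M shared floats.
emb = HashedEmbedding(vocab_size=1_000_000, dim=64, shared_size=2_500_000)
vec = emb.lookup(42)  # 64-dimensional embedding, no 64M-entry table stored
```

In this sketch, gradients during training would flow only into the shared array, which is why the trainable state can be orders of magnitude smaller than the virtual table; collisions between entries are what the paper's (1 ± ε) approximation bounds reason about.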


