TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

01/25/2021
by   Chunxing Yin, et al.

The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is growing dramatically across the industry, from tens of GBs to TBs. Given this fast growth, novel solutions are urgently needed to enable fast and efficient DLRM innovation without an exponential increase in infrastructure capacity demands. In this paper, we demonstrate the promising potential of Tensor Train decomposition for DLRMs (TT-Rec), an important yet under-investigated context. We design and implement optimized kernels (TT-EmbeddingBag) to evaluate the proposed TT-Rec design; TT-EmbeddingBag is 3 times faster than the state-of-the-art TT implementation. The performance of TT-Rec is further optimized with batched matrix multiplication and caching strategies for embedding-vector lookup operations. In addition, we analyze mathematically and empirically the effect of the weight initialization distribution on DLRM accuracy, and propose initializing the tensor cores of TT-Rec from a sampled Gaussian distribution. We evaluate TT-Rec across three important design-space dimensions – memory capacity, accuracy, and timing performance – by training MLPerf-DLRM on Criteo's Kaggle and Terabyte data sets. TT-Rec achieves 117 times and 112 times model-size compression on Kaggle and Terabyte, respectively, with no loss in accuracy and no training-time overhead compared to the uncompressed baseline.
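To make the core idea concrete, here is a minimal sketch of a TT-format embedding lookup. It is an illustration of Tensor Train decomposition applied to an embedding table, not the paper's actual TT-EmbeddingBag kernel: all names, shapes, and ranks below are hypothetical choices for the example. A table of shape (N, D), with N = n1·n2·n3 rows and D = d1·d2·d3 columns, is stored as three small TT-cores, and a single row is reconstructed on the fly by contracting one slice of each core.

```python
import numpy as np

# Hypothetical TT-embedding sketch (shapes and rank choices are illustrative).
# Table of shape (N, D) with N = n1*n2*n3, D = d1*d2*d3 is stored as cores
# G[k] of shape (r_k, n_k, d_k, r_{k+1}), with boundary ranks r_0 = r_3 = 1.
n_dims = (8, 8, 8)       # 512 rows in the virtual table
d_dims = (4, 4, 4)       # 64-dimensional embeddings
ranks = (1, 16, 16, 1)   # TT-ranks

rng = np.random.default_rng(0)
cores = [
    rng.standard_normal((ranks[k], n_dims[k], d_dims[k], ranks[k + 1]))
    for k in range(3)
]

def tt_lookup(row):
    """Reconstruct one embedding row by contracting slices of the TT-cores."""
    # Factor the flat row index into per-core indices (mixed-radix digits).
    idx = []
    for n in reversed(n_dims):
        idx.append(row % n)
        row //= n
    idx.reverse()
    # Contract left to right: running shape is (1, d1*...*dk, r_k).
    out = cores[0][:, idx[0], :, :]          # (1, d1, r1)
    for k in range(1, 3):
        s = cores[k][:, idx[k], :, :]        # (r_k, d_k, r_{k+1})
        out = np.einsum('adr,res->ades', out, s)
        out = out.reshape(1, -1, s.shape[-1])
    return out.reshape(-1)                   # length d1*d2*d3 = 64

vec = tt_lookup(42)
```

The memory saving comes from storing only the cores (9,216 parameters here) instead of the full 512×64 table (32,768 parameters); for the industrial-scale tables the paper targets, the ratio is far larger, at the cost of a small contraction per lookup.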


research
08/04/2021

Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model : 1000× Compression and 2.7× Faster Inference

Deep learning for recommendation data is the one of the most pervasive a...
research
07/21/2022

The trade-offs of model size in large recommendation models : A 10000 × compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)

Embedding tables dominate industrial-scale recommendation model sizes, u...
research
05/26/2019

HadaNets: Flexible Quantization Strategies for Neural Networks

On-board processing elements on UAVs are currently inadequate for traini...
research
04/19/2023

MTrainS: Improving DLRM training efficiency using heterogeneous memories

Recommendation models are very large, requiring terabytes (TB) of memory...
research
11/22/2022

An Incremental Tensor Train Decomposition Algorithm

We present a new algorithm for incrementally updating the tensor-train d...
research
06/21/2022

Nimble GNN Embedding with Tensor-Train Decomposition

This paper describes a new method for representing embedding tables of g...
research
02/26/2019

Saec: Similarity-Aware Embedding Compression in Recommendation Systems

Production recommendation systems rely on embedding methods to represent...
