Clustering Embedding Tables, Without First Learning Them

10/12/2022
by Henry Ling-Hei Tsang, et al.

Machine learning systems use embedding tables to work with categorical features. In modern recommendation systems these tables can become exceedingly large, necessitating new methods for fitting them in memory, even during training. Some of the most successful methods for table compression are Product and Residual Vector Quantization (Gray & Neuhoff, 1998), which replace table rows with references to k-means clustered "codewords." Unfortunately, these methods must know the table before compressing it, so they can only save memory during inference, not training. Recent work has used hashing-based approaches to reduce memory usage during training, but the compression they obtain is inferior to that of "post-training" quantization. We show that the best of both worlds can be obtained by combining hashing- and clustering-based techniques. By first training a hashing-based "sketch", then clustering it, and then training the clustered quantization, our method achieves compression ratios close to those of post-training quantization with the training-time memory reductions of hashing-based methods. We show experimentally that our method provides better compression and/or accuracy than previous methods, and we prove that our method always converges to the optimal embedding table for least-squares training.
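To make the pipeline concrete, here is a minimal sketch of the three stages the abstract describes: a hashing-based sketch trained in place of the full table, k-means clustering of the sketch rows into a small codebook, and row lookups that go through hash bucket and codeword indices. All sizes, the random-integer stand-in for a hash function, and the untrained (random) sketch values are illustrative assumptions, not the authors' implementation; the k-means here is plain Lloyd's iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_rows, dim = 1000, 16   # full embedding table would be n_rows x dim
n_buckets = 64           # hashing-based sketch size (<< n_rows)
n_codewords = 8          # k-means codebook size

# 1) Hashing-based "sketch": each row id is hashed to one of n_buckets
#    shared vectors, so only n_buckets x dim parameters are trained.
#    (Random integers stand in for a real hash function; random values
#    stand in for the trained sketch.)
hash_ids = rng.integers(0, n_buckets, size=n_rows)
sketch = rng.normal(size=(n_buckets, dim))

def kmeans(x, k, iters=20):
    """Plain Lloyd's k-means on the rows of x; returns centers and assignments."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, assign

# 2) Cluster the small sketch instead of the (unknown) full table.
codebook, bucket_code = kmeans(sketch, n_codewords)

# 3) Each original row references a codeword via its hash bucket; in the
#    paper's method the codebook is then fine-tuned ("clustered quantization").
row_code = bucket_code[hash_ids]

def lookup(i):
    """Compressed embedding lookup for row i."""
    return codebook[row_code[i]]
```

Only the `n_buckets x dim` sketch is held during the initial training phase, and only the `n_codewords x dim` codebook plus per-row indices afterwards, which is where the memory savings come from.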

