Embedding Compression with Isotropic Iterative Quantization

01/11/2020
by Siyu Liao, et al.

Continuous representation of words is a standard component of deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. In this paper we therefore propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval while satisfying the isotropic property desired of PMI-based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable, and sometimes even improved, performance relative to the original real-valued embedding vectors.
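The building block the abstract refers to is iterative quantization (ITQ), originally developed for image retrieval: project the real-valued vectors to the code length, then alternate between binarizing the rotated projections and updating the rotation via an orthogonal Procrustes step. The thirty-fold figure is consistent with plain binarization, since a 300-dimensional float32 vector occupies 9,600 bits while a 300-bit binary code occupies 300, a 32x reduction. The sketch below is a minimal NumPy illustration of that baseline ITQ step on an embedding matrix; it is not the authors' full IIQ procedure (which additionally enforces the isotropy property), and the function name, parameters, and PCA-based projection are assumptions made for the example.

```python
import numpy as np

def itq_binarize(X, n_bits=64, n_iter=50, seed=0):
    """Baseline ITQ sketch (not the paper's IIQ): learn an orthogonal
    rotation R so that sign(V @ R) approximates the projected data V.
    X: (n_words, dim) real-valued embedding matrix."""
    rng = np.random.default_rng(seed)

    # Center the embeddings and project to n_bits dimensions with PCA.
    X_c = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_c, full_matrices=False)
    W = Vt[:n_bits].T                # top principal directions, (dim, n_bits)
    V = X_c @ W                      # projected data, (n_words, n_bits)

    # Random orthogonal initialization of the rotation.
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))

    for _ in range(n_iter):
        # Fix R, update the binary codes B = sign(V R).
        B = np.sign(V @ R)
        B[B == 0] = 1
        # Fix B, update R by orthogonal Procrustes:
        # maximize tr(R^T V^T B)  =>  R = U W^T with V^T B = U S W^T.
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt

    codes = (np.sign(V @ R) > 0)     # boolean codes, one bit per dimension
    return np.packbits(codes, axis=1), W, R
```

Under these assumptions, each word ends up stored as n_bits/8 bytes of packed binary code instead of dim float32 values, which is where the order-of-magnitude memory saving comes from.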

Related research

05/23/2022 · OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization
As Deep Neural Networks (DNNs) usually are overparameterized and have mi...

03/25/2018 · Bernoulli Embeddings for Graphs
Just as semantic hashing can accelerate information retrieval, binary va...

08/26/2019 · Differentiable Product Quantization for End-to-End Embedding Compression
Embedding layer is commonly used to map discrete symbols into continuous...

11/05/2019 · Post-Training 4-bit Quantization on Embedding Tables
Continuous representations have been widely adopted in recommender syste...

10/01/2020 · Faster Binary Embeddings for Preserving Euclidean Distances
We propose a fast, distance-preserving, binary embedding algorithm to tr...

08/15/2019 · Hamming Sentence Embeddings for Information Retrieval
In retrieval applications, binary hashes are known to offer significant ...

01/30/2019 · Tensorized Embedding Layers for Efficient Model Compression
The embedding layers transforming input words into real vectors are the ...
