Hamming Sentence Embeddings for Information Retrieval

08/15/2019
by Felix Hamann, et al.

In retrieval applications, binary hashes are known to offer significant savings in both memory and speed. We investigate the compression of sentence embeddings using a neural encoder-decoder architecture trained by minimizing reconstruction error. Instead of the original real-valued embeddings, we use the latent representations in Hamming space produced by the encoder for similarity calculations. In quantitative experiments on several benchmarks for semantic similarity tasks, we show that our compressed Hamming embeddings yield performance comparable to that of the uncompressed embeddings (Sent2Vec, InferSent, GloVe-BoW) at compression ratios of up to 256:1. We further demonstrate that our model strongly decorrelates input features, and that the compressor generalizes well when pre-trained on Wikipedia sentences. We publish the source code and all experimental results on GitHub.
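To make the recipe in the abstract concrete, the following is a minimal PyTorch sketch of a binarizing autoencoder of this kind. It is not the authors' released code: the sign-based binarization with a straight-through gradient, the layer sizes, and the 256-bit code width are all illustrative assumptions.

import torch
import torch.nn as nn

class STEBinarize(torch.autograd.Function):
    """Hard 0/1 quantization with a straight-through gradient."""
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float()               # binary code in Hamming space
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out                      # pass gradients through unchanged

class BinaryAutoencoder(nn.Module):
    def __init__(self, dim=700, bits=256):   # dimensions are illustrative
        super().__init__()
        self.encoder = nn.Linear(dim, bits)
        self.decoder = nn.Linear(bits, dim)
    def encode(self, x):
        return STEBinarize.apply(self.encoder(x))
    def forward(self, x):
        return self.decoder(self.encode(x))  # reconstruct the input embedding

model = BinaryAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 700)                     # stand-in sentence embeddings
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error
loss.backward()
opt.step()

# At query time only the encoder is used; similarity is the negated
# Hamming distance between binary codes.
codes = model.encode(x)
dist = (codes[0] != codes[1]).sum().item()

In practice the 0/1 codes would be packed into machine words (e.g. with numpy.packbits) so that Hamming distance reduces to XOR plus popcount, which is where the memory and speed advantages at ratios like 256:1 come from.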

Related research

08/19/2021 · Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
We provide the first exploration of text-to-text transformers (T5) sente...

06/19/2019 · Learning Compressed Sentence Representations for On-Device Text Processing
Vector representations of sentences, trained on massive text corpora, ar...

03/15/2022 · Compressing Sentence Representation for Semantic Retrieval via Homomorphic Projective Distillation
How to learn highly compact yet effective sentence representation? Pre-t...

03/25/2018 · Bernoulli Embeddings for Graphs
Just as semantic hashing can accelerate information retrieval, binary va...

07/09/2019 · Multilingual Universal Sentence Encoder for Semantic Retrieval
We introduce two pre-trained retrieval focused multilingual sentence enc...

01/11/2020 · Embedding Compression with Isotropic Iterative Quantization
Continuous representation of words is a standard component in deep learn...

03/24/2018 · Near-lossless Binarization of Word Embeddings
Is it possible to learn binary word embeddings of arbitrary size from th...
