Unsupervised Multi-Index Semantic Hashing

by Christian Hansen, et al.

Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives that enable learning hash codes which reduce the candidate sets produced by multi-index hashing, while being end-to-end trainable. In fact, our proposed training objectives are model agnostic, i.e., not tied to how the hash codes are generated specifically in MISH, and are straightforward to include in existing and future semantic hashing models. We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search. We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH, while MISH is still able to obtain state-of-the-art effectiveness.
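To make the candidate-set idea concrete, the following is a minimal sketch of generic multi-index hashing search, not the MISH model or its training objectives. It assumes hash codes are stored as Python integers whose bit length is divisible by the number of substrings; the function names (`split`, `build_index`, `neighbours`, `search`) are hypothetical and chosen only for illustration. The sketch relies on the pigeonhole argument: if two codes differ in at most r bits overall, then at least one of their p substrings differs in at most floor(r / p) bits.

```python
from itertools import combinations


def split(code, bits, parts):
    """Split a `bits`-bit integer code into `parts` equal-width substrings."""
    width = bits // parts  # assumes bits is divisible by parts
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(parts)]


def build_index(codes, bits, parts):
    """Build one hash table per substring position: substring -> document ids."""
    tables = [dict() for _ in range(parts)]
    for doc_id, code in enumerate(codes):
        for table, sub in zip(tables, split(code, bits, parts)):
            table.setdefault(sub, []).append(doc_id)
    return tables


def neighbours(sub, width, radius):
    """Yield every `width`-bit value within Hamming distance `radius` of `sub`."""
    for k in range(radius + 1):
        for positions in combinations(range(width), k):
            flipped = sub
            for pos in positions:
                flipped ^= 1 << pos
            yield flipped


def search(query, codes, tables, bits, parts, radius):
    """Return (ids within `radius` of `query`, candidate set that was checked)."""
    width = bits // parts
    sub_radius = radius // parts  # pigeonhole bound per substring
    candidates = set()
    for table, sub in zip(tables, split(query, bits, parts)):
        for probe in neighbours(sub, width, sub_radius):
            candidates.update(table.get(probe, ()))
    # Only the candidates are verified against the full Hamming distance.
    hits = sorted(d for d in candidates
                  if bin(codes[d] ^ query).count("1") <= radius)
    return hits, candidates
```

The size of `candidates` relative to the collection is what determines the speed-up over a linear scan, which is why the distribution of the hash codes matters: codes whose substrings collide often produce large candidate sets and little gain.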




ElasticHash: Semantic Image Similarity Search by Deep Hashing with Elasticsearch

We present ElasticHash, a novel approach for high-quality, efficient, an...

Unsupervised Semantic Hashing with Pairwise Reconstruction

Semantic Hashing is a popular family of methods for efficient similarity...

Short Text Hashing Improved by Integrating Multi-Granularity Topics and Tags

Due to computational and storage efficiencies of compact binary codes, h...

Unsupervised Neural Generative Semantic Hashing

Fast similarity search is a key component in large-scale information ret...

Representation Learning for Efficient and Effective Similarity Search and Recommendation

How data is represented and operationalized is critical for building com...

Learning to Hash Naturally Sorts

Locality sensitive hashing pictures a list-wise sorting problem. Its tes...

NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing

Semantic hashing has become a powerful paradigm for fast similarity sear...
