Unsupervised Multi-Index Semantic Hashing

03/26/2021
by Christian Hansen, et al.

Semantic hashing represents documents as compact binary vectors (hash codes) and allows both efficient and effective similarity search in large-scale information retrieval. The state of the art has primarily focused on learning hash codes that improve similarity search effectiveness, while assuming a brute-force linear scan over all the hash codes, even though much faster alternatives exist. One such alternative is multi-index hashing, an approach that constructs a smaller candidate set to search over, which, depending on the distribution of the hash codes, can lead to sub-linear search time. In this work, we propose Multi-Index Semantic Hashing (MISH), an unsupervised hashing model that learns hash codes that are both effective and highly efficient by being optimized for multi-index hashing. We derive novel training objectives that make it possible to learn hash codes that reduce the candidate sets produced by multi-index hashing, while being end-to-end trainable. In fact, our proposed training objectives are model agnostic, i.e., not tied to how the hash codes are generated specifically in MISH, and are straightforward to include in existing and future semantic hashing models. We experimentally compare MISH to state-of-the-art semantic hashing baselines in the task of document similarity search. We find that even though multi-index hashing also improves the efficiency of the baselines compared to a linear scan, they are still upwards of 33% slower than MISH, while MISH is still able to obtain state-of-the-art effectiveness.
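To make the candidate-set idea concrete, here is a minimal sketch of generic multi-index hashing in Python. It does not implement MISH or its training objectives. The names (`MultiIndex`, `split_code`, `neighbours`), the integer encoding of hash codes, and the equal-width split into substrings are assumptions made for illustration. The sketch relies on the pigeonhole argument: if two codes differ in at most r bits and are split into p substrings, at least one pair of substrings differs in at most floor(r / p) bits.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of differing bits between two integer-encoded codes."""
    return bin(a ^ b).count("1")


def split_code(code, n_bits, n_parts):
    """Split an n_bits-wide code into n_parts equal-width substrings."""
    width = n_bits // n_parts
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(n_parts)]


def neighbours(sub, width, radius):
    """Yield every width-bit value within `radius` bit flips of `sub`."""
    for r in range(radius + 1):
        for positions in combinations(range(width), r):
            flipped = sub
            for p in positions:
                flipped ^= 1 << p
            yield flipped


class MultiIndex:
    """Hypothetical helper: one hash table per substring position."""

    def __init__(self, codes, n_bits, n_parts):
        assert n_bits % n_parts == 0, "sketch assumes equal-width substrings"
        self.codes = list(codes)
        self.n_bits = n_bits
        self.n_parts = n_parts
        self.width = n_bits // n_parts
        self.tables = [defaultdict(list) for _ in range(n_parts)]
        for idx, code in enumerate(self.codes):
            for t, sub in enumerate(split_code(code, n_bits, n_parts)):
                self.tables[t][sub].append(idx)

    def candidates(self, query, radius):
        """Indices of codes sharing a near-matching substring with the query."""
        sub_radius = radius // self.n_parts  # pigeonhole bound per substring
        found = set()
        for t, sub in enumerate(split_code(query, self.n_bits, self.n_parts)):
            for probe in neighbours(sub, self.width, sub_radius):
                found.update(self.tables[t].get(probe, ()))
        return found

    def search(self, query, radius):
        """Exact radius search: verify candidates with the full Hamming distance."""
        return sorted(
            i for i in self.candidates(query, radius)
            if hamming(self.codes[i], query) <= radius
        )


index = MultiIndex([0b00000000, 0b00000001, 0b11110000, 0b11111111],
                   n_bits=8, n_parts=2)
print(index.candidates(0b00000000, radius=1))
print(index.search(0b00000000, radius=1))
```

In this toy example only the candidates are checked against the query with the full Hamming distance, rather than every stored code. How small the candidate set is depends on how the codes are distributed across the substring tables, which is the property the abstract says MISH is trained to improve.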

research
05/08/2023

ElasticHash: Semantic Image Similarity Search by Deep Hashing with Elasticsearch

We present ElasticHash, a novel approach for high-quality, efficient, an...
research
07/01/2020

Unsupervised Semantic Hashing with Pairwise Reconstruction

Semantic Hashing is a popular family of methods for efficient similarity...
research
03/10/2015

Short Text Hashing Improved by Integrating Multi-Granularity Topics and Tags

Due to computational and storage efficiencies of compact binary codes, h...
research
06/03/2019

Unsupervised Neural Generative Semantic Hashing

Fast similarity search is a key component in large-scale information ret...
research
09/04/2021

Representation Learning for Efficient and Effective Similarity Search and Recommendation

How data is represented and operationalized is critical for building com...
research
01/31/2022

Learning to Hash Naturally Sorts

Locality sensitive hashing pictures a list-wise sorting problem. Its tes...
research
05/14/2018

NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing

Semantic hashing has become a powerful paradigm for fast similarity sear...