Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval

05/27/2021
by   Zijing Ou, et al.

Driven by the need for fast retrieval and a small memory footprint, document hashing plays a crucial role in large-scale information retrieval. Generating high-quality hash codes requires both semantic and neighborhood information. However, most existing methods leverage only one of the two, or simply combine them via intuitive criteria, lacking a theoretical principle to guide the integration. In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution and propose to integrate the two types of information with a graph-driven generative model. To deal with the complicated correlations among documents, we further propose a tree-structured approximation method for learning. Under this approximation, we prove that the training objective decomposes into terms involving only single documents or pairs of documents, so the model can be trained as efficiently as models that treat documents as uncorrelated. Extensive experiments on three benchmark datasets show that our method outperforms state-of-the-art methods, demonstrating the effectiveness of the proposed model at simultaneously preserving semantic and neighborhood information.
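To make the retrieval setting concrete, the following is a minimal sketch of how binary hash codes are typically used once they have been learned: documents are ranked by Hamming distance to the query code. It does not implement the graph-driven generative model described above. The function names (`hamming_distance`, `rank_by_hamming`) and the 8-bit codes are hypothetical and chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash codes differ."""
    # XOR leaves a 1 wherever the codes disagree; count those 1 bits.
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code: int, db_codes: list) -> list:
    """Return database indices ordered from nearest to farthest code."""
    # sorted() is stable, so ties keep their original database order.
    return sorted(
        range(len(db_codes)),
        key=lambda i: hamming_distance(query_code, db_codes[i]),
    )


# Hypothetical 8-bit codes for three documents (illustrative values only).
db_codes = [0b01101001, 0b01101011, 0b10010110]
query_code = 0b01101001

distances = [hamming_distance(query_code, c) for c in db_codes]
ranking = rank_by_hamming(query_code, db_codes)
print(distances)  # [0, 1, 8]
print(ranking)    # [0, 1, 2]
```

Because each comparison is an XOR followed by a bit count, and each document is stored as a handful of bits, this kind of lookup is what gives hashing its speed and memory advantages.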


research
08/29/2019

Document Hashing with Mixture-Prior Generative Models

Hashing is promising for large-scale information retrieval tasks thanks ...
research
06/16/2020

Generative Semantic Hashing Enhanced via Boltzmann Machines

Generative semantic hashing is a promising technique for large-scale inf...
research
06/03/2019

Unsupervised Neural Generative Semantic Hashing

Fast similarity search is a key component in large-scale information ret...
research
02/02/2019

Pairwise Teacher-Student Network for Semi-Supervised Hashing

Hashing method maps similar high-dimensional data to binary hashcodes wi...
research
01/11/2022

Structure with Semantics: Exploiting Document Relations for Retrieval

Retrieving relevant documents from a corpus is typically based on the se...
research
02/11/2023

Explaining text classifiers through progressive neighborhood approximation with realistic samples

The importance of neighborhood construction in local explanation methods...
research
09/07/2021

Refining BERT Embeddings for Document Hashing via Mutual Information Maximization

Existing unsupervised document hashing methods are mostly established on...
