Short Text Hashing Improved by Integrating Multi-Granularity Topics and Tags

03/10/2015
by Jiaming Xu, et al.

Because compact binary codes are efficient to compute and store, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts, which are sparse and contain few words. Recently, some researchers have tried to use latent topics of a single granularity to preserve semantic similarity in hash codes beyond keyword matching. However, topics of a single granularity are not adequate to represent the intrinsic semantic information. In this paper, we present a novel unified approach for short text Hashing using Multi-granularity Topics and Tags, dubbed HMTT. In particular, we propose a selection method that chooses the optimal multi-granularity topics depending on the type of dataset, and we design two distinct hashing strategies to incorporate multi-granularity topics. We also propose a simple and effective method that exploits tags to enhance the similarity of related texts. We carry out extensive experiments on one short text dataset and one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baselines on several evaluation metrics.
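
To make the general idea concrete, the following is a minimal sketch of topic-based hashing and Hamming-distance retrieval. It is not the HMTT method itself: the topic distributions are made-up illustrative values, the two granularities are simply concatenated, and the median-threshold binarization rule is an assumption chosen for brevity.

```python
from statistics import median

def binarize(features):
    """Turn real-valued feature rows into binary codes.

    Assumed rule (not from the paper): each dimension is thresholded
    at its median value across the collection.
    """
    thresholds = [median(column) for column in zip(*features)]
    return [[1 if value > t else 0 for value, t in zip(row, thresholds)]
            for row in features]

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def rank(query_index, codes):
    """Indices of all other texts, nearest first by Hamming distance."""
    others = [i for i in range(len(codes)) if i != query_index]
    return sorted(others, key=lambda i: hamming(codes[query_index], codes[i]))

# Hypothetical topic distributions for four short texts at two granularities
# (2 coarse topics and 4 fine topics); the values are illustrative only.
coarse = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
fine = [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1],
        [0.1, 0.1, 0.2, 0.6], [0.1, 0.1, 0.1, 0.7]]

# Concatenate both granularities into one feature vector per text.
features = [c + f for c, f in zip(coarse, fine)]
codes = binarize(features)

# Texts 0 and 1 share dominant topics, so text 1 ranks first for query 0.
nearest = rank(0, codes)
```

In this toy setup, texts that share dominant topics at both granularities end up with codes that differ in few bits, which is the property that makes ranking by Hamming distance useful for similarity search.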
