Learning to Hash for Indexing Big Data - A Survey

09/17/2015
by Jun Wang, et al.

The explosive growth of big data has recently drawn much attention to the design of efficient indexing and search methods. In many critical applications, such as large-scale search and pattern matching, finding the nearest neighbors of a query is a fundamental research problem. However, the straightforward solution of exhaustive comparison is infeasible because of its prohibitive computational complexity and memory requirements. In response, Approximate Nearest Neighbor (ANN) search based on hashing techniques has become popular for its promising performance in both efficiency and accuracy. Earlier randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), use data-independent hash functions built from random projections or permutations. Although these methods come with elegant theoretical guarantees on search quality in certain metric spaces, the performance of randomized hashing has been shown to be insufficient in many real-world applications. As a remedy, new approaches have emerged that incorporate data-driven learning into the development of advanced hash functions. Such learning-to-hash methods exploit information such as data distributions or class labels when optimizing the hash codes or functions. Importantly, the learned hash codes preserve, in the hash code space, the proximity of data points that are neighbors in the original feature space. The goal of this paper is to give readers a systematic understanding of the insights, strengths, and weaknesses of these emerging techniques. We provide a comprehensive survey of the learning-to-hash framework and of representative techniques of various types, including unsupervised, semi-supervised, and supervised. In addition, we summarize recent hashing approaches that use deep learning models. Finally, we discuss future directions and trends of research in this area.
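
To make the data-independent baseline mentioned in the abstract concrete, the following is a minimal sketch of sign-based random-projection hashing, one common LSH family. It is an illustration under stated assumptions, not the survey's own formulation: the function names `make_random_hyperplanes`, `hash_code`, and `hamming_distance` are hypothetical, and the Gaussian hyperplanes and the fixed seed are choices made only for this example.

```python
import random


def make_random_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions.

    Assumption for illustration: the hyperplanes are sampled without
    looking at any data, which is what makes the scheme data-independent.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, hyperplanes):
    """Map a vector to a binary code: one bit per hyperplane.

    Each bit records which side of the hyperplane the vector falls on.
    """
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
        for w in hyperplanes
    )


def hamming_distance(code_a, code_b):
    """Count the bit positions in which two codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


if __name__ == "__main__":
    planes = make_random_hyperplanes(num_bits=16, dim=3, seed=1)
    query = [0.5, -1.2, 2.0]
    nearby = [0.6, -1.1, 2.1]      # small perturbation of the query
    opposite = [-0.5, 1.2, -2.0]   # points in the opposite direction
    print(hamming_distance(hash_code(query, planes), hash_code(nearby, planes)))
    print(hamming_distance(hash_code(query, planes), hash_code(opposite, planes)))
```

Vectors separated by a small angle tend to agree on most bits, so Hamming distance between codes acts as a cheap proxy for angular similarity. A learning-to-hash method would, roughly speaking, replace the randomly drawn hyperplanes with projections fitted to the data distribution or to class labels.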


research
08/13/2014

Hashing for Similarity Search: A Survey

Similarity search (nearest neighbor search) is a problem of pursuing the...
research
02/17/2021

A Survey on Locality Sensitive Hashing Algorithms and their Applications

Finding nearest neighbors in high-dimensional spaces is a fundamental op...
research
06/18/2012

Compact Hyperplane Hashing with Bilinear Functions

Hyperplane hashing aims at rapidly searching nearest points to a hyperpl...
research
11/02/2019

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural ge...
research
08/11/2021

Learning to Hash Robustly, with Guarantees

The indexing algorithms for the high-dimensional nearest neighbor search...
research
12/01/2020

Scalable Data Discovery Using Profiles

We study the problem of discovering joinable datasets at scale. This is,...
research
06/05/2023

Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Locality-sensitive hashing (LSH) based frameworks have been used efficie...
