Locality Sensitive Hashing for Structured Data: A Survey

04/24/2022
by Wei Wu, et al.

Data similarity (or distance) computation is a fundamental research topic that underpins a variety of similarity-based machine learning and data mining applications. In big data analytics, computing the exact similarity between data instances is impractical because of the high computational cost. To this end, the Locality Sensitive Hashing (LSH) technique has been proposed to provide accurate estimators for various similarity measures between sets or vectors efficiently and without a learning process. Structured data (e.g., sequences, trees and graphs), which are composed of elements and relations between those elements, are common in the real world, but traditional LSH algorithms cannot preserve the structure information represented by the relations between elements. To address this issue, researchers have developed a family of hierarchical LSH algorithms. In this paper, we survey the current progress of research into hierarchical LSH from the following perspectives: 1) Data structures, where we review various hierarchical LSH algorithms for three typical data structures and uncover their inherent connections; 2) Applications, where we review the hierarchical LSH algorithms in multiple application scenarios; 3) Challenges, where we discuss some potential challenges as future directions.
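
To make concrete what an LSH-based similarity estimator for sets looks like, the following is a minimal sketch of classic MinHash, which estimates the Jaccard similarity of two sets from short signatures. It is not one of the hierarchical algorithms covered by the survey. The function names, the linear hash family `(a * c + b) % PRIME`, the use of `zlib.crc32` to map elements to integers, and the signature length `k` are all illustrative assumptions, and the sets are assumed to be non-empty.

```python
import random
import zlib

# Assumption: a large Mersenne prime as modulus for the illustrative hash family.
PRIME = (1 << 61) - 1


def make_hash_params(k, seed=0):
    """Draw k (a, b) pairs defining hash functions h(c) = (a * c + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def minhash_signature(items, params):
    """Return the MinHash signature of a non-empty collection of items."""
    # Map each element to a deterministic integer code (assumption: crc32 is
    # good enough for a demonstration; it is not a cryptographic hash).
    codes = [zlib.crc32(str(x).encode("utf-8")) for x in set(items)]
    # For each hash function, keep the smallest hash value over the set.
    return [min((a * c + b) % PRIME for c in codes) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    params = make_hash_params(k=512)
    set_a = range(0, 100)
    set_b = range(50, 150)
    sig_a = minhash_signature(set_a, params)
    sig_b = minhash_signature(set_b, params)
    print("exact:    ", exact_jaccard(set_a, set_b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
```

Two sets must be hashed with the same `params` for their signatures to be comparable. A longer signature lowers the variance of the estimate at the cost of more hashing work per set.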

