When Similarity Digest Meets Vector Management System: A Survey on Similarity Hash Function

09/18/2021
by Zhushou Tang, et al.

The rapid growth of vector management systems calls for a practical similarity hash function that can serve as a front end for similarity analysis. In this paper, we systematically survey the well-known existing similarity hash functions to identify the ones that meet this need. We conclude that the similarity hash functions MinHash and Nilsimsa can be plugged directly into a similarity analysis pipeline built on a vector management system. We then give a brief empirical discussion of the performance and drawbacks of these functions, and highlight that MinHash, the variant of SimHash, and feature hashing are the best suited to large-scale similarity analysis in a vector management system.
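
To make the idea of a similarity hash feeding a vector pipeline concrete, here is a minimal MinHash sketch in Python. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the character 3-gram shingling, the use of salted BLAKE2b as the hash family, the signature length of 64, and all function names are choices made for this example only. The fixed-length integer signature it produces is the kind of vector that could then be stored and compared in a vector management system.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of character k-grams of text (hypothetical helper)."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def minhash_signature(items, num_perm=64):
    """Build a MinHash signature: for each seeded hash function,
    keep the minimum hash value over all items in the set."""
    signature = []
    for seed in range(num_perm):
        # Assumption: a salted BLAKE2b stands in for a random permutation.
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, which estimates
    the Jaccard similarity of the two underlying sets."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
doc_c = "vector databases index embeddings"

sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
sig_c = minhash_signature(shingles(doc_c))

print("a vs b:", estimated_jaccard(sig_a, sig_b))
print("a vs c:", estimated_jaccard(sig_a, sig_c))
```

The estimate is the fraction of positions where two signatures agree, so a longer signature lowers the variance of the estimate at the cost of more storage per item.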

research
08/13/2014

Hashing for Similarity Search: A Survey

Similarity search (nearest neighbor search) is a problem of pursuing the...
research
11/02/2011

Kernel diff-hash

This paper presents a kernel formulation of the recently introduced diff...
research
04/06/2017

Online Hashing

Although hash function learning algorithms have achieved great success i...
research
04/26/2018

Dialogue Modeling Via Hash Functions: Applications to Psychotherapy

We propose a novel machine-learning framework for dialogue modeling whic...
research
05/04/2023

A Sparse Johnson-Lindenstrauss Transform using Fast Hashing

The Sparse Johnson-Lindenstrauss Transform of Kane and Nelson (SODA 2012...
research
12/17/2018

Fuzzy Hashing as Perturbation-Consistent Adversarial Kernel Embedding

Measuring the similarity of two files is an important task in malware an...
research
06/01/2016

A Survey on Learning to Hash

Nearest neighbor search is a problem of finding the data points from the...
