Maximally Consistent Sampling and the Jaccard Index of Probability Distributions

09/11/2018
by Ryan Moulton, et al.

We introduce simple, efficient algorithms for computing a MinHash of a probability distribution. The algorithms are suitable for both sparse and dense data, with running times equivalent to the state of the art in both cases. Their collision probability is a new measure of the similarity of positive vectors, which we investigate in detail. We describe the sense in which this collision probability is optimal for any Locality Sensitive Hash based on sampling. We argue that this similarity measure is more useful for probability distributions than the similarity pursued by other weighted MinHash algorithms, and that it is the natural generalization of the Jaccard index.
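As an illustration of the idea, the following is a minimal sketch of one way to draw a consistent sample from a sparse positive vector. It assumes an "exponential race" construction: each key gets a shared pseudo-random uniform value, which is turned into an exponential variate whose rate is the key's weight, and the key with the smallest variate is selected. The function names (`consistent_sample`, `probability_jaccard`, `_unit_uniform`) and the use of BLAKE2b as the shared randomness are assumptions made for this example, not details taken from the abstract. For this construction, two vectors x and y select the same key with probability equal to the sum, over keys i with x_i > 0 and y_i > 0, of 1 / Σ_j max(x_j / x_i, y_j / y_i), which `probability_jaccard` computes directly.

```python
import hashlib
import math


def _unit_uniform(seed: int, key: str) -> float:
    """Deterministic pseudo-uniform value in (0, 1), shared by every input
    vector for a given (seed, key) pair. Hypothetical helper."""
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    n = int.from_bytes(digest, "big")
    # Keep the top 52 bits and offset by 0.5 so the result is strictly inside (0, 1).
    return ((n >> 12) + 0.5) / 2**52


def consistent_sample(weights: dict, seed: int):
    """Return the key whose exponential variate (rate = weight) is smallest."""
    best_key, best_value = None, math.inf
    for key, w in weights.items():
        if w <= 0:
            continue  # zero-weight keys can never be sampled
        value = -math.log(_unit_uniform(seed, key)) / w
        if value < best_value:
            best_key, best_value = key, value
    return best_key


def probability_jaccard(x: dict, y: dict) -> float:
    """Closed-form collision probability of consistent_sample for x and y."""
    keys = set(x) | set(y)
    total = 0.0
    for i in keys:
        xi, yi = x.get(i, 0.0), y.get(i, 0.0)
        if xi <= 0 or yi <= 0:
            continue  # only keys present in both vectors can collide
        denom = sum(max(x.get(j, 0.0) / xi, y.get(j, 0.0) / yi) for j in keys)
        total += 1.0 / denom
    return total


def estimate_similarity(x: dict, y: dict, num_hashes: int = 4000) -> float:
    """Fraction of seeds on which both vectors select the same key."""
    hits = sum(consistent_sample(x, s) == consistent_sample(y, s)
               for s in range(num_hashes))
    return hits / num_hashes
```

Because each selection depends only on ratios of weights, rescaling either vector leaves both the samples and the similarity unchanged, so the inputs need not be normalized. Calling `estimate_similarity(x, y)` with more hashes tightens the estimate around `probability_jaccard(x, y)`.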

research
11/02/2019

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural ge...
research
01/28/2022

Two more ways of spelling Gini Coefficient with Applications

In this paper, we draw attention to a promising yet slightly underestima...
research
11/12/2020

A partition-based similarity for classification distributions

Herein we define a measure of similarity between classification distribu...
research
08/14/2021

Probability Distributions for Elliptic Curves in the CGL Hash Function

Hash functions map data of arbitrary length to data of predetermined len...
research
01/02/2019

Massively Parallel Construction of Radix Tree Forests for the Efficient Sampling of Discrete Probability Distributions

We compare different methods for sampling from discrete probability dist...
research
01/15/2019

The RGB No-Signalling Game

Introducing the simplest of all No-Signalling Games: the RGB Game where ...
research
07/21/2019

Quantifying Similarity between Relations with Fact Distribution

We introduce a conceptually simple and effective method to quantify the ...