BagMinHash - Minwise Hashing Algorithm for Weighted Sets

02/12/2018
by Otmar Ertl, et al.

Minwise hashing has become a standard tool for calculating signatures that allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the unweighted case, calculating signatures for weighted sets is still a time-consuming task. BagMinHash is a new algorithm that can be orders of magnitude faster than the current state of the art, without any particular restrictions or assumptions on weights or data dimensionality. Applied to the special case of unweighted sets, it represents the first efficient algorithm producing independent signature components. Finally, a series of tests verifies the new algorithm and also reveals limitations of other recently published approaches.
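
To make the terms in the abstract concrete, the following is a minimal Python sketch of the quantities involved. It is not BagMinHash itself. It shows the weighted Jaccard similarity computed exactly, and a plain unweighted minwise-hashing signature whose fraction of matching components estimates the unweighted Jaccard similarity. The function names, the signature size `m`, and the use of seeded BLAKE2b hashes as stand-ins for independent hash functions are all assumptions made for illustration.

```python
import hashlib


def weighted_jaccard(a, b):
    """Exact weighted Jaccard similarity of two {item: weight} dicts
    with non-negative weights: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    # Two empty weighted sets are treated as identical (a convention).
    return num / den if den > 0 else 1.0


def _hash(seed, item):
    # Seeded 64-bit hash; an illustrative stand-in for the i-th hash function.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, m=256):
    """Plain unweighted MinHash: component i is the minimum of the i-th
    hash over all items. Assumes `items` is non-empty. Costs O(m * n)."""
    return [min(_hash(i, x) for x in items) for i in range(m)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of equal components, an estimate of the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


exact = weighted_jaccard({"a": 2, "b": 1}, {"a": 1, "c": 1})  # 1 / 4 = 0.25
sig_a = minhash_signature(set(range(0, 100)))
sig_b = minhash_signature(set(range(50, 150)))
approx = estimate_jaccard(sig_a, sig_b)  # true Jaccard similarity is 1/3
```

This naive scheme evaluates one hash per item per signature component and ignores weights entirely. The abstract's point is that handling weights efficiently is the hard part.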

research
11/02/2019

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural ge...
research
05/23/2020

DartMinHash: Fast Sketching for Weighted Sets

Weighted minwise hashing is a standard dimensionality reduction techniqu...
research
08/20/2018

Faster Support Vector Machines

The time complexity of support vector machines (SVMs) prohibits training...
research
04/11/2023

A New Algorithm to determine Adomian Polynomials for nonlinear polynomial functions

We present a new algorithm by which the Adomian polynomials can be deter...
research
04/14/2020

Topology-Aware Hashing for Effective Control Flow Graph Similarity Analysis

Control Flow Graph (CFG) similarity analysis is an essential technique f...
research
05/17/2016

Efficient Algorithms for Mixed Creative Telescoping

Creative telescoping is a powerful computer algebra paradigm -initiated ...
research
02/06/2018

Resilient Blocks for Summarising Distributed Data

Summarising distributed data is a central routine for parallel programmi...
