Learned Monotone Minimal Perfect Hashing

04/21/2023
by Paolo Ferragina, et al.

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications include databases, search engines, data encryption, and pattern-matching algorithms. In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we resolve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 35 [...] achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than that of the best competitors, while queries are up to 19 times faster than those of the next larger competitor. As for construction, LeMonHash improves by a factor of up to 2 over the competitor with the next best space usage. We also investigate the case of keys being variable-length strings, introducing the so-called LeMonHash-VL: it needs space within 10 [...] competitors while achieving up to 3 times faster queries.
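
The two-step idea described in the abstract (a monotone rank estimate, then a small per-key integer to break ties) can be illustrated with a toy sketch. This is not the authors' implementation, and everything in it is an assumption made for illustration: a single linear model stands in for the PGM-index, a plain Python dict stands in for the BuRR retrieval structure, and an uncompressed list of prefix sums records where each bucket starts. The names ToyLearnedMMPHF, _bucket, and rank are hypothetical.

```python
from itertools import accumulate


class ToyLearnedMMPHF:
    """Toy illustration only: one linear model plus a dict of local ranks."""

    def __init__(self, keys):
        keys = sorted(set(keys))
        if not keys:
            raise ValueError("key set must not be empty")
        self.n = len(keys)
        self.lo = keys[0]
        # Avoid division by zero when the set holds a single key.
        self.span = max(keys[-1] - keys[0], 1)

        sizes = [0] * self.n   # number of keys that land in each bucket
        self.local = {}        # stand-in for a retrieval structure (assumption)
        for k in keys:         # keys are visited in increasing order
            b = self._bucket(k)
            self.local[k] = sizes[b]   # small integer: rank inside the bucket
            sizes[b] += 1

        # Exclusive prefix sums: offsets[b] = number of keys in buckets < b.
        self.offsets = [0] + list(accumulate(sizes))[:-1]

    def _bucket(self, key):
        # Monotone rank estimate: linear interpolation between min and max key,
        # computed in integer arithmetic so it is exactly non-decreasing.
        return (key - self.lo) * (self.n - 1) // self.span

    def rank(self, key):
        # Only meaningful for keys in the set; other keys raise an error here,
        # whereas a real MMPHF would return an arbitrary value.
        return self.offsets[self._bucket(key)] + self.local[key]


keys = [3, 7, 8, 20, 21, 22, 100]
h = ToyLearnedMMPHF(keys)
print([h.rank(k) for k in keys])
```

Because the bucket function is monotone, the keys that share a bucket are contiguous in sorted order, so the bucket's starting offset plus the stored local rank gives the global rank. The better the model fits the data, the fewer keys collide and the smaller the stored integers become.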

research
04/21/2021

PTHash: Revisiting FCH Minimal Perfect Hashing

Given a set S of n distinct keys, a function f that bijectively maps the...
research
06/04/2021

Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

A minimal perfect hash function f for a set S of n keys is a bijective f...
research
07/21/2022

Tight Bounds for Monotone Minimal Perfect Hashing

The monotone minimal perfect hash function (MMPHF) problem is the follow...
research
10/14/2019

RecSplit: Minimal Perfect Hashing via Recursive Splitting

A minimal perfect hash function bijectively maps a key set S out of a un...
research
10/24/2022

Locality-Preserving Minimal Perfect Hashing of k-mers

Minimal perfect hashing is the problem of mapping a static set of n dist...
research
11/07/2022

Simple Set Sketching

Imagine handling collisions in a hash table by storing, in each cell, th...
research
03/03/2023

Mapping Wordnets on the Fly with Permanent Sense Keys

Most of the major databases on the semantic web have links to Princeton ...
