RecSplit: Minimal Perfect Hashing via Recursive Splitting

10/14/2019
by Emmanuel Esposito, et al.

A minimal perfect hash function bijectively maps a key set S out of a universe U into the first |S| natural numbers. Minimal perfect hash functions are used, for example, to map irregularly shaped keys, such as strings, into a compact space so that metadata can then be stored simply in an array. While it is known that just 1.44 bits per key are necessary to store a minimal perfect hash function, no published technique can go below 2 bits per key in practice. We propose a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time that makes it possible to build for the first time, for example, structures which need 1.56 bits per key, that is, within 8.3% of the lower bound, in less than 2 ms per key. We show that instances of our construction are able to simultaneously beat the construction time, space usage and lookup time of the state-of-the-art data structure reaching 2 bits per key. Moreover, we provide parameter choices giving structures which are competitive with alternative, larger-size data structures in terms of space and lookup time. The construction of our data structures can be easily parallelized or mapped on distributed computational units (e.g., within the MapReduce framework), and structures larger than the available RAM can be built directly in mass storage.
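To make the definition concrete, here is a minimal sketch of a minimal perfect hash function for a tiny key set, built by brute-force search for a seed under which a seeded hash is a bijection onto {0, ..., |S|-1}. This is not the RecSplit construction (recursive splitting is described in the full paper); it only illustrates the bijection property and the "metadata in an array" use case. The helper names `seeded_hash` and `find_bijection_seed` and the choice of SHA-256 as the seeded hash are assumptions made for illustration.

```python
import hashlib


def seeded_hash(key: str, seed: int, n: int) -> int:
    """Hypothetical seeded hash: map key into range(n); seed is mixed into the input."""
    data = seed.to_bytes(8, "little") + key.encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "little") % n


def find_bijection_seed(keys: list[str], max_tries: int = 1_000_000) -> int:
    """Try seeds 0, 1, 2, ... until seeded_hash maps keys bijectively onto range(len(keys))."""
    n = len(keys)
    for seed in range(max_tries):
        # n distinct images inside range(n) means the mapping is a bijection.
        if len({seeded_hash(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no bijective seed found within max_tries")


if __name__ == "__main__":
    keys = ["apple", "banana", "cherry", "date", "elderberry", "fig", "grape", "honeydew"]
    seed = find_bijection_seed(keys)
    # Only the seed needs to be stored; lookups recompute seeded_hash(key, seed, n).
    metadata = [None] * len(keys)
    for k in keys:
        metadata[seeded_hash(k, seed, len(keys))] = len(k)  # e.g. store each key's length
    print(seed, metadata)
```

A random function from n keys to n slots is a bijection with probability n!/n^n, so the expected number of seeds tried grows like e^n/sqrt(2πn). Storing the successful seed therefore costs roughly n·log2(e) ≈ 1.44n bits, in line with the lower bound quoted above, but the search time is exponential in n, which is why a naive search like this is only practical for very small sets.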

research
06/04/2021

Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

A minimal perfect hash function f for a set S of n keys is a bijective f...
research
10/24/2022

Locality-Preserving Minimal Perfect Hashing of k-mers

Minimal perfect hashing is the problem of mapping a static set of n dist...
research
11/22/2019

Constructing Minimal Perfect Hash Functions Using SAT Technology

Minimal perfect hash functions (MPHFs) are used to provide efficient acc...
research
01/28/2020

Peeling Close to the Orientability Threshold: Spatial Coupling in Hashing-Based Data Structures

Hypergraphs with random hyperedges underlie various data structures wher...
research
05/13/2020

Practical Hash-based Anonymity for MAC Addresses

Given that a MAC address can uniquely identify a person or a vehicle, co...
research
04/21/2021

PTHash: Revisiting FCH Minimal Perfect Hashing

Given a set S of n distinct keys, a function f that bijectively maps the...
research
04/21/2023

Learned Monotone Minimal Perfect Hashing

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S ...