Parallel and External-Memory Construction of Minimal Perfect Hash Functions with PTHash

06/04/2021
by Giulio Ermanno Pibiri, et al.

A minimal perfect hash function f for a set S of n keys is a bijective function of the form f : S → {0,…,n-1}. These functions are important for many practical applications in computing, such as search engines, computer networks, and databases. Several algorithms have been proposed to build minimal perfect hash functions that scale well to large sets, retain fast evaluation time, and take very little space, e.g., 2-3 bits/key. PTHash is one such algorithm, achieving very fast evaluation in compressed space, typically several times faster than other techniques. In this work, we propose a new construction algorithm for PTHash that enables (1) multi-threading, to build functions either more quickly or more space-efficiently, and (2) external-memory processing, to scale to inputs much larger than the available internal memory. Only a few other algorithms in the literature share these features, despite their significant practical impact. We conduct an extensive experimental assessment on large real-world string collections and show that, with respect to other techniques, PTHash is competitive in construction time and space consumption while retaining 2-6× better lookup time.
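
To make the definition of a minimal perfect hash function concrete, the following is a minimal sketch in Python. It is not the PTHash construction described in the paper; it uses a naive brute-force seed search that is only practical for a handful of keys. The helper names (`hash_to_slot`, `find_seed`, `is_minimal_perfect`) and the sample key set are assumptions made for illustration.

```python
import hashlib


def hash_to_slot(key: str, seed: int, n: int) -> int:
    """Map a key to a slot in {0, ..., n-1} using a seeded hash."""
    digest = hashlib.blake2b(
        key.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") % n


def find_seed(keys, max_tries=1_000_000):
    """Try seeds in order until the seeded hash has no collisions on keys.

    Illustrative only: the expected number of tries grows roughly like
    n^n / n!, so this does not scale beyond very small sets.
    """
    n = len(keys)
    for seed in range(max_tries):
        if len({hash_to_slot(k, seed, n) for k in keys}) == n:
            return seed
    raise RuntimeError("no collision-free seed found within max_tries")


def is_minimal_perfect(f, keys) -> bool:
    """Check that f is a bijection from keys onto {0, ..., n-1}."""
    return sorted(f(k) for k in keys) == list(range(len(keys)))


# Hypothetical key set S with n = 6 distinct keys.
KEYS = ["apple", "banana", "cherry", "date", "elderberry", "fig"]
SEED = find_seed(KEYS)


def f(key: str) -> int:
    """The resulting function f : S -> {0, ..., n-1} for the keys above."""
    return hash_to_slot(key, SEED, len(KEYS))
```

Here `is_minimal_perfect(f, KEYS)` checks the bijection property from the definition: every key gets a distinct value and all values fall in {0,…,n-1}. As with any minimal perfect hash function, `f` gives no meaningful answer for keys outside S.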

