ShockHash: Towards Optimal-Space Minimal Perfect Hashing Beyond Brute-Force

08/18/2023
by Hans-Peter Lehmann, et al.

A minimal perfect hash function (MPHF) maps a set S of n keys to the first n integers without collisions. There is a lower bound of n log_2 e - O(log n) bits of space needed to represent an MPHF. A matching upper bound is obtained using the brute-force algorithm that tries random hash functions until stumbling on an MPHF and stores that function's seed. In expectation, e^n poly(n) seeds need to be tested. The most space-efficient previous algorithms for constructing MPHFs all use such a brute-force approach as a basic building block. In this paper, we introduce ShockHash - Small, heavily overloaded cuckoo hash tables. ShockHash uses two hash functions h_0 and h_1, hoping for the existence of a function f : S → {0,1} such that x ↦ h_{f(x)}(x) is an MPHF on S. In graph terminology, ShockHash generates n-edge random graphs until stumbling on a pseudoforest - a graph where each component contains as many edges as nodes. Using cuckoo hashing, ShockHash then derives an MPHF from the pseudoforest in linear time. It uses a 1-bit retrieval data structure to store f using n + o(n) bits. By carefully analyzing the probability that a random graph is a pseudoforest, we show that ShockHash needs to try only (e/2)^n poly(n) hash function seeds in expectation, reducing the space for storing the seed by roughly n bits. This makes ShockHash almost a factor 2^n faster than brute-force, while maintaining the asymptotically optimal space consumption. An implementation within the RecSplit framework yields the currently most space-efficient MPHFs, i.e., competing approaches need about two orders of magnitude more work to achieve the same space.
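The seed search described in the abstract can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation: the hash function `h` (built on BLAKE2b), the helper names `is_pseudoforest`, `orient`, `build`, and `lookup`, and the `max_seeds` parameter are all hypothetical, and a plain dictionary stands in for the 1-bit retrieval data structure, so the keys themselves are stored here, unlike in the n + o(n)-bit structure the abstract mentions.

```python
import hashlib

def h(key, seed, which, n):
    """Hypothetical seeded hash: candidate position `which` (0 or 1) of `key`."""
    data = f"{seed}|{which}|{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % n

def is_pseudoforest(edges, n):
    """True if every connected component contains at most one cycle."""
    parent = list(range(n))
    cyclic = [False] * n  # per root: does the component already contain a cycle?

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            if cyclic[ru]:
                return False  # second cycle in the same component
            cyclic[ru] = True
        else:
            if cyclic[ru] and cyclic[rv]:
                return False  # merging two cyclic components
            parent[ru] = rv
            cyclic[rv] = cyclic[ru] or cyclic[rv]
    return True

def orient(keys, seed):
    """Cuckoo insertion: pick one of the two positions per key, or return None."""
    n = len(keys)
    slot = [None] * n  # slot[p] = index of the key currently placed at position p
    side = [0] * n     # side[i] = which hash function key i currently uses
    for i in range(n):
        cur, s = i, 0
        for _ in range(4 * n + 4):  # generous bound on the eviction walk
            pos = h(keys[cur], seed, s, n)
            side[cur] = s
            if slot[pos] is None:
                slot[pos] = cur
                break
            # Evict the occupant and move it to its other candidate position.
            cur, slot[pos] = slot[pos], cur
            s = 1 - side[cur]
        else:
            return None
    # Stand-in for the 1-bit retrieval structure (assumption: a plain dict).
    return {k: side[i] for i, k in enumerate(keys)}

def build(keys, max_seeds=1_000_000):
    """Try seeds until the graph with edges {h_0(x), h_1(x)} is a pseudoforest."""
    n = len(keys)
    for seed in range(max_seeds):
        edges = [(h(k, seed, 0, n), h(k, seed, 1, n)) for k in keys]
        if not is_pseudoforest(edges, n):
            continue
        f = orient(keys, seed)
        if f is not None:
            return seed, f
    raise RuntimeError("no suitable seed found within max_seeds")

def lookup(key, seed, f, n):
    """Evaluate x -> h_{f(x)}(x)."""
    return h(key, seed, f[key], n)

keys = ["apple", "banana", "cherry", "date", "elder", "fig", "grape", "honey"]
seed, f = build(keys)
print(sorted(lookup(k, seed, f, len(keys)) for k in keys))
```

With n edges on n nodes, the "at most one cycle per component" test coincides with the abstract's condition that each component has as many edges as nodes. The sketch only conveys the idea on a tiny key set; it does not include the RecSplit integration or any of the space accounting.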

research
12/19/2022

High Performance Construction of RecSplit Based Minimal Perfect Hash Functions

A minimal perfect hash function (MPHF) is a bijection from a set of obje...
research
10/04/2022

SicHash – Small Irregular Cuckoo Tables for Perfect Hashing

A Perfect Hash Function (PHF) is a hash function that has no collisions ...
research
07/21/2022

Tight Bounds for Monotone Minimal Perfect Hashing

The monotone minimal perfect hash function (MMPHF) problem is the follow...
research
02/06/2023

Storing a Trie with Compact and Predictable Space

This paper proposed a storing approach for trie structures, called coord...
research
07/10/2019

Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications

In this paper we identify a new class of sparse near-quadratic random Bo...
research
05/13/2020

Practical Hash-based Anonymity for MAC Addresses

Given that a MAC address can uniquely identify a person or a vehicle, co...
research
06/13/2023

Invertible Bloom Lookup Tables with Less Memory and Randomness

In this work we study Invertible Bloom Lookup Tables (IBLTs) with small ...
