Locality-Preserving Minimal Perfect Hashing of k-mers

by Giulio Ermanno Pibiri, et al.

Minimal perfect hashing is the problem of mapping a static set of n distinct keys into the address space {1,…,n} bijectively. It is well known that n log_2 e bits are necessary to specify a minimal perfect hash function (MPHF) f when no additional knowledge of the input keys is used. In practice, however, the input keys often have intrinsic relationships that can be exploited to lower the bit complexity of f. For example, consider a string and the set of all its distinct substrings of length k, the so-called k-mers of the string. Two consecutive k-mers in the string have a strong intrinsic relationship in that they share an overlap of k-1 symbols. Hence, it seems intuitively possible to beat the classic log_2 e bits/key barrier in this case. Moreover, we would like f to map consecutive k-mers to consecutive addresses, so as to preserve as much as possible the relationships between the keys in the co-domain {1,…,n} as well. This is a useful feature in practice because it guarantees a certain degree of locality of reference for f, resulting in better evaluation time when querying consecutive k-mers from a string. Motivated by these premises, we initiate the study of a new type of locality-preserving minimal perfect hash functions designed for k-mers extracted consecutively from a string (or collection of strings). We show a theoretical lower bound on the bit complexity of any (1-ε)-locality-preserving MPHF, for a parameter 0 < ε < 1. This bound is lower than n log_2 e bits for sufficiently small ε. We propose a construction that approaches the theoretical minimum space for growing k and present a practical implementation of the method.
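
To make the k-mer overlap and the notion of locality concrete, here is a minimal Python sketch. It is not the construction proposed in the paper: it stores an explicit dictionary, so it says nothing about space, and the helper names (`kmers`, `first_occurrence_mphf`, `consecutive_fraction`) and the example string are hypothetical choices made for illustration. The fraction it reports is an informal stand-in for locality preservation, not the paper's formal (1-ε) definition.

```python
import math

def kmers(s, k):
    """Return the k-mers of s in order of appearance (repeats included)."""
    return [s[i:i + k] for i in range(len(s) - k + 1)]

def first_occurrence_mphf(s, k):
    """Toy bijection from the distinct k-mers of s onto {1, ..., n}.

    Addresses are assigned in order of first occurrence. This is an
    explicit lookup table, used only to illustrate the mapping; it is
    NOT a space-efficient MPHF and NOT the paper's method.
    """
    table = {}
    for x in kmers(s, k):
        if x not in table:
            table[x] = len(table) + 1
    return table

def consecutive_fraction(s, k, f):
    """Fraction of consecutive k-mer pairs (x, y) in s with f[y] == f[x] + 1."""
    seq = kmers(s, k)
    pairs = list(zip(seq, seq[1:]))
    hits = sum(1 for a, b in pairs if f[b] == f[a] + 1)
    return hits / len(pairs)

text = "ACGTACGTTA"  # hypothetical example string
k = 3
seq = kmers(text, k)
f = first_occurrence_mphf(text, k)
n = len(f)

# Consecutive k-mers share an overlap of k-1 symbols.
overlaps_ok = all(a[1:] == b[:-1] for a, b in zip(seq, seq[1:]))

# Classic lower bound without knowledge of the keys: log2(e) bits per key.
bits_per_key = math.log2(math.e)

print(n, overlaps_ok, consecutive_fraction(text, k, f), round(bits_per_key, 4))
```

In this toy input the repeated k-mers ACG and CGT break the run of consecutive addresses twice, so five of the seven consecutive pairs are mapped to consecutive addresses.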



