Fast hashing with Strong Concentration Bounds

05/01/2019
by Anders Aamand, et al.

Previous work on tabulation hashing by Pătraşcu and Thorup, from STOC'11 on simple tabulation and from SODA'13 on twisted tabulation, offered Chernoff-style concentration bounds on hash-based sums, but under quite severe restrictions on the expected values of these sums. More precisely, the basic idea in tabulation hashing is to view a key as consisting of c=O(1) characters, e.g., a 64-bit key as c=8 characters of 8 bits each. The character domain Σ should be small enough that character tables of size |Σ| fit in fast cache. The schemes then use O(1) tables of this size, so the space used by tabulation hashing is O(|Σ|). However, the above concentration bounds only apply if the expected sums are ≪ |Σ|. To see the problem, consider the very simple case where we use tabulation hashing to throw n balls into m bins and apply Chernoff bounds to the number of balls that land in a given bin. We are fine if n=m, for then the expected value is 1. However, if there are only m=2 bins, as when tossing n unbiased coins, then the expected value n/2 is ≫ |Σ| for large data sets, e.g., data sets that do not fit in fast cache. To handle expectations that go beyond the limits of our small space, we need a much more advanced analysis of simple tabulation, plus a new tabulation technique that we call tabulation-permutation hashing, which is at most twice as slow as simple tabulation. No other hashing scheme of comparable speed offers similar Chernoff-style concentration bounds.
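
To make the setup in the abstract concrete, here is a minimal Python sketch of simple tabulation hashing and of the balls-into-bins experiment. It assumes the standard construction in which each of the c characters indexes its own table of random values and the looked-up values are XORed together. The names `make_tables`, `simple_tabulation`, and `bin_loads`, the 32-bit output width, the seeded `random.Random` table fill, and the reduction to a bin by `% m` are illustrative assumptions, not taken from the paper. The sketch covers simple tabulation only, not tabulation-permutation hashing.

```python
import random

C = 8                   # characters per key: a 64-bit key as 8 characters
CHAR_BITS = 8           # bits per character
SIGMA = 1 << CHAR_BITS  # size of the character domain, |Σ| = 256


def make_tables(seed, out_bits=32):
    """Build C tables of SIGMA random out_bits-bit values (assumed layout)."""
    rng = random.Random(seed)
    return [[rng.getrandbits(out_bits) for _ in range(SIGMA)] for _ in range(C)]


def simple_tabulation(key, tables):
    """XOR together one table lookup per character of the key."""
    h = 0
    for i in range(C):
        ch = (key >> (CHAR_BITS * i)) & (SIGMA - 1)  # i-th character of the key
        h ^= tables[i][ch]
    return h


def bin_loads(keys, m, tables):
    """Throw each key (ball) into one of m bins and count the balls per bin."""
    loads = [0] * m
    for k in keys:
        loads[simple_tabulation(k, tables) % m] += 1
    return loads
```

With `tables = make_tables(seed=1)`, calling `bin_loads(range(n), 2, tables)` reproduces the coin-tossing case: each bin has expected load n/2, which exceeds |Σ| = 256 as soon as n is more than 512. That is the regime in which the earlier bounds no longer apply.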

research
10/31/2018

Non-Empty Bins with Simple Tabulation Hashing

We consider the hashing of a set X⊆ U with |X|=m using a simple tabulati...
research
05/03/2022

Understanding the Moments of Tabulation Hashing via Chaoses

Simple tabulation hashing dates back to Zobrist in 1970 and is defined a...
research
04/02/2020

No Repetition: Fast Streaming with Highly Concentrated Hashing

To get estimators that work within a certain error bound with high proba...
research
11/23/2017

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Hashing is a basic tool for dimensionality reduction employed in several...
research
02/11/2022

Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold

When it comes to hash tables, the only truly respectable insertion time ...
research
11/08/2019

Lock-Free Hopscotch Hashing

In this paper we present a lock-free version of Hopscotch Hashing. Hopsc...
research
07/11/2018

Data-Parallel Hashing Techniques for GPU Architectures

Hash tables are one of the most fundamental data structures for effectiv...