Understanding the Moments of Tabulation Hashing via Chaoses

by Jakob Bæk Tejs Houen, et al.

Simple tabulation hashing dates back to Zobrist in 1970 and is defined as follows: each key is viewed as c characters from some alphabet Σ, we have c fully random hash functions h_0, …, h_{c-1} : Σ → {0, …, 2^l - 1}, and a key x = (x_0, …, x_{c-1}) is hashed to h(x) = h_0(x_0) ⊕ … ⊕ h_{c-1}(x_{c-1}), where ⊕ is the bitwise XOR operation. The previous results on tabulation hashing by Pǎtraşcu and Thorup [J.ACM'11] and by Aamand et al. [STOC'20] focused on proving Chernoff-style tail bounds on hash-based sums, e.g., the number of keys hashing to a given value, for simple tabulation hashing, but their bounds do not cover the entire tail. Chaoses are random variables of the form ∑ a_{i_0, …, i_{c-1}} X_{i_0} · … · X_{i_{c-1}}, where the X_i are independent random variables. Chaoses are a well-studied concept from probability theory, and tight analyses have been proven in several instances, e.g., when the independent random variables are standard Gaussian variables and when they have logarithmically convex tails. We notice that hash-based sums of simple tabulation hashing can be seen as a sum of chaoses that are not independent. This motivates us to use techniques from the theory of chaoses to analyze hash-based sums of simple tabulation hashing. In this paper, we obtain bounds for all the moments of hash-based sums for simple tabulation hashing which are tight up to constants depending only on c. In contrast with previous attempts, our approach is mostly analytical and does not employ intricate combinatorial arguments. The improved analysis of simple tabulation hashing allows us to obtain bounds for the moments of hash-based sums for the mixed tabulation hashing introduced by Dahlgaard et al. [FOCS'15].
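
The definition of simple tabulation hashing above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name make_simple_tabulation, the parameters char_bits and out_bits, and the choice to read the c characters as consecutive bit blocks of an integer key are all assumptions made for illustration. The tables are also filled by a seeded pseudo-random generator, which only approximates the fully random functions h_0, …, h_{c-1} assumed in the analysis.

```python
import random


def make_simple_tabulation(c, char_bits, out_bits, seed=None):
    """Build a simple tabulation hash over keys of c characters.

    Assumptions (illustrative, not from the paper): the alphabet is
    {0, ..., 2^char_bits - 1}, a key is an integer whose i-th block of
    char_bits bits is the character x_i, and hash values have out_bits bits.
    """
    rng = random.Random(seed)
    sigma = 1 << char_bits  # alphabet size |Σ|
    # One lookup table per character position, standing in for h_i.
    tables = [[rng.getrandbits(out_bits) for _ in range(sigma)]
              for _ in range(c)]
    mask = sigma - 1

    def h(x):
        value = 0
        for i in range(c):
            x_i = (x >> (i * char_bits)) & mask  # extract character x_i
            value ^= tables[i][x_i]              # XOR in h_i(x_i)
        return value

    return h, tables


# Hypothetical usage: 32-bit keys viewed as c = 4 characters of 8 bits each.
h, tables = make_simple_tabulation(c=4, char_bits=8, out_bits=32, seed=1)
key = 0xDEADBEEF
digest = h(key)
```

Each evaluation costs c table lookups and c XORs. Two keys that differ in a single character position i have hash values whose XOR depends only on table i, which reflects the dependence between keys that the analysis has to handle.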




On Asymptotically Tight Tail Bounds for Sums of Geometric and Exponential Random Variables

In this note we prove bounds on the upper and lower probability tails of...

Practical Hash Functions for Similarity Estimation and Dimensionality Reduction

Hashing is a basic tool for dimensionality reduction employed in several...

Power of d Choices with Simple Tabulation

Suppose that we are to place m balls into n bins sequentially using the ...

No Repetition: Fast Streaming with Highly Concentrated Hashing

To get estimators that work within a certain error bound with high proba...

Fully Understanding the Hashing Trick

Feature hashing, also known as the hashing trick, introduced by Weinber...

Fast hashing with Strong Concentration Bounds

Previous work on tabulation hashing of Pǎtraşcu and Thorup from STOC'11 ...

Understanding Sparse JL for Feature Hashing

Feature hashing and more general projection schemes are commonly used in...
