Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering

07/02/2021
by Michael A. Bender, et al.

First introduced in 1954, linear probing is one of the oldest data structures in computer science, and due to its unrivaled data locality, it continues to be one of the fastest hash tables in practice. It is widely believed and taught, however, that linear probing should never be used at high load factors; this is because primary-clustering effects cause insertions at load factor 1 - 1/x to take expected time Θ(x^2) (rather than the ideal Θ(x)). The dangers of primary clustering, first discovered by Knuth in 1963, have been taught to generations of computer scientists, and have influenced the design of many widely used hash tables. We show that primary clustering is not a foregone conclusion. We demonstrate that small design decisions in how deletions are implemented have dramatic effects on the asymptotic performance of insertions, so that, even if a hash table operates continuously at a load factor 1 - Θ(1/x), the expected amortized cost per operation is Õ(x). This is because tombstones created by deletions actually cause an anti-clustering effect that combats primary clustering. We also present a new variant of linear probing (which we call graveyard hashing) that completely eliminates primary clustering on any sequence of operations: if, when an operation is performed, the current load factor is 1 - 1/x for some x, then the expected cost of the operation is O(x). One corollary is that, in the external-memory model with data blocks of size B, graveyard hashing offers the following remarkable guarantee: at any load factor 1 - 1/x satisfying x = o(B), graveyard hashing achieves 1 + o(1) expected block transfers per operation. Past external-memory hash tables have only been able to offer a 1 + o(1) guarantee when the block size B is at least Ω(x^2).


research
02/11/2022

Insertion Time of Random Walk Cuckoo Hashing below the Peeling Threshold

When it comes to hash tables, the only truly respectable insertion time ...
research
06/01/2020

DHash: Enabling Dynamic and Efficient Hash Tables

Given a specified average load factor, hash tables offer the appeal of c...
research
03/16/2020

Dash: Scalable Hashing on Persistent Memory

Byte-addressable persistent memory (PM) brings hash tables the potential...
research
05/10/2022

PaCHash: Packed and Compressed Hash Tables

We introduce PaCHash, a hash table that stores its objects contiguously ...
research
09/16/2020

WarpCore: A Library for fast Hash Tables on GPUs

Hash tables are ubiquitous. Properties such as an amortized constant tim...
research
09/11/2023

Two-way Linear Probing Revisited

We introduce linear probing hashing schemes that construct a hash table ...
research
06/05/2023

Large-Scale Distributed Learning via Private On-Device Locality-Sensitive Hashing

Locality-sensitive hashing (LSH) based frameworks have been used efficie...
