Optimal Hashing in External Memory

05/23/2018
by   Alex Conway, et al.

Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external memory model, because they do not incorporate locality for insertions. Iacono and Pătraşcu established an update/query tradeoff curve for external hash tables: a hash table that performs insertions in O(λ/B) amortized IOs requires Ω(log_λ N) expected IOs for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and λ is a tuning parameter. They provide a hashing data structure that meets this curve for λ that is Ω(log log M + log_M N). Their data structure, which we call an IP hash table, is complicated and, to the best of our knowledge, has not been implemented.

In this paper, we present a new and much simpler optimal external memory hash table, the Bundle of Arrays Hash Table (BOA). BOAs are based on size-tiered LSMs, a well-studied data structure, and are almost as easy to implement. The BOA is optimal for a narrower range of λ. However, the simplicity of BOAs allows them to be readily modified to achieve the following results:

* A new external memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table, while retaining some of the simplicity of the BOAs.
* The cache-oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of λ.
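For readers unfamiliar with size-tiered LSMs, the following is a minimal in-memory sketch of the general idea, not the BOA itself (the abstract does not give its construction). The class name `SizeTieredLSM` and the parameters `fanout` and `buffer_capacity` are hypothetical; `fanout` loosely plays the role of the tuning parameter λ, and Python dicts stand in for on-disk runs.

```python
from typing import Dict, List, Optional


class SizeTieredLSM:
    """Toy size-tiered LSM (illustrative assumption, not the paper's BOA)."""

    def __init__(self, fanout: int, buffer_capacity: int) -> None:
        if fanout < 2 or buffer_capacity < 1:
            raise ValueError("fanout must be >= 2 and buffer_capacity >= 1")
        self.fanout = fanout                    # stands in for lambda
        self.buffer_capacity = buffer_capacity  # stands in for memory size
        self.buffer: Dict[int, str] = {}
        # levels[i] is a list of runs, oldest first; each run is a dict.
        self.levels: List[List[Dict[int, str]]] = []

    def insert(self, key: int, value: str) -> None:
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            run, self.buffer = self.buffer, {}
            self._add_run(0, run)

    def _add_run(self, level: int, run: Dict[int, str]) -> None:
        while len(self.levels) <= level:
            self.levels.append([])
        self.levels[level].append(run)
        # Once a level holds `fanout` runs, merge them into one run
        # and push the result down to the next level.
        if len(self.levels[level]) == self.fanout:
            merged: Dict[int, str] = {}
            for old in self.levels[level]:  # oldest first: newer values win
                merged.update(old)
            self.levels[level] = []
            self._add_run(level + 1, merged)

    def get(self, key: int) -> Optional[str]:
        if key in self.buffer:
            return self.buffer[key]
        for runs in self.levels:        # shallower levels hold newer data
            for run in reversed(runs):  # newest run in the level first
                if key in run:
                    return run[key]
        return None
```

In this sketch each item is rewritten once per level, and the number of levels grows roughly like log_λ N, which is where the logarithmic term in the tradeoff comes from. Note that the toy probes every run on a lookup, up to `fanout - 1` per level, so by itself it does not achieve the query bound quoted above.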


