VIP Hashing – Adapting to Skew in Popularity of Data on the Fly (extended version)

06/24/2022
by Aarati Kakaraparthy, et al.

Not all data is equally popular. Often, some portion of the data is accessed more frequently than the rest, which causes a skew in the popularity of data items. Adapting to this skew can improve performance, and this topic has been studied extensively for disk-based settings. In this work, we consider an in-memory data structure, namely the hash table, and show how one can leverage skew in popularity for higher performance. Hashing is a low-latency operation that is sensitive to caching, branch prediction, and code complexity, among other factors. These factors make learning in-the-loop especially challenging, because the overhead of performing any additional operations can be significant. In this paper, we propose VIP hashing, a fully online hash table method that uses lightweight mechanisms to learn the skew in popularity and adapt the hash table layout. These mechanisms are non-blocking, and their overhead is controlled by sensing changes in the popularity distribution and dynamically switching the learning mechanism on and off as needed. We tested VIP hashing against a variety of workloads generated by Wiscer, a homegrown hashing measurement tool, and find that it improves performance in the presence of skew (22% for a hash table with one million keys under low skew, 77% under medium skew) while remaining robust to insert and delete operations and to changes in the popularity distribution of keys. We also find that VIP hashing reduces the end-to-end execution time of TPC-H query 9, the most expensive TPC-H query, by 20%.
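The abstract describes two ideas without giving their details: learning popularity in the lookup path to adapt the table layout, and sensing when the distribution changes so learning can be switched on and off. The following is a minimal, single-threaded sketch of those two ideas, assuming a separate-chaining hash table where "adapting the layout" means moving frequently accessed keys toward the head of their bucket's chain. It is not VIP hashing's actual mechanism and does not model its non-blocking design. As a stand-in, it uses a classic count-based self-organizing-list heuristic. The class `SkewAdaptiveChainedTable`, the per-entry `hits` counter, and the `window`, `min_swaps`, and `depth_slack` parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Entry:
    key: Any
    value: Any
    hits: int = 0  # hypothetical per-entry access counter (popularity estimate)


class SkewAdaptiveChainedTable:
    """Hypothetical sketch: chained hash table that, while learning is on,
    counts hits and swaps a hot entry one step toward its chain's head."""

    def __init__(self, num_buckets=1024, window=10_000, min_swaps=10, depth_slack=1.0):
        self.buckets: List[List[Entry]] = [[] for _ in range(num_buckets)]
        self.learning = True
        self.window = window            # lookups per sensing window (assumed)
        self.min_swaps = min_swaps      # fewer swaps than this => layout stable, stop learning (assumed)
        self.depth_slack = depth_slack  # avg probe-depth increase that re-enables learning (assumed)
        self._lookups = 0
        self._swaps = 0
        self._depth_sum = 0
        self._baseline_depth = 0.0

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._chain(key)
        for e in chain:
            if e.key == key:
                e.value = value
                return
        chain.append(Entry(key, value))  # new keys start at the tail (assumed cold)

    def delete(self, key):
        chain = self._chain(key)
        for i, e in enumerate(chain):
            if e.key == key:
                del chain[i]
                return True
        return False

    def get(self, key, default=None):
        chain = self._chain(key)
        for depth, e in enumerate(chain):
            if e.key == key:
                if self.learning:
                    e.hits += 1
                    # Swap with the predecessor if this entry is now more popular.
                    if depth > 0 and chain[depth - 1].hits < e.hits:
                        chain[depth - 1], chain[depth] = e, chain[depth - 1]
                        self._swaps += 1
                self._sense(depth)
                return e.value
        return default

    def _sense(self, depth):
        # Cheap bookkeeping on every successful lookup, whether learning or not.
        self._lookups += 1
        self._depth_sum += depth
        if self._lookups < self.window:
            return
        avg_depth = self._depth_sum / self._lookups
        if self.learning and self._swaps < self.min_swaps:
            # Layout barely changed this window: popularity looks stable, stop learning.
            self.learning = False
            self._baseline_depth = avg_depth
        elif not self.learning and avg_depth > self._baseline_depth + self.depth_slack:
            # Lookups now probe deeper than before: the distribution likely shifted.
            self.learning = True
        self._lookups = self._swaps = self._depth_sum = 0
```

In this sketch, the per-lookup cost when learning is off is a counter update, which reflects the abstract's point that learning in-the-loop must stay cheap. Swapping one position per hit, rather than jumping straight to the head, keeps a single burst of accesses from displacing keys that have been hot for longer.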
