Exploiting Data Skew for Improved Query Performance

10/22/2019
by Wangda Zhang, et al.

Analytic queries enable sophisticated large-scale data analysis within many commercial, scientific and medical domains today. Data skew is a ubiquitous feature of these real-world domains. In a retail database, some products are typically much more popular than others. In a text database, word frequencies follow a Zipf distribution with a small number of very common words, and a long tail of infrequent words. In a geographic database, some regions have much higher populations (and data measurements) than others. Current systems do not make the most of caches for exploiting skew. In particular, a whole cache line may remain cache resident even though only a small part of the cache line corresponds to a popular data item. In this paper, we propose a novel index structure for repositioning data items to concentrate popular items into the same cache lines. The net result is better spatial locality, and better utilization of limited cache resources. We develop a theoretical model for analyzing the cache behavior, and implement database operators that are efficient in the presence of skew. Our experiments on real and synthetic data show that exploiting skew can significantly improve in-memory query performance. In some cases, our techniques can speed up queries by over an order of magnitude.
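The cache-line effect described above can be illustrated with a toy model. The sketch below is not the index structure proposed in the paper; it only assumes a flat array layout, a hypothetical cache line holding eight items (`ITEMS_PER_LINE`), and a simple frequency sort standing in for the repositioning step. It counts how many distinct cache lines must stay resident to hold all popular items before and after repositioning.

```python
from collections import Counter

ITEMS_PER_LINE = 8  # assumption: a 64-byte line holding eight 8-byte values


def lines_holding_hot_items(layout, hot_items, items_per_line=ITEMS_PER_LINE):
    """Count distinct cache lines that contain at least one hot item."""
    position = {item: i for i, item in enumerate(layout)}
    return len({position[item] // items_per_line for item in hot_items})


def reposition_by_frequency(layout, accesses):
    """Return a layout with the most frequently accessed items first."""
    freq = Counter(accesses)
    # sorted() is stable, so items with equal frequency keep their order
    return sorted(layout, key=lambda item: -freq[item])


# 64 items in their original order; every eighth item is "popular".
layout = list(range(64))
hot = [0, 8, 16, 24, 32, 40, 48, 56]
accesses = hot * 100 + layout  # skewed workload: hot items dominate

before = lines_holding_hot_items(layout, hot)
after = lines_holding_hot_items(reposition_by_frequency(layout, accesses), hot)
print(before, after)  # prints "8 1"
```

In this contrived layout each popular item occupies its own cache line, so eight lines are needed to keep the hot set resident; after repositioning, the same hot set fits in a single line. Real workloads are less extreme, and a practical scheme also needs a way to find items after they move, which is the role an index structure would play.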
