The Case for Learned In-Memory Joins

by Ibrahim Sabek, et al.

The in-memory join is an essential operator in any database engine and has been extensively investigated in the database literature. In this paper, we study whether exploiting CDF-based learned models to boost join performance is practical. To the best of our knowledge, we are the first to fill this gap. We investigate the use of CDF-based partitioning and learned indexes (e.g., Recursive Model Indexes (RMI) and RadixSpline) in the three join categories: indexed nested loop join (INLJ), sort-based joins (SJ), and hash-based joins (HJ). Our study shows that there is room to improve the performance of the INLJ and SJ categories through our proposed optimized learned variants. Our experimental analysis shows that these learned variants of INLJ and SJ consistently outperform the state-of-the-art techniques.
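
To make the idea of a CDF-based learned index inside an INLJ concrete, here is a minimal Python sketch. It is not the paper's optimized variant, and it is not RMI or RadixSpline: it assumes a single linear model fitted through the endpoints of a non-empty, sorted, duplicate-free inner key array, and all function names (`fit_linear_cdf`, `learned_lookup`, `learned_inlj`) are hypothetical. The model predicts a key's position, and a recorded maximum error bounds the final binary search.

```python
import bisect

def fit_linear_cdf(sorted_keys):
    """Fit position ~ slope * key + intercept through the two endpoints
    and record the maximum prediction error over all keys.
    Assumption: sorted_keys is non-empty and sorted ascending."""
    n = len(sorted_keys)
    lo, hi = sorted_keys[0], sorted_keys[-1]
    slope = (n - 1) / (hi - lo) if hi != lo else 0.0
    intercept = -slope * lo
    max_err = 0
    for pos, key in enumerate(sorted_keys):
        pred = int(slope * key + intercept)
        max_err = max(max_err, abs(pred - pos))
    return slope, intercept, max_err

def learned_lookup(sorted_keys, model, key):
    """Return the position of key in sorted_keys, or -1 if it is absent."""
    slope, intercept, max_err = model
    pred = int(slope * key + intercept)
    # Search only the window that the error bound guarantees for stored keys.
    lo = max(0, pred - max_err)
    hi = min(len(sorted_keys), pred + max_err + 1)
    if lo >= hi:
        return -1  # prediction falls entirely outside the array
    i = bisect.bisect_left(sorted_keys, key, lo, hi)
    if i < hi and sorted_keys[i] == key:
        return i
    return -1

def learned_inlj(outer_keys, inner_sorted_keys):
    """Indexed nested loop join: probe the learned index once per outer key.
    Returns (key, inner_position) pairs for every match."""
    model = fit_linear_cdf(inner_sorted_keys)
    result = []
    for key in outer_keys:
        pos = learned_lookup(inner_sorted_keys, model, key)
        if pos >= 0:
            result.append((key, pos))
    return result

inner = [2, 3, 5, 8, 13, 21, 34]
outer = [5, 4, 34, 1, 100, 2]
matches = learned_inlj(outer, inner)
print(matches)
```

The design choice illustrated here is that the model replaces the upper levels of a tree traversal with one arithmetic prediction, so the remaining search cost depends on the model's error rather than on the size of the indexed relation. A single line fits skewed key distributions poorly, which is why hierarchical or spline-based models such as those named in the abstract are used in practice.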


Parallel In-Memory Evaluation of Spatial Joins

The spatial join is a popular operation in spatial database systems and ...

Design Trade-offs for a Robust Dynamic Hybrid Hash Join (Extended Version)

The Join operator, as one of the most expensive and commonly used operat...

Efficiently Charting RDF

We propose a visual query language for interactively exploring large-sca...

Non-recursive Approach for Sort-Merge Join Operation

Several algorithms have been developed over the years to perform join op...

Checkpointing and Localized Recovery for Nested Fork-Join Programs

While checkpointing is typically combined with a restart of the whole ap...

Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries

This paper presents predicate transfer, a novel method that optimizes jo...

Measuring and Predicting the Quality of a Join for Data Discovery

We study the problem of discovering joinable datasets at scale. We appro...