SieveJoin: Boosting Multi-Way Joins with Reusable Bloom Filters

08/31/2023
by   Qingzhi Ma, et al.
0

Improving data systems' performance for join operations has long been an issue of great importance. More recently, a lot of focus has been devoted to multi-way join performance and especially on reducing the negative impact of producing intermediate tuples, which in the end do not make it in the final result. We contribute a new multi-way join algorithm, coined SieveJoin, which extends the well-known Bloomjoin algorithm to multi-way joins and achieves state-of-the-art performance in terms of join query execution efficiency. SieveJoin's salient novel feature is that it allows the propagation of Bloom filters in the join path, enabling the system to `stop early' and eliminate useless intermediate join results. The key design objective of SieveJoin is to efficiently `learn' the join results, based on Bloom filters, with negligible memory overheads. We discuss the bottlenecks in delaying multi-way joins, and how Bloom filters are used to remove the generation of unnecessary intermediate join results. We provide a detailed experimental evaluation using various datasets, against a state-of-the-art column-store database and a multi-way worst-case optimal join algorithm, showcasing SieveJoin's gains in terms of response time.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/29/2019

Worst-Case Optimal Radix Triejoin

Relatively recently, the field of join processing has been swayed by the...
research
06/21/2022

Graphical Join: A New Physical Join Algorithm for RDBMSs

Join operations (especially n-way, many-to-many joins) are known to be t...
research
06/13/2019

Memory-Efficient Group-by Aggregates over Multi-Way Joins

Aggregate computation in relational databases has long been done using t...
research
12/26/2014

Unsupervised Learning through Prediction in a Model of Cortex

We propose a primitive called PJOIN, for "predictive join," which combin...
research
12/04/2020

Hiperfact: In-Memory High Performance Fact Processing – Rethinking the Rete Inference Algorithm

The Rete forward inference algorithm forms the basis for many rule engin...
research
06/08/2017

Optimal parameters for bloom-filtered joins in Spark

In this paper, we present an algorithm that joins relational database ta...
research
11/27/2018

Efficiently Charting RDF

We propose a visual query language for interactively exploring large-sca...

Please sign up or login with your details

Forgot password? Click here to reset