Instance and Output Optimal Parallel Algorithms for Acyclic Joins

03/22/2019
by   Xiao Hu, et al.
Massively parallel join algorithms have received much attention in recent years, but most prior work has focused on worst-case optimal algorithms. However, the worst-case optimality of these join algorithms relies on hard instances having very large output sizes, which rarely appear in practice. A stronger notion of optimality is output-optimality, which requires an algorithm to be optimal within the class of all instances sharing the same input and output size. An even stronger notion is instance-optimality, i.e., the algorithm is optimal on every single instance, but this may not always be achievable. In the traditional RAM model of computation, the classical Yannakakis algorithm is instance-optimal on any acyclic join. But in the massively parallel computation (MPC) model, the situation becomes much more complicated. We first show that for the class of r-hierarchical joins, instance-optimality can still be achieved in the MPC model. Then, we give a new MPC algorithm for an arbitrary acyclic join with load O(IN/p + √(IN · OUT)/p), where IN and OUT are the input and output sizes of the join, and p is the number of servers in the MPC model. This improves the MPC version of the Yannakakis algorithm by an O(√(OUT/IN)) factor. Furthermore, we show that this is output-optimal when OUT = O(p · IN), for every acyclic but non-r-hierarchical join. Finally, we give the first output-sensitive lower bound for the triangle join in the MPC model, showing that it is inherently more difficult than acyclic joins.
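To get a feel for the size of the improvement, the sketch below evaluates the two load bounds with all hidden constants set to 1. Here IN and OUT denote the input and output sizes of the join and p the number of servers. The Yannakakis baseline of IN/p + OUT/p is an assumption (it is the form consistent with the stated improvement factor), and the function names and sample values are hypothetical, chosen only for illustration.

```python
import math


def yannakakis_mpc_load(in_size: int, out_size: int, p: int) -> float:
    """Assumed baseline bound IN/p + OUT/p, with constants dropped."""
    return in_size / p + out_size / p


def acyclic_join_load(in_size: int, out_size: int, p: int) -> float:
    """Bound from the abstract, IN/p + sqrt(IN * OUT)/p, with constants dropped."""
    return in_size / p + math.sqrt(in_size * out_size) / p


# Hypothetical sizes: 10^6 input tuples, 10^8 output tuples, 100 servers.
IN, OUT, P = 10**6, 10**8, 100

old_load = yannakakis_mpc_load(IN, OUT, P)
new_load = acyclic_join_load(IN, OUT, P)
ratio = old_load / new_load
factor = math.sqrt(OUT / IN)  # the sqrt(OUT/IN) improvement factor

print(f"baseline load: {old_load:.0f}")
print(f"new load:      {new_load:.0f}")
print(f"ratio {ratio:.2f} vs sqrt(OUT/IN) = {factor:.2f}")
```

With these sample values the measured ratio stays below √(OUT/IN) because the shared IN/p term is not negligible; the gap narrows as OUT grows relative to IN.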

research
01/11/2022

Parallel Acyclic Joins with Canonical Edge Covers

In PODS'21, Hu presented an algorithm in the massively parallel computat...
research
11/30/2020

A Near-Optimal Parallel Algorithm for Joining Binary Relations

We present a constant-round algorithm in the massively parallel computat...
research
12/15/2020

Instance Optimal Join Size Estimation

We consider the problem of efficiently estimating the size of the inner ...
research
08/20/2022

Safe Subjoins in Acyclic Joins

It is expensive to compute joins, often due to large intermediate relati...
research
05/01/2020

Optimal Join Algorithms Meet Top-k

Top-k queries have been studied intensively in the database community an...
research
10/26/2022

Quantifying the Loss of Acyclic Join Dependencies

Acyclic schemas possess known benefits for database design, speeding up ...
research
04/01/2022

Givens QR Decomposition over Relational Databases

This paper introduces Figaro, an algorithm for computing the upper-trian...
