Near-Optimal Distributed Band-Joins through Recursive Partitioning

04/13/2020
by Rundong Li, et al.

We consider running-time optimization for band-joins in a distributed system, e.g., the cloud. To balance load across worker machines, the input has to be partitioned, which causes duplication. We explore how to resolve this tension between maximum load per worker and input duplication for band-joins between two relations. Previous work suffered from high optimization cost or considered partitionings that were too restricted, resulting in suboptimal join performance. Our main insight is that recursive partitioning of the join-attribute space with the appropriate split-scoring measure can achieve both low optimization cost and low join cost. It is the first approach that is effective not only for one-dimensional band-joins but also for joins on multiple attributes. Experiments indicate that our method is able to find partitionings that are within 10% of the lower bound for both maximum load per worker and input duplication for a broad range of settings, significantly improving over previous work.
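To make the tension between per-worker load and input duplication concrete, the following is a minimal sketch of a one-dimensional band-join evaluated over a fixed range partitioning. It is not the recursive partitioning method of the paper: the split points are given by hand, and the names `band_join_bruteforce`, `partitioned_band_join`, `eps`, and `splits` are hypothetical, chosen only for illustration. The sketch assumes each S-value is sent to exactly one partition, while each T-value is copied to every partition that could hold a matching S-value.

```python
from bisect import bisect_right


def band_join_bruteforce(S, T, eps):
    """Reference result: all pairs (s, t) with |s - t| <= eps."""
    return sorted((s, t) for s in S for t in T if abs(s - t) <= eps)


def partitioned_band_join(S, T, eps, splits):
    """Band-join over a range partitioning of the join attribute.

    `splits` is a sorted list of boundaries. Partition i holds the
    S-values v with splits[i-1] <= v < splits[i] (open at both ends
    of the domain). Returns (pairs, loads, duplication).
    """
    k = len(splits) + 1
    parts_S = [[] for _ in range(k)]
    parts_T = [[] for _ in range(k)]

    # Each S-value goes to exactly one partition.
    for s in S:
        parts_S[bisect_right(splits, s)].append(s)

    # Each T-value is copied to every partition whose range
    # intersects the band [t - eps, t + eps].
    for t in T:
        lo = bisect_right(splits, t - eps)
        hi = bisect_right(splits, t + eps)
        for i in range(lo, hi + 1):
            parts_T[i].append(t)

    # Each "worker" joins its own partition locally.
    pairs = []
    for i in range(k):
        pairs.extend(
            (s, t) for s in parts_S[i] for t in parts_T[i] if abs(s - t) <= eps
        )

    loads = [len(parts_S[i]) + len(parts_T[i]) for i in range(k)]
    duplication = sum(len(p) for p in parts_T) - len(T)
    return sorted(pairs), loads, duplication


if __name__ == "__main__":
    S = [1, 4, 7, 10]
    T = [2, 5, 8, 11]
    pairs, loads, duplication = partitioned_band_join(S, T, eps=1, splits=[5, 9])
    print("pairs:", pairs)
    print("max load per worker:", max(loads))
    print("duplicated input tuples:", duplication)
```

Adding split points in this sketch tends to lower the maximum load but raises the number of duplicated T-values near the boundaries, which is the trade-off the abstract describes.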

research
08/30/2019

Parallel In-Memory Evaluation of Spatial Joins

The spatial join is a popular operation in spatial database systems and ...
research
11/30/2020

A Near-Optimal Parallel Algorithm for Joining Binary Relations

We present a constant-round algorithm in the massively parallel computat...
research
05/15/2019

Improving Distributed Similarity Join in Metric Space with Error-bounded Sampling

Given two sets of objects, metric similarity join finds all similar pair...
research
02/01/2022

Recursive Multi-Section on the Fly: Shared-Memory Streaming Algorithms for Hierarchical Graph Partitioning and Process Mapping

Partitioning a graph into balanced blocks such that few edges run betwee...
research
08/05/2022

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach

As one of the most useful online processing techniques, the theta-join o...
research
02/18/2021

Optimal Spectrum Partitioning and Licensing in Tiered Access under Stochastic Market Models

We consider the problem of partitioning a spectrum band into M channels ...
research
11/03/2020

Distributing Sparse Matrix/Graph Applications in Heterogeneous Clusters – an Experimental Study

Many problems in scientific and engineering applications contain sparse ...
