Efficient Massively Parallel Join Optimization for Large Queries

02/28/2022
by Riccardo Mancini, et al.

Modern data analytical workloads often need to run queries over a large number of tables. An optimal query plan is crucial for running such queries within acceptable time bounds. However, for queries involving many tables, finding the optimal join order becomes a bottleneck in query optimization. Because the cost of join order optimization grows exponentially with the number of tables, optimizers resort to heuristic solutions beyond a threshold number of tables. Our objective is twofold: (a) reduce the optimization time for generating optimal plans; and (b) improve the quality of the heuristic solution. In this paper, we propose a new massively parallel algorithm, MPDP, that can efficiently prune the large search space (via a novel plan enumeration technique) while leveraging the massive parallelism offered by modern hardware (e.g., GPUs). When evaluated on real-world benchmark queries with PostgreSQL, MPDP is at least an order of magnitude faster than state-of-the-art techniques for large analytical queries. As a result, we are able to increase the heuristic-fall-back limit from 12 relations to 25 relations with the same time budget in PostgreSQL. Also, in order to handle queries with an even larger number of tables, we augment a well-known heuristic, IDP_2 (iterative DP version 2), and a novel heuristic, UnionDP, with MPDP. By systematically exploring a much larger search space, these heuristics provide query plans that are up to 7 times cheaper than those of state-of-the-art techniques while being faster to compute.
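To make the exponential nature of join order optimization concrete, the following is a minimal sketch of a classic dynamic program over subsets of relations. It is not MPDP and not the authors' implementation: the function name `best_join_order`, the input dictionaries, and the toy cost model (sum of estimated intermediate result sizes) are assumptions made purely for illustration. The outer loops visit every subset of relations and every way to split it, which is the search space that grows exponentially with the number of tables.

```python
from itertools import combinations


def best_join_order(cards, selectivity):
    """Toy exhaustive join-order search (illustrative assumption, not MPDP).

    cards: dict mapping relation name -> estimated row count.
    selectivity: dict mapping frozenset({a, b}) -> join predicate selectivity.
    Returns (cost, rows, plan) for joining all relations, or None if the
    join graph is disconnected (cross products are not considered).
    """
    rels = sorted(cards)
    # best[S] holds the cheapest known (cost, rows, plan) for relation set S.
    best = {frozenset([r]): (0.0, float(cards[r]), r) for r in rels}

    for size in range(2, len(rels) + 1):
        for subset in combinations(rels, size):
            s = frozenset(subset)
            first, rest = subset[0], subset[1:]
            # Pin `first` to the left side so each unordered split is tried once.
            for k in range(len(rest)):
                for extra in combinations(rest, k):
                    left = frozenset((first,) + extra)
                    right = s - left
                    if left not in best or right not in best:
                        continue  # one side has no cross-product-free plan
                    # Multiply the selectivities of all predicates crossing the split.
                    sel, connected = 1.0, False
                    for a in left:
                        for b in right:
                            pred = frozenset((a, b))
                            if pred in selectivity:
                                sel *= selectivity[pred]
                                connected = True
                    if not connected:
                        continue  # skip cross products
                    l_cost, l_rows, l_plan = best[left]
                    r_cost, r_rows, r_plan = best[right]
                    rows = l_rows * r_rows * sel
                    cost = l_cost + r_cost + rows  # toy cost: intermediate sizes
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, rows, (l_plan, r_plan))

    return best.get(frozenset(rels))


# Hypothetical three-table chain query A - B - C.
example = best_join_order(
    {"A": 1000, "B": 10, "C": 100},
    {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.5},
)
```

Even in this small form, the number of subsets doubles with each added relation, which illustrates why optimizers fall back to heuristics beyond a threshold and why pruning the enumeration and parallelizing it, as the abstract describes, can matter.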


research
08/09/2018

Learning to Optimize Join Queries With Deep Reinforcement Learning

Exhaustive enumeration of all possible join orders is often avoided, and...
research
05/07/2020

Bitvector-aware Query Optimization for Decision Support Queries (extended version)

Bitvector filtering is an important query processing technique that can ...
research
02/21/2019

How I Learned to Stop Worrying and Love Re-optimization

Cost-based query optimizers remain one of the most important components ...
research
01/25/2023

Free Join: Unifying Worst-Case Optimal and Traditional Joins

Over the last decade, worst-case optimal join (WCOJ) algorithms have eme...
research
10/30/2021

Simpli-Squared: A Very Simple Yet Unexpectedly Powerful Join Ordering Algorithm Without Cardinality Estimates

The Join Order Benchmark (JOB) has become the de facto standard to asses...
research
08/24/2021

Making RDBMSs Efficient on Graph Workloads Through Predefined Joins

Joins in native graph database management systems (GDBMSs) are predefine...
research
10/06/2020

Sharon: Shared Online Event Sequence Aggregation

Streaming systems evaluate massive workloads of event sequence aggregati...
