Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

10/01/2020
by Christina Pavlopoulou, et al.

Query optimization remains an open problem for Big Data Management Systems. Traditional optimizers are cost-based: they use statistical estimates of intermediate result cardinalities to assign costs and pick the best plan. However, such estimates become less accurate in the presence of filtering conditions that involve undetected correlations between multiple predicates local to a single dataset, predicates with query parameters, or predicates with user-defined functions (UDFs). Traditional query optimizers tend to ignore or miscalculate the effect of such conditions, which leads to suboptimal execution plans. Given the volume of today's data, a suboptimal plan can quickly become very inefficient. In this work, we revisit the old idea of runtime dynamic optimization and adapt it to a shared-nothing distributed database system, AsterixDB. The optimization runs in stages (re-optimization points), starting with the execution of all predicates local to a single dataset. The intermediate result produced by each stage is used to re-optimize the remaining query. This re-optimization approach avoids inaccurate intermediate result cardinality estimates and thus leads to much better execution plans. While it introduces the overhead of materializing these intermediate results, our experiments show that this overhead is relatively small and is an acceptable price to pay given the optimization benefits. In fact, our experimental evaluation shows that runtime dynamic optimization produces much better execution plans than the current default AsterixDB plans, plans produced by static cost-based optimization (i.e., based on the initial dataset statistics), and other state-of-the-art approaches.
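The staged approach described in the abstract can be illustrated with a small, single-process Python sketch. It is not the AsterixDB implementation: the function names (`apply_local_predicates`, `hash_join`, `run_dynamic`), the row-as-dict representation, and the cost proxy (the product of the observed input sizes) are all assumptions made for illustration, and the sketch further assumes that column names are unique across datasets. The point it tries to show is only the control flow: filter and materialize first, then choose each next join from observed sizes rather than from up-front estimates.

```python
def apply_local_predicates(datasets, predicates):
    """Stage 0: run every single-dataset predicate and materialize the result."""
    return {
        name: [row for row in rows if all(p(row) for p in predicates.get(name, []))]
        for name, rows in datasets.items()
    }


def hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join; the hash table is built on the right input."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def run_dynamic(datasets, predicates, joins):
    """joins: list of (dataset_a, key_a, dataset_b, key_b) tuples.

    Returns the chosen join order and the materialized intermediates
    (a single entry when the join graph is connected).
    """
    inter = apply_local_predicates(datasets, predicates)
    # owner[base] names the intermediate result that currently contains `base`.
    owner = {name: name for name in inter}
    remaining = list(joins)
    order = []

    while remaining:
        # Re-optimization point: rank the remaining joins by the *observed*
        # sizes of their inputs (illustrative proxy, not the paper's cost model).
        def observed_cost(join):
            a, _, b, _ = join
            return len(inter[owner[a]]) * len(inter[owner[b]])

        chosen = min(remaining, key=observed_cost)
        remaining.remove(chosen)
        a, key_a, b, key_b = chosen
        left_id, right_id = owner[a], owner[b]

        if left_id == right_id:
            # Both sides were already joined earlier: the condition is a filter.
            inter[left_id] = [r for r in inter[left_id] if r[key_a] == r[key_b]]
        else:
            result = hash_join(inter[left_id], inter[right_id], key_a, key_b)
            new_id = f"({left_id} JOIN {right_id})"
            del inter[left_id], inter[right_id]
            inter[new_id] = result  # materialize for the next stage
            for base in list(owner):
                if owner[base] in (left_id, right_id):
                    owner[base] = new_id
        order.append((a, b))

    return order, inter
```

In this sketch, a highly selective local predicate on one dataset shrinks its materialized size, so joins touching that dataset are picked earlier. A static optimizer that misjudged the predicate's selectivity could order the joins differently.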

