An Approach Based on Bayesian Networks for Query Selectivity Estimation

07/14/2019
by Max Halford, et al.

The efficiency of a query execution plan depends on the accuracy of the selectivity estimates that the cost model gives to the query optimiser. The cost model makes simplifying assumptions in order to produce these estimates in a timely manner. These assumptions lead to selectivity estimation errors that can have dramatic effects on the quality of the resulting query execution plans. A convenient assumption that is ubiquitous among current cost models is that attributes are independent of each other. However, this assumption ignores potential correlations between attributes, which can have a huge negative impact on the accuracy of the cost model. In this paper we attempt to relax the attribute value independence assumption without unreasonably deteriorating the accuracy of the cost model. We propose a novel approach based on a particular type of Bayesian network called a Chow-Liu tree to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.
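
To make the contrast in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a tiny hypothetical relation with three attributes (city, country, premium), compares the true selectivity of a conjunctive predicate with the estimate obtained under attribute value independence, and then builds a Chow-Liu-style tree by taking a maximum spanning tree over pairwise mutual information. All data, function names, and variable names are invented for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log

# Hypothetical toy relation: (city, country, premium).
# City fully determines country; premium is unrelated to both.
rows = (
    [("Paris", "France", True)] * 3 + [("Paris", "France", False)] * 3
    + [("Lyon", "France", True)] * 1 + [("Lyon", "France", False)] * 1
    + [("Rome", "Italy", True)] * 2 + [("Rome", "Italy", False)] * 2
)


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


# Query: city = 'Paris' AND country = 'France'
true_sel = selectivity(rows, lambda r: r[0] == "Paris" and r[1] == "France")

# Attribute value independence: multiply the per-attribute selectivities.
p_city = selectivity(rows, lambda r: r[0] == "Paris")
p_country = selectivity(rows, lambda r: r[1] == "France")
avi_sel = p_city * p_country

# Tree factorisation over the edge country -> city: P(country) * P(city | country).
n_france = sum(1 for r in rows if r[1] == "France")
n_paris_france = sum(1 for r in rows if r[0] == "Paris" and r[1] == "France")
tree_sel = p_country * (n_paris_france / n_france)


def mutual_information(rows, i, j):
    """Empirical mutual information between attributes i and j (in nats)."""
    n = len(rows)
    count_i = Counter(r[i] for r in rows)
    count_j = Counter(r[j] for r in rows)
    count_ij = Counter((r[i], r[j]) for r in rows)
    return sum(
        (c / n) * log(c * n / (count_i[a] * count_j[b]))
        for (a, b), c in count_ij.items()
    )


def chow_liu_edges(rows, n_attrs):
    """Maximum spanning tree over pairwise mutual information (Kruskal)."""
    weighted = sorted(
        ((mutual_information(rows, i, j), i, j)
         for i, j in combinations(range(n_attrs), 2)),
        reverse=True,
    )
    parent = list(range(n_attrs))

    def find(x):
        # Follow parent pointers up to the representative of x's component.
        while parent[x] != x:
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


edges = chow_liu_edges(rows, 3)
print("true:", true_sel, "independence:", avi_sel, "tree:", tree_sel)
print("tree edges (attribute indices):", edges)
```

On this toy data the independence estimate undershoots the true selectivity because city and country are correlated, whereas the tree factorisation recovers it; the strongest edge selected for the tree links city and country. A real system would also need to handle continuous attributes, large domains, and joins, none of which this sketch attempts.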

research
09/21/2020

Selectivity Estimation with Attribute Value Dependencies using Linked Bayesian Networks

Relational query optimisers rely on cost models to choose between differ...
research
03/29/2019

Query the model: precomputations for efficient inference with Bayesian Networks

We consider a setting where a Bayesian network has been built over a rel...
research
02/04/2021

Online Sketch-based Query Optimization

Cost-based query optimization remains a critical task in relational data...
research
12/20/2022

Approximate Query Processing via Tuple Bubbles

We propose a versatile approach to lightweight, approximate query proces...
research
05/02/2019

Can the Optimizer Cost be Used to Predict Query Execution Times?

Predicting the execution time of queries is an important problem with ap...
research
03/20/2023

Less is More: Towards Lightweight Cost Estimator for Database Systems

We present FasCo, a simple yet effective learning-based estimator for th...
research
03/06/2013

A Construction of Bayesian Networks from Databases Based on an MDL Principle

This paper addresses learning stochastic rules especially on an inter-at...
