BayesCard: A Unified Bayesian Framework for Cardinality Estimation

12/29/2020
by   Ziniu Wu, et al.
0

Cardinality estimation is one of the fundamental problems in database management systems and it is an essential component in query optimizers. Traditional machine-learning-based approaches use probabilistic models such as Bayesian Networks (BNs) to learn joint distributions on data. Recent research advocates for using deep unsupervised learning and achieves state-of-the-art performance in estimating the cardinality of selection and join queries. Yet the lack of scalability, stability and interpretability of such deep learning models, makes them unsuitable for real-world databases. Recent advances in probabilistic programming languages (PPLs) allow for a declarative and efficient specification of probabilistic models such as BNs, and achieve state-of-the-art accuracy in various machine learning tasks. In this paper, we present BayesCard, the first framework incorporating the techniques behind PPLs for building BNs along with relational extensions that can accurately estimate the cardinality of selection and join queries in database systems with model sizes that are up to three orders of magnitude smaller than deep models'. Furthermore, the more stable performance and better interpretation of BNs make them viable options for practical query optimizers. Our experimental results on several single-relation and multi-relation databases indicate that BayesCard with a reasonable estimation time has a better estimation accuracy than deep learning models, and has from one to two orders of magnitude less training cost nevertheless.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/11/2022

FactorJoin: A New Cardinality Estimation Framework for Join Queries

Cardinality estimation is one of the most fundamental and challenging pr...
research
08/21/2019

Improved Cardinality Estimation by Learning Queries Containment Rates

The containment rate of query Q1 in query Q2 over database D is the perc...
research
03/24/2019

Multi-Attribute Selectivity Estimation Using Deep Learning

Selectivity estimation - the problem of estimating the result size of qu...
research
03/20/2023

Less is More: Towards Lightweight Cost Estimator for Database Systems

We present FasCo, a simple yet effective learning-based estimator for th...
research
12/18/2018

DeepLens: Towards a Visual Data Management System

Advances in deep learning have greatly widened the scope of automatic co...
research
10/25/2022

A Database of Ultrastable MOFs Reassembled from Stable Fragments with Machine Learning Models

High-throughput screening of large hypothetical databases of metal-organ...
research
04/10/2023

COOOL: A Learning-To-Rank Approach for SQL Hint Recommendations

Query optimization is a pivotal part of every database management system...

Please sign up or login with your details

Forgot password? Click here to reset