FactorJoin: A New Cardinality Estimation Framework for Join Queries

12/11/2022
by Ziniu Wu, et al.

Cardinality estimation is one of the most fundamental and challenging problems in query optimization. Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of join queries. Classical methods rely on simplifying assumptions that lead to ineffective cardinality estimates, while learning-based methods build large models to understand the data distributions, which leads to long planning times and a lack of generalizability across queries. In this paper, we propose FactorJoin, a new framework for estimating the cardinality of join queries. FactorJoin combines the idea behind the classical join-histogram method, which handles joins efficiently, with learning-based methods, which accurately capture attribute correlation. Specifically, FactorJoin scans every table in a database and builds single-table conditional distributions during an offline preparation phase. When a join query arrives, FactorJoin translates it into a factor graph model over the learned distributions to estimate its cardinality effectively and efficiently. Unlike existing learning-based methods, FactorJoin does not need to de-normalize joins upfront or require executed query workloads to train the model. Since it relies only on single-table statistics, FactorJoin has small space overhead and is extremely easy to train and maintain. In our evaluation, FactorJoin produces more effective estimates than the previous state-of-the-art learning-based methods, with 40x less estimation latency, 100x smaller model size, and 100x faster training speed at comparable or better accuracy. In addition, FactorJoin can estimate 10,000 sub-plan queries within one second to optimize the query plan, a speed very close to that of the traditional cardinality estimators in commercial DBMSs.
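To make the core idea concrete, the sketch below shows how per-table distributions over a join key, each conditioned on that table's own filter, can be combined into a join cardinality without materializing the join. It is a minimal illustration, not FactorJoin's implementation: the toy tables, the column names, and the helper functions `conditional_key_counts` and `join_cardinality` are all hypothetical, and exact per-value counts stand in for the learned, binned distributions and factor graph inference that the paper describes.

```python
from collections import Counter

def conditional_key_counts(rows, key, predicate):
    """Count join-key values among the rows of one table that pass its filter."""
    return Counter(row[key] for row in rows if predicate(row))

def join_cardinality(counts_a, counts_b):
    """Sum, over shared key values, the product of the two per-table counts."""
    return sum(n * counts_b[v] for v, n in counts_a.items() if v in counts_b)

# Hypothetical toy tables (not taken from the paper).
orders = [
    {"cust_id": 1, "amount": 50},
    {"cust_id": 1, "amount": 200},
    {"cust_id": 2, "amount": 300},
    {"cust_id": 3, "amount": 20},
]
customers = [
    {"cust_id": 1, "country": "US"},
    {"cust_id": 2, "country": "US"},
    {"cust_id": 3, "country": "DE"},
]

# Each table is scanned on its own; no joined table is ever built.
counts_orders = conditional_key_counts(
    orders, "cust_id", lambda r: r["amount"] > 100
)
counts_customers = conditional_key_counts(
    customers, "cust_id", lambda r: r["country"] == "US"
)

# Cardinality of: orders JOIN customers ON cust_id
#                 WHERE amount > 100 AND country = 'US'
estimate = join_cardinality(counts_orders, counts_customers)
print(estimate)
```

With exact counts the result equals the true join size. A practical estimator would presumably replace the exact counters with compact approximate statistics, trading some accuracy for space and speed.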


