Machine Learning-based Cardinality Estimation in DBMS on Pre-Aggregated Data

05/19/2020
by Lucas Woltmann, et al.

Cardinality estimation is a fundamental task in database query processing and optimization. As recent papers have shown, machine learning (ML)-based approaches can deliver more accurate cardinality estimates than traditional approaches. However, learning a data-dependent ML model requires executing many example queries during the training phase, which makes training very time-consuming. Many of these example queries access the same base data and share the same query structure, differing only in their predicates. At first glance, index structures therefore appear to be an ideal optimization technique, but their benefit is limited. To speed up the training phase, our core idea is to compute a predicate-independent pre-aggregation of the base data and to execute the example queries over this pre-aggregated data instead. Based on this idea, we present a specific aggregate-enabled training phase for ML-based cardinality estimation approaches. As we show with different workloads in our evaluation, our aggregate-enabled training phase achieves an average speedup of 63.
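The core idea of answering training queries over a predicate-independent pre-aggregate can be illustrated with a small sketch. It assumes a single in-memory table, `COUNT(*)` training queries, and predicates that only reference the grouped attributes; the function names `pre_aggregate`, `cardinality`, and `cardinality_scan` and the sample data are hypothetical and are not taken from the paper, which works inside a DBMS. The base table is grouped once by the predicate attributes (the equivalent of `GROUP BY attrs` with `COUNT(*)`), and each example query is then answered by summing the counts of the groups whose values satisfy its predicate.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Dict[str, object]], bool]


def pre_aggregate(rows: List[Row], attrs: Tuple[str, ...]) -> Counter:
    """Group the base rows by the predicate attributes once (hypothetical helper).

    Equivalent to: SELECT attrs, COUNT(*) FROM base GROUP BY attrs.
    The result does not depend on any particular query predicate.
    """
    return Counter(tuple(row[a] for a in attrs) for row in rows)


def cardinality(agg: Counter, attrs: Tuple[str, ...], predicate: Predicate) -> int:
    """Answer SELECT COUNT(*) ... WHERE predicate using only the aggregate.

    The predicate is evaluated once per group instead of once per base row,
    and the counts of all matching groups are summed.
    """
    total = 0
    for key, count in agg.items():
        if predicate(dict(zip(attrs, key))):
            total += count
    return total


def cardinality_scan(rows: List[Row], predicate: Predicate) -> int:
    """Baseline: evaluate the predicate on every base row (full scan)."""
    return sum(1 for row in rows if predicate(row))


# Hypothetical example data and training queries.
base: List[Row] = [
    {"year": 2019, "region": "EU", "price": 10},
    {"year": 2019, "region": "EU", "price": 12},
    {"year": 2020, "region": "US", "price": 7},
    {"year": 2020, "region": "EU", "price": 9},
]
attrs = ("year", "region")
agg = pre_aggregate(base, attrs)  # 3 groups instead of 4 rows

queries: List[Predicate] = [
    lambda r: r["year"] == 2019,
    lambda r: r["region"] == "EU" and r["year"] >= 2019,
]

# Cardinality labels for the ML training set, computed over the aggregate.
labels = [cardinality(agg, attrs, q) for q in queries]
```

Because the aggregate is built once and reused for every example query that shares the same attribute set, the per-query cost in this sketch grows with the number of distinct groups rather than the number of base rows; the benefit shrinks when the grouped attributes have many distinct value combinations.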

research
12/11/2022

FactorJoin: A New Cardinality Estimation Framework for Join Queries

Cardinality estimation is one of the most fundamental and challenging pr...
research
08/21/2019

Improved Cardinality Estimation by Learning Queries Containment Rates

The containment rate of query Q1 in query Q2 over database D is the perc...
research
11/20/2022

NeuroSketch: Fast and Approximate Evaluation of Range Aggregate Queries with Neural Networks

Range aggregate queries (RAQs) are an integral part of many real-world a...
research
04/15/2020

NN-based Transformation of Any SQL Cardinality Estimator for Handling DISTINCT, AND, OR and NOT

SQL queries, with the AND, OR, and NOT operators, constitute a broad cla...
research
01/29/2018

Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation

Estimating the cardinality (i.e., the number of answers) of conjunctive ...
research
05/28/2023

One stone, two birds: A lightweight multidimensional learned index with cardinality support

Innovative learning based structures have recently been proposed to tack...
research
10/01/2018

Chasing Similarity: Distribution-aware Aggregation Scheduling (Extended Version)

Parallel aggregation is a ubiquitous operation in data analytics that is...