Join Size Bounds using Lp-Norms on Degree Sequences

06/24/2023
by Mahmoud Abo Khamis, et al.

Estimating the output size of a join query is a fundamental and longstanding problem in database query processing. Traditional cardinality estimators used by database systems routinely underestimate the true join size by orders of magnitude, which leads to significant performance penalties. Recently, size upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees from the input relations, yet they grossly overestimate the true join size. This paper puts forward a general class of size bounds based on information inequalities involving Lp-norms on the degree sequences of the join columns. These bounds generalise prior efforts and can be asymptotically tighter than the known bounds. We give two types of lower and upper bounds: some hold for all entropic vectors, while others hold for all polymatroids. The former are asymptotically tight but possibly not computable, whereas the latter are computable but not even asymptotically tight. When all degree constraints are over a single variable, we call them "simple" and prove that the polymatroid and entropic bounds are equal, that they are tight up to a query-dependent constant (which is stronger than asymptotically tight), that they are computable in time exponential in the size of the query, and that the worst-case database instance matching the bound has a simple structure called a "normal database".
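To make the terminology concrete, here is a minimal Python sketch of a degree sequence and its Lp-norms on a toy two-relation join. The relations `R` and `S` and the helper names `degree_sequence` and `lp_norm` are assumptions made for illustration and do not come from the paper. The sketch relies on two standard facts: the L1-norm of a degree sequence equals the relation size, and the L∞-norm equals the max-degree. It then compares a size-times-max-degree bound with the Cauchy-Schwarz bound built from L2-norms. This is only one simple instance of an Lp-norm bound, not the paper's general class of bounds.

```python
from collections import Counter
import math

def degree_sequence(relation, column):
    """Count how often each value occurs in `column`, sorted in decreasing order."""
    counts = Counter(row[column] for row in relation)
    return sorted(counts.values(), reverse=True)

def lp_norm(degrees, p):
    """Lp-norm of a degree sequence; p = math.inf gives the max-degree."""
    if math.isinf(p):
        return max(degrees, default=0)
    return sum(d ** p for d in degrees) ** (1.0 / p)

# Hypothetical toy relations R(X, Y) and S(Y, Z), joined on Y.
R = [(1, "a"), (2, "a"), (3, "a"), (4, "b")]
S = [("a", 10), ("a", 11), ("b", 12), ("c", 13)]

deg_R = degree_sequence(R, 1)  # degrees of Y-values in R
deg_S = degree_sequence(S, 0)  # degrees of Y-values in S

# True join size, computed by brute force.
join_size = sum(1 for (_, y1) in R for (y2, _) in S if y1 == y2)

# Bound using only sizes (L1) and max-degrees (L-infinity).
max_degree_bound = min(
    lp_norm(deg_R, 1) * lp_norm(deg_S, math.inf),
    lp_norm(deg_S, 1) * lp_norm(deg_R, math.inf),
)

# Bound using L2-norms: sum_y deg_R(y) * deg_S(y) <= ||deg_R||_2 * ||deg_S||_2.
l2_bound = lp_norm(deg_R, 2) * lp_norm(deg_S, 2)

print("join size:", join_size)
print("size x max-degree bound:", max_degree_bound)
print("L2 bound:", l2_bound)
```

On this toy input the L2 bound is smaller than the size-times-max-degree bound. That is a property of this particular instance rather than a general guarantee.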

research
04/24/2023

Applications of Information Inequalities to Database Theory Problems

The paper describes several applications of information inequalities to ...
research
01/11/2022

Degree Sequence Bound For Join Cardinality Estimation

Recent work has demonstrated the catastrophic effects of poor cardinalit...
research
03/21/2020

Covering the Relational Join

In this paper, we initiate a theoretical study of what we call the join ...
research
03/27/2018

Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems

Worst-case optimal join algorithms are the class of join algorithms whos...
research
11/08/2022

Consistent Query Answering for Primary Keys and Conjunctive Queries with Counting

The problem of consistent query answering for primary keys and self-join...
research
03/14/2022

On Semialgebraic Range Reporting

In the problem of semialgebraic range searching, we are to preprocess a ...
research
04/16/2018

Adaptive MapReduce Similarity Joins

Similarity joins are a fundamental database operation. Given data sets S...
