Accurate Summary-based Cardinality Estimation Through the Lens of Cardinality Estimation Graphs

05/19/2021
by Jeremy Chen, et al.

We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional independence assumptions; and (ii) the recent pessimistic estimators that use information-theoretic linear programs. We begin with the problem of making accurate estimates with optimistic estimators. We model these estimators as picking bottom-to-top paths in a cardinality estimation graph (CEG), which contains sub-queries as nodes and weighted edges between sub-queries that represent average degrees. We outline a space of heuristics to make an optimistic estimate in this framework and show that effective heuristics depend on the structure of the input queries. We observe that on acyclic queries and queries with small-size cycles, using the maximum-weight path is an effective technique to address the well-known underestimation problem for optimistic estimators. We show that on a large suite of datasets and workloads, such estimates are up to three orders of magnitude more accurate in mean q-error than some heuristics proposed in prior work. In contrast, we show that on queries with larger cycles these estimators tend to overestimate, which can partially be addressed by using minimum-weight paths and more effectively by using an alternative CEG. We then show that CEGs can also model the recent pessimistic estimators. This surprising result allows us to connect two disparate lines of work on optimistic and pessimistic estimators, adopt an optimization from pessimistic estimators to optimistic ones, and provide insights into the pessimistic estimators, such as showing that there are alternative combinatorial solutions to the linear programs that define them.
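To make the path-picking idea concrete, here is a minimal Python sketch rather than the authors' implementation. It assumes that the estimate along a bottom-to-top path is the product of its edge weights and that all weights are positive. The toy graph, its node names, and its weights are invented for illustration, and the q-error helper uses the common definition max(estimate/truth, truth/estimate), which the abstract does not spell out.

```python
from functools import lru_cache

# Hypothetical toy CEG: nodes are sub-queries, "{}" is the empty bottom node,
# "ABC" is the full query. Edge weights are made-up cardinalities or average
# degrees; they are not taken from the paper.
TOY_CEG = {
    "{}":  {"A": 100.0, "B": 200.0},
    "A":   {"AB": 4.0},
    "B":   {"AB": 2.0, "BC": 1.5},
    "AB":  {"ABC": 1.5},
    "BC":  {"ABC": 3.0},
    "ABC": {},
}


def path_estimate(ceg, source, target, choose=max):
    """Estimate along the max- or min-weight source-to-target path.

    The weight of a path is the product of its edge weights. With positive
    weights, picking the best continuation at every node yields the best
    whole path, so a memoized recursion over the DAG is enough.
    """

    @lru_cache(maxsize=None)
    def best(node):
        if node == target:
            return 1.0
        options = []
        for nxt, weight in ceg[node].items():
            rest = best(nxt)
            if rest is not None:  # skip successors that cannot reach target
                options.append(weight * rest)
        return choose(options) if options else None

    return best(source)


def q_error(estimate, truth):
    """Assumed q-error definition: max(estimate/truth, truth/estimate)."""
    return max(estimate / truth, truth / estimate)


max_estimate = path_estimate(TOY_CEG, "{}", "ABC", choose=max)  # 900.0
min_estimate = path_estimate(TOY_CEG, "{}", "ABC", choose=min)  # 600.0
```

In this toy graph the three bottom-to-top paths give 600, 600, and 900, so the maximum-weight heuristic returns 900 and the minimum-weight heuristic returns 600. Which heuristic is closer to the truth depends on the query structure, as the abstract notes.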

research
01/29/2018

Estimating the Cardinality of Conjunctive Queries over RDF Data Using Graph Summarisation

Estimating the cardinality (i.e., the number of answers) of conjunctive ...
research
11/17/2022

SafeBound: A Practical System for Generating Cardinality Bounds

Recent work has reemphasized the importance of cardinality estimates for...
research
12/11/2022

FactorJoin: A New Cardinality Estimation Framework for Join Queries

Cardinality estimation is one of the most fundamental and challenging pr...
research
06/15/2020

NeuroCard: One Cardinality Estimator for All Tables

Query optimizers rely on accurate cardinality estimates to produce good ...
research
02/21/2021

LMKG: Learned Models for Cardinality Estimation in Knowledge Graphs

Accurate cardinality estimates are a key ingredient to achieve optimal q...
research
04/17/2019

Estimating Cardinalities with Deep Sketches

We introduce Deep Sketches, which are compact models of databases that a...
research
08/22/2022

Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Cardinality Estimation (aka Distinct Elements) is a classic problem in s...
