Guaranteeing the Õ(AGM/OUT) Runtime for Uniform Sampling and Size Estimation over Joins

04/03/2023
by Kyoungmin Kim, et al.
We propose a new method for estimating the number of answers OUT of a small join query Q in a large database D, and for uniform sampling over joins. Our method is the first to satisfy all of the following properties.

- It supports arbitrary Q, which can be either acyclic or cyclic and can contain binary and non-binary relations.
- It guarantees an arbitrarily small error with high probability, always in Õ(AGM/OUT) time, where AGM is the AGM bound (an upper bound of OUT) and Õ hides a polylogarithmic factor of the input size.

We also explain previous join size estimators in a unified framework. All methods, including ours, rely on certain indexes on the relations in D, which take linear time to build offline. Additionally, we extend our method using generalized hypertree decompositions (GHDs) to achieve a lower complexity than Õ(AGM/OUT) when OUT is small, and we present optimization techniques for improving estimation efficiency and accuracy.
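To give some intuition for why a runtime of the form (upper bound)/OUT arises, here is a minimal Python sketch of classic rejection sampling over a two-relation join R(a, b) ⋈ S(b, c). It is not the paper's algorithm: it handles only one binary join, and it uses a simple degree-based upper bound (|R| times the maximum degree of b in S) in place of the AGM bound. The names `build_index`, `sample_join`, and `estimate_out` are hypothetical. Each trial returns any given join tuple with probability 1/bound, so accepted samples are uniform, the expected number of trials per accepted sample is bound/OUT, and bound times the acceptance rate is an unbiased estimate of OUT.

```python
import random
from collections import defaultdict


def build_index(S):
    """Group the tuples of S(b, c) by join key b (built offline in linear time)."""
    index = defaultdict(list)
    for b, c in S:
        index[b].append(c)
    return index


def sample_join(R, index, max_deg, rng):
    """Run one rejection-sampling trial.

    Returns a join tuple (a, b, c), or None if the trial is rejected.
    Every join tuple is returned with probability 1 / (len(R) * max_deg).
    """
    a, b = rng.choice(R)           # uniform tuple from R
    slot = rng.randrange(max_deg)  # uniform slot in [0, max_deg)
    matches = index.get(b, [])
    if slot < len(matches):        # accept with probability deg(b) / max_deg
        return (a, b, matches[slot])
    return None


def estimate_out(R, S, trials, seed=0):
    """Estimate OUT = |R join S| as bound * (accepted trials / trials)."""
    rng = random.Random(seed)
    index = build_index(S)
    max_deg = max((len(v) for v in index.values()), default=0)
    if not R or max_deg == 0:
        return 0.0
    bound = len(R) * max_deg       # simple upper bound on OUT (assumption: not AGM)
    hits = sum(
        sample_join(R, index, max_deg, rng) is not None for _ in range(trials)
    )
    return bound * hits / trials
```

For example, with R = [(1, "x"), (2, "x"), (3, "y")] and S = [("x", 10), ("x", 11), ("y", 12)], the true join size is 5 and the bound is 6, so about 5 of every 6 trials are accepted. When OUT is much smaller than the bound, most trials are rejected, which is the regime where the abstract's GHD-based extension is said to help.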

research
08/20/2022

Safe Subjoins in Acyclic Joins

It is expensive to compute joins, often due to large intermediate relati...
research
01/25/2023

Free Join: Unifying Worst-Case Optimal and Traditional Joins

Over the last decade, worst-case optimal join (WCOJ) algorithms have eme...
research
12/15/2020

Instance Optimal Join Size Estimation

We consider the problem of efficiently estimating the size of the inner ...
research
01/28/2021

Beyond Equi-joins: Ranking, Enumeration and Factorization

We study full acyclic join queries with general join predicates that inv...
research
05/25/2023

Efficient Computation of Quantiles over Joins

We present efficient algorithms for Quantile Join Queries, abbreviated a...
research
03/02/2023

Sampling over Union of Joins

Data scientists often draw on multiple relational data sources for analy...
research
04/01/2022

Givens QR Decomposition over Relational Databases

This paper introduces Figaro, an algorithm for computing the upper-trian...
