Joins on Samples: A Theoretical Guide for Practitioners

12/07/2019
by Dawei Huang, et al.

Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join and that it leads to quadratically fewer output tuples.
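The "quadratically fewer output tuples" effect can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes independent Bernoulli sampling at the same rate on both inputs (one common form of uniform sampling), and the toy key columns `left` and `right` as well as all function names are hypothetical. Under that assumption, each pair of matching tuples survives only if both of its members are sampled, so a rate of 0.1 should retain roughly 0.1 × 0.1 = 1% of the join output rather than 10%.

```python
import random
from collections import Counter


def join_size(left_keys, right_keys):
    """Number of output tuples of an equi-join on the given key columns."""
    right_counts = Counter(right_keys)
    # Each left row matches as many right rows as share its key.
    return sum(right_counts[k] for k in left_keys)


def bernoulli_sample(rows, rate, rng):
    """Keep each row independently with probability `rate` (an assumed sampling scheme)."""
    return [r for r in rows if rng.random() < rate]


def average_sampled_join_fraction(left_keys, right_keys, rate, trials, seed=0):
    """Average, over `trials` runs, of |sample join| / |full join|."""
    rng = random.Random(seed)
    full = join_size(left_keys, right_keys)
    total = 0
    for _ in range(trials):
        total += join_size(
            bernoulli_sample(left_keys, rate, rng),
            bernoulli_sample(right_keys, rate, rng),
        )
    return total / (trials * full)


# Hypothetical toy data: 1000 rows per side, 50 distinct join keys.
left = [i % 50 for i in range(1000)]
right = [i % 50 for i in range(1000)]

fraction = average_sampled_join_fraction(left, right, rate=0.1, trials=200)
print(f"sampling rate: 0.1, fraction of join output retained: {fraction:.4f}")
```

The retained fraction should come out close to the square of the sampling rate, which is the size reduction the early result refers to. The simulation says nothing about the other half of that result, namely that the surviving tuples are not an independent sample of the full join.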


