Joins on Samples: A Theoretical Guide for Practitioners

12/07/2019
by Dawei Huang, et al.

Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief rests largely on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it yields quadratically fewer output tuples (sampling each table at rate p leaves roughly a p² fraction of the join output, rather than a p fraction).
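The early result mentioned above can be illustrated with a small simulation. This is a minimal sketch, not the authors' method: it assumes independent Bernoulli (coin-flip) sampling at the same rate `p` on both tables and a hypothetical synthetic key distribution, and the helper names (`bernoulli_sample`, `join_size`, `simulate`, `pair_correlation`) are illustrative only. It shows two things: the sampled join keeps about `p**2` of the full join rather than `p`, and two join outputs that share a left tuple are not included independently.

```python
import random
from collections import Counter


def bernoulli_sample(table, p, rng):
    """Keep each row independently with probability p (uniform Bernoulli sampling)."""
    return [row for row in table if rng.random() < p]


def join_size(left, right):
    """Size of the equi-join on the key; each row here is just its join key."""
    right_counts = Counter(right)
    return sum(right_counts[k] for k in left)


def simulate(n_keys=100, dup=10, p=0.1, trials=200, seed=0):
    """Compare the full join size with the average join size of two samples."""
    rng = random.Random(seed)
    # Hypothetical tables: every key appears `dup` times in each table.
    left = [k for k in range(n_keys) for _ in range(dup)]
    right = list(left)
    full = join_size(left, right)  # n_keys * dup * dup output tuples
    total = 0
    for _ in range(trials):
        total += join_size(bernoulli_sample(left, p, rng),
                           bernoulli_sample(right, p, rng))
    return full, total / trials


def pair_correlation(p=0.1, trials=100_000, seed=1):
    """Estimate P(a) and P(a and b) for outputs a=(l0, r0), b=(l0, r1).

    Both outputs need the shared left tuple l0, so P(a and b) = p**3,
    whereas an independent sample of the join would give P(a)**2 = p**4.
    """
    rng = random.Random(seed)
    first = 0
    both = 0
    for _ in range(trials):
        l0 = rng.random() < p
        r0 = rng.random() < p
        r1 = rng.random() < p
        a = l0 and r0
        b = l0 and r1
        first += a
        both += a and b
    return first / trials, both / trials


if __name__ == "__main__":
    full, avg = simulate()
    # avg should be near p**2 * full (about 100), far below p * full (1000).
    print(full, avg)
    p_one, p_both = pair_correlation()
    # p_both should be near p**3, roughly 1/p times larger than p_one**2.
    print(p_one, p_both, p_one ** 2)
```

With `p = 0.1`, the sampled join retains on the order of 1% of the full join's tuples, and the joint inclusion probability of the two overlapping outputs is about ten times what independence would predict.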

06/21/2022

Model Joins: Enabling Analytics Over Joins of Absent Big Tables

This work is motivated by two key facts. First, it is highly desirable t...
01/07/2022

Weighted Random Sampling over Joins

Joining records with all other records that meet a linkage condition can...
05/15/2018

Approximate Distributed Joins in Apache Spark

The join operation is a fundamental building block of parallel data proc...
12/12/2012

Iterative Join-Graph Propagation

The paper presents an iterative version of join-tree clustering that app...
01/29/2018

Join Query Optimization Techniques for Complex Event Processing Applications

Complex event processing (CEP) is a prominent technology used in many mo...
05/17/2018

The Two-Sample Problem Via Relative Belief Ratio

This paper deals with a new Bayesian approach to the two-sample problem....
09/25/2020

Do We Really Sample Right In Model-Based Diagnosis?

Statistical samples, in order to be representative, have to be drawn fro...