Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

03/29/2021
by Xi Liang, et al.

Sample-based approximate query processing (AQP) has well-known pitfalls, including the inability to answer highly selective queries and unreliable confidence intervals when sample sizes are small. Recent research proposed an intriguing solution: combining materialized, pre-computed aggregates with sampling for more accurate and more reliable AQP. In this work, we explore this solution in detail and propose an AQP physical design called PASS, or Precomputation-Assisted Stratified Sampling. PASS builds a tree of partial aggregates that cover different partitions of the dataset. The leaf nodes of this tree form the strata for stratified samples. Aggregate queries whose predicates align with the partitions (or unions of partitions) are answered exactly with a depth-first search, and any partial overlaps are approximated with the stratified samples. We propose an algorithm for optimally partitioning the data into such a data structure, along with various practical approximation techniques.
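The structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `Node`, `build`, and `approx_sum` are hypothetical, the sketch handles only SUM with a one-dimensional range predicate, and it splits the sorted data into equal-sized halves rather than using the optimized partitioning the paper proposes. Each node stores an exact partial aggregate, each leaf additionally stores a uniform sample of its stratum, and a depth-first search uses exact aggregates wherever a partition lies entirely inside the query range.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Row = Tuple[float, float]  # (key used in the predicate, value being summed)

@dataclass
class Node:
    lo: float                 # smallest key in this partition (inclusive)
    hi: float                 # largest key in this partition (inclusive)
    total: float              # exact, precomputed SUM(value) for the partition
    count: int                # number of rows in the partition
    sample: List[Row] = field(default_factory=list)  # filled for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(rows: List[Row], leaf_size: int, sample_size: int,
          rng: random.Random) -> Node:
    """Build the aggregate tree over rows that are already sorted by key.

    Assumption: equal-count binary splits stand in for the paper's
    optimized partitioning.
    """
    if len(rows) <= leaf_size:
        # A leaf is one stratum: keep its exact aggregate plus a uniform sample.
        sample = rng.sample(rows, min(sample_size, len(rows)))
        return Node(rows[0][0], rows[-1][0],
                    sum(v for _, v in rows), len(rows), sample)
    mid = len(rows) // 2
    left = build(rows[:mid], leaf_size, sample_size, rng)
    right = build(rows[mid:], leaf_size, sample_size, rng)
    return Node(left.lo, right.hi, left.total + right.total,
                left.count + right.count, left=left, right=right)

def approx_sum(node: Node, q_lo: float, q_hi: float) -> float:
    """Estimate SUM(value) over rows with q_lo <= key <= q_hi."""
    if node.hi < q_lo or node.lo > q_hi:
        return 0.0                      # partition is disjoint from the query
    if q_lo <= node.lo and node.hi <= q_hi:
        return node.total               # fully covered: use the exact aggregate
    if node.left is None:
        # Leaf with partial overlap: scale the matching sample rows
        # up to the size of the stratum.
        matching = sum(v for k, v in node.sample if q_lo <= k <= q_hi)
        return node.count / len(node.sample) * matching
    return (approx_sum(node.left, q_lo, q_hi)
            + approx_sum(node.right, q_lo, q_hi))

# Toy data: 1000 rows with keys 0..999, each with value 1.0.
rows = [(i, 1.0) for i in range(1000)]
root = build(rows, leaf_size=125, sample_size=25, rng=random.Random(0))

print(approx_sum(root, 0, 499))  # aligned with a partition: exactly 500.0
print(approx_sum(root, 0, 550))  # partial overlap: exact part plus an estimate
```

In the second query, only the leaf that straddles key 550 contributes sampling error; every other partition is either answered exactly or skipped, which is the intuition behind combining aggregates with stratified samples.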


research
03/05/2020

LAQP: Learning-based Approximate Query Processing

Querying on big data is a challenging task due to the rapid growth of da...
research
08/24/2020

Approximate Partition Selection for Big-Data Workloads using Summary Statistics

Many big-data clusters store data in large partitions that support acces...
research
03/08/2022

Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling

A knowledge graph (KG) manages large-scale and real-world facts as a big...
research
11/09/2019

EntropyDB: A Probabilistic Approach to Approximate Query Processing

We present EntropyDB, an interactive data exploration system that uses a...
research
09/05/2019

Random Sampling for Group-By Queries

Random sampling has been widely used in approximate query processing on ...
research
07/22/2020

R*-Grove: Balanced Spatial Partitioning for Large-scale Datasets

The rapid growth of big spatial data urged the research community to dev...
research
08/10/2020

Rapid Approximate Aggregation with Distribution-Sensitive Interval Guarantees

Aggregating data is fundamental to data analytics, data exploration, and...
