Approximate Query Processing for Group-By Queries based on Conditional Generative Models

01/08/2021
by Meifan Zhang, et al.

The Group-By query is an important and widely used kind of query in data warehouses, data analytics, and data visualization. Approximate query processing is an effective way to improve query efficiency on big data. The answer to a group-by query involves multiple values, one per group, which makes it difficult to provide sufficiently accurate estimates for all groups. Stratified sampling improves accuracy compared with uniform sampling, but samples chosen for some specific queries do not work for other queries. Online sampling chooses samples for the given query at query time, but it incurs long latency. Thus, it is challenging to achieve both accuracy and efficiency at the same time. To address this challenge, we propose a sample generation framework based on a conditional generative model. The framework can generate any number of samples for a given query without accessing the data. The proposed framework, built on a lightweight model, can be combined with stratified sampling and online aggregation to improve the estimation accuracy of group-by queries. Experimental results show that the proposed methods are both efficient and accurate.
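The accuracy gap the abstract describes between uniform and stratified sampling for group-by queries can be illustrated with a small sketch. This is not the authors' conditional generative framework; it only shows, on an assumed skewed toy table and with hypothetical helper names (`group_avg`, `uniform_sample_avg`, `stratified_sample_avg`, `make_rows`), why a single uniform sample can miss small groups entirely while a per-group (stratified) sample covers every group in an `AVG(value) ... GROUP BY group` query.

```python
import random
from collections import defaultdict


def group_avg(rows):
    """Exact AVG(value) GROUP BY group over a list of (group, value) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in rows:
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def uniform_sample_avg(rows, n, rng):
    """Estimate per-group averages from one uniform sample of n rows.
    Groups that receive no sampled rows are simply absent from the answer."""
    return group_avg(rng.sample(rows, n))


def stratified_sample_avg(rows, per_group, rng):
    """Estimate per-group averages from a stratified sample that draws up to
    `per_group` rows from every group (stratum), so no group is missed.
    Within each stratum the sample is uniform, so the plain sample mean
    estimates that group's AVG without bias; SUM or COUNT would need scaling."""
    strata = defaultdict(list)
    for g, v in rows:
        strata[g].append((g, v))
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return group_avg(sample)


def make_rows():
    """Hypothetical skewed table: one large, one medium, and one tiny group."""
    rows = [("A", float(i % 100)) for i in range(10_000)]
    rows += [("B", 50.0 + i % 10) for i in range(500)]
    rows += [("C", float(v)) for v in (1, 2, 3, 4, 5)]
    return rows


if __name__ == "__main__":
    rng = random.Random(42)
    rows = make_rows()
    print("exact:     ", group_avg(rows))
    print("uniform:   ", uniform_sample_avg(rows, 100, rng))
    print("stratified:", stratified_sample_avg(rows, 20, rng))
```

With a 100-row uniform sample, the five-row group C usually receives no rows and drops out of the estimated answer, while the stratified sample always includes it. The trade-off the abstract points to is that such a stratified sample is fixed to one grouping attribute, so it does not carry over to queries that group or filter differently.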
