GAN-based Tabular Data Generator for Constructing Synopsis in Approximate Query Processing: Challenges and Solutions

12/18/2022
by Mohammadali Fallahian, et al.

In data-driven systems, data exploration is imperative for making real-time decisions. However, big data is stored in massive databases from which data is difficult to retrieve. Approximate Query Processing (AQP) is a technique that answers aggregate queries approximately from a summary of the data (a synopsis) that closely replicates the behavior of the actual data. It is useful wherever an approximate answer delivered in a fraction of the exact execution time is acceptable. In this paper, we discuss the use of Generative Adversarial Networks (GANs) for generating tabular data that can be employed in AQP for synopsis construction. We first discuss the challenges associated with constructing synopses in relational databases and then introduce solutions to those challenges. Following that, we organize statistical metrics for evaluating the quality of the generated synopses. We conclude that the complexity of tabular data makes it difficult for algorithms to learn relational database semantics during training, and that improved versions of tabular GANs are capable of constructing synopses that could revolutionize data-driven decision-making systems.
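To make the idea of answering aggregate queries from a synopsis concrete, the following is a minimal sketch and not the method of the paper. It assumes a uniform random sample as a stand-in synopsis (the paper proposes GAN-generated rows instead), and the table, the column names `region` and `amount`, and all helper functions are hypothetical. It compares exact COUNT, SUM, and AVG results against estimates computed from the much smaller synopsis and reports the relative error of each.

```python
import random
import statistics

def build_synopsis(rows, fraction, seed=0):
    """Uniform random sample used as a stand-in synopsis (assumption:
    the paper would use GAN-generated rows here instead)."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)

def exact_aggregates(rows, predicate, column):
    """Scan the full table and compute the true aggregates."""
    values = [r[column] for r in rows if predicate(r)]
    return {
        "count": len(values),
        "sum": sum(values),
        "avg": statistics.fmean(values) if values else float("nan"),
    }

def approx_aggregates(synopsis, total_rows, predicate, column):
    """Estimate the aggregates from the synopsis only.
    COUNT and SUM are scaled up by the sampling ratio; AVG needs no scaling."""
    values = [r[column] for r in synopsis if predicate(r)]
    scale = total_rows / len(synopsis)
    return {
        "count": len(values) * scale,
        "sum": sum(values) * scale,
        "avg": statistics.fmean(values) if values else float("nan"),
    }

def relative_error(exact, approx):
    """Relative error of an estimate with respect to the exact answer."""
    return abs(exact - approx) / abs(exact)

# Hypothetical table: 50,000 rows with a categorical and a numeric column.
rng = random.Random(42)
table = [
    {"region": rng.choice(["north", "south", "east", "west"]),
     "amount": rng.uniform(10.0, 500.0)}
    for _ in range(50_000)
]

# Synopsis holding 10% of the rows.
synopsis = build_synopsis(table, fraction=0.10)

# Roughly: SELECT COUNT(*), SUM(amount), AVG(amount) WHERE region = 'north'
is_north = lambda r: r["region"] == "north"
exact = exact_aggregates(table, is_north, "amount")
approx = approx_aggregates(synopsis, len(table), is_north, "amount")

for name in ("count", "sum", "avg"):
    err = relative_error(exact[name], approx[name])
    print(f"{name}: exact={exact[name]:.2f} approx={approx[name]:.2f} rel_err={err:.4f}")
```

The approximate query touches only the synopsis, so its cost depends on the synopsis size rather than on the table size. Relative error is just one simple way to judge the result; it is used here for illustration and is not claimed to be among the metrics the paper organizes.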


research
08/13/2019

Adaptive Learning of Aggregate Analytics under Dynamic Workloads

Large organizations have seamlessly incorporated data-driven decision ma...
research
11/09/2019

EntropyDB: A Probabilistic Approach to Approximate Query Processing

We present EntropyDB, an interactive data exploration system that uses a...
research
02/01/2019

Incremental Techniques for Large-Scale Dynamic Query Processing

Many applications from various disciplines are now required to analyze f...
research
04/16/2022

An Overview of Query Processing on Crowdsourced Databases

Crowd-sourcing is a powerful solution for finding correct answers to exp...
research
03/14/2020

ML-AQP: Query-Driven Approximate Query Processing based on Machine Learning

As more and more organizations rely on data-driven decision making, larg...
research
09/05/2019

Random Sampling for Group-By Queries

Random sampling has been widely used in approximate query processing on ...
research
08/24/2020

Approximate Partition Selection for Big-Data Workloads using Summary Statistics

Many big-data clusters store data in large partitions that support acces...
