Electra: Conditional Generative Model based Predicate-Aware Query Approximation

01/28/2022
by   Nikhil Sheoran, et al.

The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries, thereby improving the user experience in interactive exploration of large datasets. Recently proposed machine-learning-based AQP techniques can provide very low latency because query execution involves only model inference, rather than traditional query processing on database clusters. However, as the number of filtering predicates (WHERE clauses) increases, the approximation error of these methods increases significantly. Analysts often use queries with a large number of predicates for insight discovery, so maintaining low approximation error is important to prevent them from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and, at runtime, generates a small (~1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations against four different baselines on three real-world datasets show that ELECTRA provides lower AQP error than the baselines for queries with a large number of predicates.
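The sample-then-aggregate idea described in the abstract can be illustrated with a minimal Python sketch. This is not ELECTRA's implementation: the learned conditional generative model is replaced by a hypothetical stand-in, `conditional_sample`, which simply resamples from the rows that match the predicates, and the table, column names, and helper functions are all invented for illustration. Only the overall flow (draw about 1000 rows conditioned on the predicates, then run the aggregate on that sample) follows the text.

```python
import random
from statistics import mean


def make_table(n_rows, seed=0):
    """Build a synthetic table (illustrative data, not from the paper)."""
    rng = random.Random(seed)
    base = {"EU": 50.0, "US": 80.0, "APAC": 30.0}
    rows = []
    for _ in range(n_rows):
        region = rng.choice(["EU", "US", "APAC"])
        device = rng.choice(["mobile", "desktop"])
        revenue = rng.gauss(base[region], 10.0)
        rows.append({"region": region, "device": device, "revenue": revenue})
    return rows


def matches(row, predicates):
    """True if the row satisfies every equality predicate (the WHERE clause)."""
    return all(row[col] == val for col, val in predicates.items())


def conditional_sample(table, predicates, k, rng):
    """Assumed stand-in for a conditional generative model.

    It draws k rows from the empirical conditional distribution by scanning
    the table; a learned model would generate rows without touching the data.
    """
    pool = [r for r in table if matches(r, predicates)]
    return [rng.choice(pool) for _ in range(k)] if pool else []


def approx_avg(table, predicates, column, k=1000, seed=1):
    """Approximate AVG(column) by aggregating over a small conditioned sample."""
    rng = random.Random(seed)
    sample = conditional_sample(table, predicates, k, rng)
    return mean(r[column] for r in sample)


def exact_avg(table, predicates, column):
    """Exact AVG(column) over all matching rows, for comparison."""
    return mean(r[column] for r in table if matches(r, predicates))


table = make_table(50_000)
preds = {"region": "US", "device": "mobile"}  # two filtering predicates
approx = approx_avg(table, preds, "revenue")
exact = exact_avg(table, preds, "revenue")
rel_err = abs(approx - exact) / abs(exact)
print(f"approx={approx:.2f} exact={exact:.2f} relative error={rel_err:.4f}")
```

Because the sample is conditioned on the predicates before aggregation, all 1000 rows are relevant to the query, which is the intuition behind predicate-aware sampling. The sketch covers only AVG; aggregates such as COUNT or SUM would additionally need an estimate of how many rows match the predicates.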


