Model-based Approximate Query Processing

11/15/2018
by Moritz Kulessa, et al.

Interactive visualizations are arguably the most important tool to explore, understand and convey facts about data. In recent years, the database community has worked on different techniques for Approximate Query Processing (AQP) that aim to deliver an approximate query result within a fixed time bound to better support interactive visualizations. However, classical AQP approaches suffer from problems that limit their applicability to the ad-hoc exploration of a new data set: (1) Classical AQP approaches that perform online sampling can support ad-hoc exploration queries but yield low-quality results when executed over rare subpopulations. (2) Classical AQP approaches that rely on offline sampling can use some form of biased sampling to mitigate this problem but require a priori knowledge of the workload, which is often unrealistic if users want to explore a new database. In this paper, we present a new approach to AQP called Model-based AQP that leverages generative models learned over the complete database to answer SQL queries at interactive speeds. Unlike classical AQP approaches, generative models allow us to compute responses to ad-hoc queries and, at the same time, deliver high-quality estimates over rare subpopulations. In our experiments with real and synthetic data sets, we show that Model-based AQP can in many scenarios return more accurate results in a shorter runtime. Furthermore, we think that the techniques for using generative models presented in this paper can be used not only for AQP in databases but also for other database problems, including Query Optimization and Data Cleaning.
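The contrast the abstract draws between online sampling and a model learned over the complete database can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy table, the names `sample_avg`, `fit_model` and `model_avg`, and above all the "model", which is just a per-group count and mean and only stands in for the generative models the paper uses. It shows why a small uniform sample often contains few or no rows from a rare subpopulation, while a summary learned from all rows still has an answer.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical toy table of (region, revenue) rows; "rare" is roughly 0.1% of rows.
N = 100_000
rows = []
for _ in range(N):
    if random.random() < 0.001:
        rows.append(("rare", random.gauss(500.0, 50.0)))
    else:
        rows.append(("common", random.gauss(100.0, 20.0)))


def exact_avg(rows, region):
    """Ground truth for: SELECT AVG(revenue) WHERE region = ?"""
    vals = [v for r, v in rows if r == region]
    return sum(vals) / len(vals) if vals else None


def sample_avg(rows, region, n):
    """Online uniform sampling: estimate the average from n random rows.

    Returns (estimate, matching_rows); the estimate is None when the
    sample contains no row from the requested region.
    """
    sample = random.sample(rows, n)
    vals = [v for r, v in sample if r == region]
    return (sum(vals) / len(vals) if vals else None), len(vals)


def fit_model(rows):
    """Toy stand-in for a learned model: per-region count and mean,
    computed in one pass over the complete table (not the paper's model)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for region, value in rows:
        sums[region] += value
        counts[region] += 1
    return {r: (counts[r], sums[r] / counts[r]) for r in counts}


def model_avg(model, region):
    """Answer the average query from the model without touching the rows."""
    return model[region][1] if region in model else None


model = fit_model(rows)
estimate, matched = sample_avg(rows, "rare", 500)
print("exact :", exact_avg(rows, "rare"))
print("sample:", estimate, "from", matched, "matching rows")
print("model :", model_avg(model, "rare"))
```

With a sample of 500 rows and a subpopulation of about 0.1%, the expected number of matching rows is about 0.5, so the sampled estimate is frequently missing or based on a single row. A real generative model would also have to generalize across arbitrary predicates, which this per-group summary does not attempt.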

