DeepDB: Learn from Data, not from Queries!

09/02/2019
by Benjamin Hilprecht, et al.

The typical approach for learned DBMS components is to capture their behavior by running a representative set of queries and using the observations to train a machine learning model. This workload-driven approach, however, has two major downsides. First, collecting the training data can be very expensive, since all queries need to be executed on potentially large databases. Second, the training data has to be recollected whenever the workload or the data changes. To overcome these limitations, we take a different route: we propose to learn a purely data-driven model that can be used for different tasks such as query answering or cardinality estimation. This data-driven model also supports ad-hoc queries and updates of the data without the need for full retraining when the workload or data changes. One might expect this to come at the price of lower accuracy, since workload-driven models can make use of more information. However, this is not the case: the results of our empirical evaluation demonstrate that our data-driven approach not only provides better accuracy than state-of-the-art learned components but also generalizes better to unseen queries.
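To make the contrast with workload-driven training concrete, here is a minimal sketch of a data-driven cardinality estimator, assuming a single table and conjunctive equality predicates. It is not the model proposed in the paper: the class IndependenceEstimator and its insert and estimate methods are hypothetical, and the sketch swaps in the simplest possible data-driven technique, per-column frequency counts combined under an attribute-independence assumption. It is learned from the rows alone, without executing any training queries, and it absorbs inserted rows by updating its counts instead of retraining.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping


class IndependenceEstimator:
    """Toy data-driven cardinality estimator (hypothetical illustration).

    Not the paper's model: it learns one frequency table per column directly
    from the rows, with no training queries, and assumes that columns are
    statistically independent of each other.
    """

    def __init__(self, rows: Iterable[Mapping[str, Hashable]]) -> None:
        self.n_rows = 0
        self.counts: Dict[str, Counter] = {}
        self.insert(rows)

    def insert(self, rows: Iterable[Mapping[str, Hashable]]) -> None:
        # An update only adjusts the counts; nothing is retrained and no
        # queries are re-executed.
        for row in rows:
            self.n_rows += 1
            for col, val in row.items():
                self.counts.setdefault(col, Counter())[val] += 1

    def estimate(self, predicates: Mapping[str, Hashable]) -> float:
        """Estimate COUNT(*) for a conjunction of predicates col = val."""
        if self.n_rows == 0:
            return 0.0
        selectivity = 1.0
        for col, val in predicates.items():
            # Independence assumption: multiply per-column selectivities.
            # Counter returns 0 for unseen values, giving an estimate of 0.
            selectivity *= self.counts.get(col, Counter())[val] / self.n_rows
        return selectivity * self.n_rows


if __name__ == "__main__":
    rows = [
        {"city": "Berlin", "tier": "gold"},
        {"city": "Berlin", "tier": "silver"},
        {"city": "Paris", "tier": "gold"},
        {"city": "Paris", "tier": "gold"},
    ]
    est = IndependenceEstimator(rows)
    print(est.estimate({"city": "Berlin"}))                  # 2.0
    print(est.estimate({"city": "Paris", "tier": "gold"}))   # 1.5 (true count: 2)
    est.insert([{"city": "Berlin", "tier": "gold"}])
    print(est.estimate({"city": "Berlin"}))                  # 3.0
```

The independence assumption is what makes this toy inaccurate on correlated columns (the Paris/gold estimate is 1.5 while the true count is 2), so it only illustrates the data-driven training and update path, not the accuracy reported in the abstract.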
