Fine-Tuning Data Structures for Analytical Query Processing

12/24/2021
by Amir Shaikhha, et al.

We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries and allows a more fine-grained choice of their low-level implementation. Second, the cost model for alternative implementations is automatically inferred by combining machine learning and program reasoning. The dictionary cost model is learned using a regression model trained over a profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis. Our experimental results show the effectiveness of the trained cost model on microbenchmarks. Furthermore, we show that the code generated by our framework either outperforms or is on par with state-of-the-art analytical query engines and a recent in-database machine learning framework.
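
To make the dictionary-centric idea concrete, here is a minimal Python sketch of a groupjoin written purely in terms of two dictionary operations, with the dictionary implementation left as a parameter. It is not the paper's intermediate language or its generated code. The names `HashDict`, `SortedDict`, `groupjoin`, `ORDERS`, and `CUSTOMERS` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class HashDict:
    """Dictionary backed by Python's built-in hash table (illustrative)."""

    def __init__(self) -> None:
        self._data: Dict[int, float] = {}

    def update(self, key: int, delta: float) -> None:
        # Insert the key if absent, otherwise add to the stored aggregate.
        self._data[key] = self._data.get(key, 0.0) + delta

    def lookup(self, key: int) -> Optional[float]:
        return self._data.get(key)


class SortedDict:
    """Dictionary backed by two parallel lists kept sorted by key (illustrative)."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._vals: List[float] = []

    def update(self, key: int, delta: float) -> None:
        # Binary search for the key's position, then update or insert in place.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] += delta
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, delta)

    def lookup(self, key: int) -> Optional[float]:
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None


def groupjoin(
    orders: Iterable[Tuple[int, float]],
    customers: Iterable[Tuple[int, str]],
    make_dict: Callable[[], object],
) -> List[Tuple[str, float]]:
    """Sum order amounts per customer id, then join with the customer names.

    `make_dict` selects the dictionary implementation; the algorithm itself
    only relies on `update` and `lookup`.
    """
    totals = make_dict()
    for customer_id, amount in orders:  # build phase: aggregate per key
        totals.update(customer_id, amount)
    result = []
    for customer_id, name in customers:  # probe phase: one lookup per customer
        total = totals.lookup(customer_id)
        if total is not None:
            result.append((name, total))
    return result


# Made-up sample data, for illustration only.
ORDERS = [(1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0)]
CUSTOMERS = [(1, "Ada"), (2, "Bob"), (4, "Cy")]

if __name__ == "__main__":
    print(groupjoin(ORDERS, CUSTOMERS, HashDict))
    print(groupjoin(ORDERS, CUSTOMERS, SortedDict))
```

Both implementations return the same result and differ only in the cost of each `update` and `lookup`. Which one is cheaper depends on the data and the hardware, and that choice is what the abstract says the learned dictionary cost model is meant to inform.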

