Lightweight Materialization for Fast Dashboards Over Joins

08/24/2023
by Zezhou Huang, et al.

Dashboards are vital in modern business intelligence tools, providing non-technical users with an interface to comprehensive business data. With the rise of cloud technology, the number of data sources that provide enriched context for analytical tasks has grown, creating demand for interactive dashboards over a large number of joins. However, joins are among the most expensive operations in DBMSes, which makes supporting interactive dashboards over joins challenging. In this paper, we present Treant, a dashboard accelerator for queries over large joins. Treant uses factorized query execution to handle aggregation queries over large joins; on its own, however, factorized execution is still insufficient for interactive speeds. To address this, we exploit the incremental nature of user interactions using the Calibrated Junction Hypertree (CJT), a novel data structure that applies lightweight materialization of intermediates during factorized execution. CJT ensures that the work needed to compute a query is proportional to how different it is from the previous query, rather than to the overall query complexity. Treant manages CJTs to share work between queries and performs materialization offline or during user "think-times." Implemented as a middleware that rewrites SQL, Treant is portable to any SQL-based DBMS. Our experiments on single-node and cloud DBMSes show that Treant improves dashboard interactions by two orders of magnitude and provides a 10x improvement for ML augmentation compared to a state-of-the-art (SOTA) factorized ML system.
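To make the two ideas in the abstract concrete, here is a minimal sketch, not Treant's implementation. It assumes a fixed chain join R(a, b) ⋈ S(b, c) ⋈ T(c, d) and a COUNT(*) aggregate. Factorized execution pushes per-key counts ("messages") from the leaves R and T to the root S instead of materializing the join. Caching those messages is a much-simplified stand-in for CJT's lightweight materialization: when a dashboard filter changes on T, only T's message is recomputed and R's is reused. A real CJT handles general hypertrees and calibrates messages in both directions; the class name `FactorizedCounter`, the method `set_filter`, and the data are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Pred = Optional[Callable[[int], bool]]


def naive_count(R, S, T, pred_r: Pred = None, pred_t: Pred = None) -> int:
    """Baseline: enumerate every tuple of R(a,b) ⋈ S(b,c) ⋈ T(c,d) and count."""
    count = 0
    for a, b in R:
        if pred_r is not None and not pred_r(a):
            continue
        for b2, c in S:
            if b != b2:
                continue
            for c2, d in T:
                if c == c2 and (pred_t is None or pred_t(d)):
                    count += 1
    return count


class FactorizedCounter:
    """Hypothetical sketch: COUNT(*) over R ⋈ S ⋈ T via messages to root S.

    Each leaf message is cached; changing a leaf's filter invalidates only
    that leaf's message, so the next query redoes only the changed work.
    """

    def __init__(self, R, S, T):
        self.R, self.S, self.T = R, S, T
        self.filters: Dict[str, Pred] = {"R": None, "T": None}
        self.cache: Dict[str, Counter] = {}
        self.recomputed: List[str] = []  # log of recomputed messages

    def set_filter(self, rel: str, pred: Pred) -> None:
        self.filters[rel] = pred
        self.cache.pop(rel, None)  # invalidate only the message leaving rel

    def _message(self, rel: str) -> Counter:
        if rel not in self.cache:
            self.recomputed.append(rel)
            pred = self.filters[rel]
            if rel == "R":
                # m_R(b) = number of R tuples (a, b) whose a passes the filter
                msg = Counter(b for a, b in self.R if pred is None or pred(a))
            else:
                # m_T(c) = number of T tuples (c, d) whose d passes the filter
                msg = Counter(c for c, d in self.T if pred is None or pred(d))
            self.cache[rel] = msg
        return self.cache[rel]

    def count(self) -> int:
        m_r, m_t = self._message("R"), self._message("T")
        # At the root S, each tuple (b, c) contributes m_R(b) * m_T(c);
        # a Counter returns 0 for missing keys, i.e. no join partners.
        return sum(m_r[b] * m_t[c] for b, c in self.S)


R: List[Tuple[int, int]] = [(1, 10), (2, 10), (3, 20)]
S: List[Tuple[int, int]] = [(10, 100), (20, 100), (20, 200)]
T: List[Tuple[int, int]] = [(100, 5), (100, 7), (200, 9)]

fc = FactorizedCounter(R, S, T)
print(fc.count())       # 7, same as naive_count(R, S, T)
fc.set_filter("T", lambda d: d > 6)
print(fc.count())       # 4, only T's message was recomputed
print(fc.recomputed)    # ['R', 'T', 'T']
```

The naive version's cost grows with the size of the join result, while the factorized version touches each base tuple once per message. The cache then makes a filter change cost roughly the size of the edited relation rather than the whole join, which mirrors, in a very reduced form, the abstract's claim that work should track how much a query differs from the previous one.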
