Calibration: A Simple Trick for Wide-table Delta Analytics

10/07/2022
by Zezhou Huang, et al.

Data analytics over normalized databases typically requires computing and materializing expensive joins (wide-tables). Factorized query execution models execution as message passing between relations in the join graph and pushes aggregations through joins to reduce intermediate result sizes. Although this accelerates query execution, it only optimizes a single wide-table query. In contrast, wide-table analytics is usually interactive, and users want to apply deltas to the initial query structure. For instance, users want to slice, dice, and drill down into dimensions, update some of the tables, and join with new tables for enrichment. Such Wide-table Delta Analytics offers novel work-sharing opportunities. This work shows that carefully materializing messages during query execution can accelerate Wide-table Delta Analytics by more than 10^5x compared to factorized execution, while incurring only a constant-factor overhead. The key challenge is that messages are sensitive to the message-passing order. To address this challenge, we borrow the concept of calibration from probabilistic graphical models to materialize sufficient messages to support any ordering. We manifest these ideas in the novel Calibrated Junction Hypertree (CJT) data structure, which is fast to build, aggressively reuses messages to accelerate future queries, and is incrementally maintainable under updates. We further show how CJTs benefit applications such as OLAP, query explanation, streaming data, and data augmentation for ML. Our experiments evaluate three versions of the CJT: one in a single-threaded custom engine, one on cloud DBs, and one in Pandas. They show 30x to 10^5x improvements over state-of-the-art factorized execution algorithms on the above applications.
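The abstract describes two mechanisms. The first is message passing, which pushes an aggregation through the joins. The second is calibration, which materializes messages in both directions so that any relation can serve as the root.

The following is a minimal sketch of both ideas. It is not the authors' CJT implementation, and it rests on these assumptions:

- A hypothetical three-relation chain join tree, R(A,B) - S(B,C) - T(C,D), built from toy data.
- COUNT(*) as the only aggregate (the sum-product semiring).
- A hypothetical `JoinTree` class, with method names invented for illustration.

A message from relation u to neighbor v sums out every attribute that is not shared with v. It multiplies in the messages u received from its other neighbors.

```python
from collections import defaultdict
from math import prod


class JoinTree:
    """Toy join tree of relations annotated under the COUNT (sum-product) semiring.

    Hypothetical helper for illustration only; not the CJT implementation.
    Assumes the junction-tree property: every attribute's occurrences form a
    connected subtree.
    """

    def __init__(self):
        self.attrs = {}                  # relation name -> tuple of attribute names
        self.ann = {}                    # relation name -> {row tuple: multiplicity}
        self.nbrs = defaultdict(set)     # undirected join-tree edges
        self.msgs = {}                   # materialized messages: (src, dst) -> {key: count}

    def add_relation(self, name, attrs, rows):
        self.attrs[name] = tuple(attrs)
        counts = defaultdict(int)
        for row in rows:
            counts[tuple(row)] += 1      # base annotation: 1 per tuple occurrence
        self.ann[name] = dict(counts)

    def add_edge(self, u, v):
        self.nbrs[u].add(v)
        self.nbrs[v].add(u)

    def _sep(self, u, v):
        # Separator: join attributes shared by u and v, in a canonical order.
        return tuple(sorted(set(self.attrs[u]) & set(self.attrs[v])))

    def _project(self, node, row, cols):
        names = self.attrs[node]
        return tuple(row[names.index(c)] for c in cols)

    def _weight(self, u, row, exclude=None):
        # Row annotation times all incoming messages except the one from `exclude`.
        return self.ann[u][row] * prod(
            self.message(w, u).get(self._project(u, row, self._sep(w, u)), 0)
            for w in self.nbrs[u]
            if w != exclude
        )

    def message(self, u, v):
        # Message u -> v: combine u's side of the tree and sum out every
        # attribute not in the separator (aggregation pushed through the joins).
        if (u, v) not in self.msgs:
            sep = self._sep(u, v)
            out = defaultdict(int)
            for row in self.ann[u]:
                out[self._project(u, row, sep)] += self._weight(u, row, exclude=v)
            self.msgs[(u, v)] = dict(out)   # materialize for reuse
        return self.msgs[(u, v)]

    def calibrate(self):
        # Calibration: materialize messages in BOTH directions on every edge,
        # so any relation can act as the root without recomputing messages.
        for u, vs in list(self.nbrs.items()):
            for v in vs:
                self.message(u, v)

    def group_by(self, node, attr):
        # COUNT(*) over the full join, grouped by an attribute stored at `node`.
        out = defaultdict(int)
        for row in self.ann[node]:
            out[self._project(node, row, (attr,))[0]] += self._weight(node, row)
        return dict(out)


jt = JoinTree()
jt.add_relation("R", ("A", "B"), [(1, "x"), (2, "x"), (3, "y")])
jt.add_relation("S", ("B", "C"), [("x", 10), ("x", 20), ("y", 10)])
jt.add_relation("T", ("C", "D"), [(10, "p"), (10, "q"), (20, "p")])
jt.add_edge("R", "S")
jt.add_edge("S", "T")
jt.calibrate()
print(jt.group_by("R", "A"))  # {1: 3, 2: 3, 3: 2}
print(jt.group_by("T", "D"))  # {'p': 5, 'q': 3}
```

Both group-by queries are rooted at different relations (R and T), and both read only the four messages that `calibrate()` stored. A single pass rooted at R would never compute the messages flowing toward T, which illustrates the ordering sensitivity the abstract mentions. The full join is never materialized; here it has 8 rows.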


Related research

Data augmentation on graphs for table type classification (08/23/2022)
Tables are widely used in documents because of their compact and structu...

Correlation Sketches for Approximate Join-Correlation Queries (04/07/2021)
The increasing availability of structured datasets, from Web tables and ...

Model Joins: Enabling Analytics Over Joins of Absent Big Tables (06/21/2022)
This work is motivated by two key facts. First, it is highly desirable t...

Untidy Data: The Unreasonable Effectiveness of Tables (06/28/2021)
Working with data in table form is usually considered a preparatory and ...

sparta: Sparse Tables and their Algebra with a View Towards High Dimensional Graphical Models (03/05/2021)
A graphical model is a multivariate (potentially very high dimensional) ...

Selection of BJI configuration: Approach based on minimal transversals (10/22/2018)
Decision systems deal with a large volume of data stored in new database...

Graph Analytics on Evolving Data (Abstract) (08/28/2023)
We consider the problem of graph analytics on evolving graphs. In this s...