Aggregation Consistency Errors in Semantic Layers and How to Avoid Them

07/01/2023
by Zezhou Huang, et al.

Analysts often struggle to analyze data spread across multiple tables in a database because they do not know how the data should be joined and aggregated. To address this, data engineers pre-specify "semantic layers", which include the join conditions and the "metrics" of interest, defined by aggregation functions and expressions. However, joins can cause "aggregation consistency issues". For example, analysts may observe inflated total revenue caused by double counting from join fanouts. Existing BI tools rely on heuristics for deduplication, which produce imprecise results that are hard to understand. To overcome these challenges, we propose "weighing" as a core primitive to counteract join fanouts. Weighing has been used in various areas, such as market attribution and order management, to keep metrics consistent (e.g., total revenue remains the same) even for many-to-many joins. The idea is to assign equal weight to each join key group (rather than each tuple) and then distribute the weights among tuples. Because implementing weighing requires user input, we recommend a human-in-the-loop framework that lets users iteratively explore different strategies and visualize the results.
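To make the fanout problem and the weighing idea concrete, here is a minimal Python sketch under stated assumptions. The `ORDERS` and `SHIPMENTS` tables, the helpers `left_join` and `add_even_weights`, and the even split of each group's weight are all hypothetical illustrations, not the paper's implementation. Since weighing requires user input, an even split is only one possible distribution strategy, and the sketch covers a one-to-many join only (order_id is assumed unique in `ORDERS`).

```python
from collections import Counter
from fractions import Fraction

# Hypothetical tables for illustration; names and values are not from the paper.
# orders: (order_id, revenue); order_id is assumed unique in this table.
ORDERS = [("o1", 100), ("o2", 50), ("o3", 30)]
# shipments: (shipment_id, order_id); o1 ships in three parcels, o3 has none.
SHIPMENTS = [("s1", "o1"), ("s2", "o1"), ("s3", "o1"), ("s4", "o2")]


def left_join(orders, shipments):
    """Left outer join orders to shipments on order_id (one row per match)."""
    matches = {}
    for shipment_id, order_id in shipments:
        matches.setdefault(order_id, []).append(shipment_id)
    rows = []
    for order_id, revenue in orders:
        # Orders without shipments still yield one row (shipment_id=None).
        for shipment_id in matches.get(order_id, [None]):
            rows.append({"order_id": order_id, "revenue": revenue,
                         "shipment_id": shipment_id})
    return rows


def add_even_weights(rows, key):
    """Give each join-key group a total weight of 1, split evenly over its rows."""
    group_sizes = Counter(row[key] for row in rows)
    for row in rows:
        row["weight"] = Fraction(1, group_sizes[row[key]])
    return rows


rows = add_even_weights(left_join(ORDERS, SHIPMENTS), key="order_id")
naive_total = sum(row["revenue"] for row in rows)
weighted_total = sum(row["revenue"] * row["weight"] for row in rows)
print(naive_total)     # 380: o1's revenue is counted once per shipment
print(weighted_total)  # 180: matches the sum over ORDERS alone
```

Exact `Fraction` weights avoid floating-point drift when thirds are summed back up; a per-shipment revenue breakdown would aggregate the same `revenue * weight` term grouped by `shipment_id`.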
