Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph

06/21/2023
by Yiming Lin, et al.

Business Intelligence (BI) is crucial in modern enterprises and is a billion-dollar business. Traditionally, technical experts such as database administrators manually prepared BI-models (e.g., in star or snowflake schemas) that join tables in data warehouses. Only then could less-technical business users run analytics with end-user dashboarding tools. However, the popularity of self-service BI (e.g., Tableau and Power-BI) in recent years has created a strong demand for less-technical end-users to build BI-models themselves. We develop Auto-BI, a system that accurately predicts BI-models given a set of input tables. It relies on a principled graph-based optimization problem we propose, called k-Min-Cost-Arborescence (k-MCA). k-MCA holistically considers both local join prediction and global schema-graph structure by leveraging a graph-theoretical structure called an arborescence. We prove that k-MCA is intractable and inapproximable in general. Nevertheless, we develop novel algorithms that solve k-MCA optimally and are efficient in practice: they run with sub-second latency and scale to the largest BI-models we encounter (close to 100 tables). Auto-BI is rigorously evaluated on a unique dataset of over 100K real BI-models we harvested, as well as on 4 popular TPC benchmarks. It is shown to be both efficient and accurate, achieving an F1-score above 0.9 on both the real and the synthetic benchmarks.
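The abstract frames BI-model prediction as finding arborescences over a graph of candidate joins. The sketch below illustrates only that building block. It computes a single minimum-cost arborescence with the classic Chu-Liu/Edmonds algorithm. It is not the paper's k-MCA algorithm, which, per the abstract, must handle k arborescences and is intractable in general.

Several details are assumptions made for illustration:

- A directed edge `src -> dst` stands for a candidate N:1 join from a referencing table to a referenced table.
- The root is a table chosen up front.
- Each edge cost is `-log p`, where `p` comes from some hypothetical local join classifier.
- The helper names (`min_cost_arborescence`, `_edmonds`, `_find_cycle`), table names, and probabilities are invented.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

# Sketch: classic Chu-Liu/Edmonds for ONE arborescence; NOT the paper's k-MCA.
# (src, dst, cost, edge_id); src -> dst is an assumed candidate N:1 join.
Edge = Tuple[int, int, float, int]


def _find_cycle(nodes: List[int], best: Dict[int, Edge], root: int) -> Optional[List[int]]:
    """Return one cycle formed by the chosen incoming edges, or None."""
    walk_of: Dict[int, int] = {}  # node -> start of the walk that first visited it
    for start in nodes:
        path: List[int] = []
        v = start
        # Follow chosen incoming edges backwards until the root or a visited node.
        while v != root and v not in walk_of:
            walk_of[v] = start
            path.append(v)
            v = best[v][0]
        if v != root and walk_of[v] == start:
            return path[path.index(v):]  # v revisited on this walk: a cycle
    return None


def _edmonds(nodes: List[int], edges: List[Edge], root: int) -> Set[int]:
    """Return the ids of the edges in a minimum-cost arborescence rooted at root."""
    # 1. Pick the cheapest incoming edge for every non-root node.
    best: Dict[int, Edge] = {}
    for e in edges:
        u, v, w, _ = e
        if v != root and u != v and (v not in best or w < best[v][2]):
            best[v] = e
    for v in nodes:
        if v != root and v not in best:
            raise ValueError(f"node {v} is not reachable from the root")
    # 2. If these greedy choices contain no cycle, they are optimal.
    cycle = _find_cycle(nodes, best, root)
    if cycle is None:
        return {e[3] for e in best.values()}
    # 3. Contract the cycle into a fresh node c and re-price edges entering it.
    in_cycle = set(cycle)
    c = max(nodes) + 1
    new_nodes = [v for v in nodes if v not in in_cycle] + [c]
    new_edges: List[Edge] = []
    enters: Dict[int, int] = {}  # edge id -> cycle node that the edge enters
    for u, v, w, eid in edges:
        if u in in_cycle and v in in_cycle:
            continue  # internal to the cycle
        if v in in_cycle:
            # Entering the cycle at v means dropping v's chosen incoming edge.
            new_edges.append((u, c, w - best[v][2], eid))
            enters[eid] = v
        elif u in in_cycle:
            new_edges.append((c, v, w, eid))
        else:
            new_edges.append((u, v, w, eid))
    # 4. Solve the contracted graph, then expand c back into the cycle.
    chosen = _edmonds(new_nodes, new_edges, root)
    entry = next(eid for eid in chosen if eid in enters)
    broken = enters[entry]
    return chosen | {best[v][3] for v in cycle if v != broken}


def min_cost_arborescence(n: int, edges: List[Tuple[int, int, float]], root: int) -> List[Tuple[int, int]]:
    """Nodes are 0..n-1; returns the chosen (src, dst) pairs, sorted."""
    tagged = [(u, v, w, i) for i, (u, v, w) in enumerate(edges)]
    chosen = _edmonds(list(range(n)), tagged, root)
    return sorted((edges[i][0], edges[i][1]) for i in chosen)


# Hypothetical tables and made-up join probabilities, for illustration only.
names = ["Sales", "Customer", "Product", "Category"]
probs = [(0, 1, 0.90), (0, 2, 0.60), (2, 3, 0.80), (0, 3, 0.30), (3, 2, 0.65)]
# Minimizing the sum of -log p maximizes the product of edge probabilities.
edges = [(u, v, -math.log(p)) for u, v, p in probs]
tree = min_cost_arborescence(len(names), edges, root=0)
for u, v in tree:
    print(f"{names[u]} -> {names[v]}")
```

In this toy input, choosing the best incoming edge for each table separately picks Category -> Product and Product -> Category, which form a cycle. Contracting that cycle lets the global step choose Sales -> Product instead. The result is the snowflake Sales -> Customer, Sales -> Product, Product -> Category. This loosely mirrors the interplay between local join prediction and global structure that the abstract describes. A library such as networkx also provides an Edmonds-based arborescence routine that could replace this hand-written version.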

research
07/27/2023

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Relational tables, where each row corresponds to an entity and each colu...
research
06/15/2020

Needles in the 'Sheet'stack: Augmented Analytics to get Insights from Spreadsheets

Business intelligence (BI) tools for database analytics have come a long...
research
04/11/2017

Next Generation Business Intelligence and Analytics: A Survey

Business Intelligence and Analytics (BI&A) is the process of extracting ...
research
06/04/2023

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

Data pipelines are widely employed in modern enterprises to power a vari...
research
03/07/2021

Auto-FuzzyJoin: Auto-Program Fuzzy Similarity Joins Without Labeled Examples

Fuzzy similarity join is an important database operator widely used in p...
research
06/03/2021

Niffler: A Reference Architecture and System Implementation for View Discovery over Pathless Table Collections by Example

Identifying a project-join view (PJ-view) over collections of tables is ...
research
05/02/2021

BI-REC: Guided Data Analysis for Conversational Business Intelligence

Conversational interfaces to Business Intelligence (BI) applications ena...