Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

07/27/2023
by   Peng Li, et al.

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Power-BI/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 244 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
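To make "relationalizing" a table concrete, the sketch below hand-writes one restructuring step of the kind such a pipeline might contain: a wide-to-long unpivot. It is an illustration only, not the Auto-Tables method. The abstract does not name specific operators, so the choice of unpivot, the helper name `unpivot`, its parameters, and the sample data are all assumptions made for this example.

```python
from typing import Dict, List

Row = Dict[str, object]


def unpivot(rows: List[Row], id_cols: List[str],
            var_name: str, value_name: str) -> List[Row]:
    """Hypothetical helper: turn every non-id column into its own row.

    Each output row keeps the id columns and adds two fields:
    var_name (the former column header) and value_name (the cell value).
    """
    out: List[Row] = []
    for row in rows:
        for col, val in row.items():
            if col in id_cols:
                continue  # id columns are copied, not unpivoted
            new_row: Row = {k: row[k] for k in id_cols}
            new_row[var_name] = col
            new_row[value_name] = val
            out.append(new_row)
    return out


# Made-up non-relational input: years appear as column headers,
# so "year" is not queryable as an attribute.
wide = [
    {"country": "A", "2021": 10, "2022": 12},
    {"country": "B", "2021": 7, "2022": 9},
]

# Relational output: one row per (country, year) pair.
long_rows = unpivot(wide, id_cols=["country"],
                    var_name="year", value_name="sales")

for r in long_rows:
    print(r)
```

After this step, a filter such as `WHERE year = '2021'` can be written directly against the table. In the wide layout the year exists only as a column header, so it cannot be filtered as a value.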

