Revealing the Semantics of Data Wrangling Scripts With COMANTICS

09/28/2022
by Kai Xiong, et al.

Data workers often need to understand the semantics of data wrangling scripts when debugging, reusing, or maintaining code. However, this is challenging for novice data workers because of the variety of programming languages, functions, and parameters. Based on the observation that differences between input and output tables are closely related to the type of data transformation, we outline a design space of 103 characteristics that describe table differences. We then develop COMANTICS, a three-step pipeline that automatically detects the semantics of data transformation scripts. First, it detects the table differences produced by each line of wrangling code. Second, it combines a characteristic-based component with a Siamese convolutional neural network-based component to detect transformation types. Third, it derives the parameters of each data transformation with a "slot filling" strategy. We design experiments to evaluate the performance of COMANTICS and assess its flexibility through three example applications in different domains.
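To make the idea of inferring a transformation from table differences more concrete, here is a minimal Python sketch. It is not the COMANTICS implementation: the three difference characteristics, the rule set, and all names (`table_differences`, `guess_transformation`, the type labels) are hypothetical stand-ins for the 103-characteristic design space and the learned components the abstract describes. Tables are modeled as plain lists of dictionaries.

```python
from typing import Any, Dict, List

Table = List[Dict[str, Any]]  # one dict per row (an assumption for this sketch)


def columns(table: Table) -> List[str]:
    """Return column names in first-seen order."""
    seen: List[str] = []
    for row in table:
        for name in row:
            if name not in seen:
                seen.append(name)
    return seen


def table_differences(before: Table, after: Table) -> Dict[str, Any]:
    """Step 1 (illustrative): describe how the output table differs from the input."""
    cols_before, cols_after = columns(before), columns(after)
    return {
        "row_delta": len(after) - len(before),
        "added_columns": [c for c in cols_after if c not in cols_before],
        "removed_columns": [c for c in cols_before if c not in cols_after],
    }


def guess_transformation(diff: Dict[str, Any]) -> Dict[str, Any]:
    """Steps 2 and 3 (illustrative): pick a type by simple rules, then fill its slots."""
    added, removed = diff["added_columns"], diff["removed_columns"]
    if removed and not added and diff["row_delta"] == 0:
        return {"type": "delete_columns", "columns": removed}
    if added and not removed and diff["row_delta"] == 0:
        return {"type": "create_columns", "columns": added}
    if diff["row_delta"] < 0 and not added and not removed:
        return {"type": "delete_rows", "count": -diff["row_delta"]}
    return {"type": "unknown"}


# Hypothetical input and output tables for one line of wrangling code.
before = [{"name": "a", "age": 3}, {"name": "b", "age": 5}]
after = [{"name": "a"}, {"name": "b"}]
result = guess_transformation(table_differences(before, after))
```

In this sketch, dropping the `age` column yields a `delete_columns` result whose `columns` slot holds `["age"]`. A hand-written rule set like this covers only a few cases, which is presumably why the pipeline described above pairs a characteristic-based component with a learned one.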


