WebRelate: Integrating Web Data with Spreadsheets using Examples

11/15/2017
by Jeevana Priya Inala, et al.

Data integration between web sources and relational data is a key challenge faced by data scientists and spreadsheet users. There are two main challenges in programmatically joining web data with relational data. First, most websites do not expose a direct interface for obtaining tabular data, so the user needs to devise logic to reach a different webpage for each input row in the relational table. Second, after reaching the desired webpage, the user needs to write complex scripts to extract the relevant data, which is often conditioned on the input data. Since many data scientists and end-users come from diverse backgrounds, writing such complex regular-expression-based scripts to perform data integration tasks is unfortunately often beyond their programming expertise. We present WebRelate, a system that allows users to join semi-structured web data with relational data in spreadsheets using input-output examples. WebRelate decomposes the web data integration task into two sub-tasks: (i) URL learning and (ii) input-dependent web extraction. The first sub-task generates the URLs for the webpages containing the desired data for all rows in the relational table. WebRelate achieves this by learning a string transformation program from a few example URLs. The second sub-task uses examples of the desired data to be extracted from the corresponding webpages and learns a program to extract the data for the other rows. We design expressive domain-specific languages for URL generation and web data extraction, and present efficient synthesis algorithms for learning programs in these DSLs from a few input-output examples. We evaluate WebRelate on 88 real-world web data integration tasks taken from online help forums and the Excel product team, and show that WebRelate can learn the desired programs within a few seconds using only 1 example for the majority of the tasks.
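To make the URL-learning sub-task more concrete, the following is a minimal sketch of learning a URL pattern from example rows. It is not WebRelate's DSL or synthesis algorithm: it assumes each varying part of the URL is an exact copy of a cell value from the input row, whereas the abstract describes a richer string transformation language. The function names, column names, and the example.com URLs are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]  # one spreadsheet row: column name -> cell value


def learn_url_template(examples: List[Tuple[Row, str]]) -> str:
    """Learn a template such as 'https://example.com/{city}' from (row, URL) examples.

    Simplifying assumption (not from the paper): every varying URL part is a
    verbatim copy of some cell value, and the URL contains no literal braces.
    """
    learned = None
    for row, url in examples:
        template = url
        # Replace longer values first so a short value cannot split a longer one.
        for column, value in sorted(row.items(), key=lambda kv: -len(kv[1])):
            if value:
                template = template.replace(value, "{" + column + "}")
        if learned is None:
            learned = template
        elif learned != template:
            raise ValueError("examples do not agree on a single template")
    if learned is None:
        raise ValueError("at least one example is required")
    return learned


def generate_urls(template: str, rows: List[Row]) -> List[str]:
    """Instantiate the learned template for the remaining rows."""
    return [template.format(**row) for row in rows]


# Hypothetical example: one (row, URL) pair supplied by the user.
examples = [
    (
        {"src": "EUR", "dst": "USD", "date": "2016-11-01"},
        "https://example.com/rates?from=EUR&to=USD&date=2016-11-01",
    )
]
template = learn_url_template(examples)

# Rows for which the user gave no URL.
remaining = [{"src": "GBP", "dst": "JPY", "date": "2016-11-02"}]
urls = generate_urls(template, remaining)
```

Under these assumptions a single example is enough to generalize to the other rows, which mirrors the one-example setting described in the abstract. A second example mainly serves as a consistency check here; in a real synthesizer it would instead help rank or prune candidate programs.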

research
07/05/2017

Synthesis of Data Completion Scripts using Finite Tree Automata

In application domains that store data in a tabular format, a common tas...
research
06/19/2020

Neural Program Synthesis with a Differentiable Fixer

We present a new program synthesis approach that combines an encoder-dec...
research
07/16/2023

Programming by Example Made Easy

Programming by example (PBE) is an emerging programming paradigm that au...
research
07/08/2023

Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems

In mapping enterprise applications, data mapping remains a fundamental p...
research
08/10/2023

DiLogics: Creating Web Automation Programs With Diverse Logics

Knowledge workers frequently encounter repetitive web data entry tasks, ...
research
06/06/2019

One-shot Information Extraction from Document Images using Neuro-Deductive Program Synthesis

Our interest in this paper is in meeting a rapidly growing industrial de...
research
11/10/2017

Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example

While many applications export data in hierarchical formats like XML and...
