Data Context Informed Data Wrangling

11/22/2018
by Martin Koehler, et al.

The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using Extract-Transform-Load platforms, with significant manual involvement in specifying, configuring or tuning many of them. Cost-effective data wrangling processes need to ensure that data wrangling steps benefit from automation wherever possible. In this paper, we define a methodology to fully automate an end-to-end data wrangling process incorporating data context, which associates portions of a target schema with potentially spurious extensional data of types that are commonly available. Instance-based evidence, together with data profiling, informs automation in several steps of the wrangling process: matching, mapping validation, value format transformation, and data repair. The approach is evaluated on real estate data and shows substantial improvements in the results of automated wrangling.
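To make the idea of instance-based evidence concrete, here is a minimal Python sketch of how data context might inform the matching step. It is an illustration under stated assumptions, not the method described in the paper: the containment score, the 0.5 threshold, the function names (`normalise`, `containment`, `match_columns`) and the sample values are all hypothetical.

```python
# Illustrative sketch only: names, scoring rule, and threshold are assumptions,
# not taken from the paper.

def normalise(value: str) -> str:
    """Lower-case a value and collapse surrounding/internal whitespace."""
    return " ".join(value.strip().lower().split())


def containment(values, reference) -> float:
    """Fraction of distinct non-empty source values found in the reference set."""
    distinct = {normalise(v) for v in values if v and v.strip()}
    if not distinct:
        return 0.0
    ref = {normalise(r) for r in reference}
    return len(distinct & ref) / len(distinct)


def match_columns(source, context, threshold=0.5):
    """Map each source column to the best-scoring target attribute, if any.

    source:  column name -> list of values from a source table
    context: target attribute -> reference values (the assumed data context)
    """
    if not context:
        return {}
    matches = {}
    for column, values in source.items():
        scores = {attr: containment(values, ref) for attr, ref in context.items()}
        best_attr, best_score = max(scores.items(), key=lambda kv: kv[1])
        if best_score >= threshold:
            matches[column] = best_attr
    return matches


# Hypothetical source table with uninformative column names.
source = {
    "col_a": ["Manchester", "Leeds", "manchester ", "York"],
    "col_b": ["M1 1AA", "LS1 4AP", "XX"],
}

# Hypothetical data context: reference values tied to target schema attributes.
context = {
    "city": {"Manchester", "Leeds", "Liverpool"},
    "postcode": {"M1 1AA", "LS1 4AP", "YO1 7HH"},
}

matches = match_columns(source, context)
```

In this sketch, a source column is matched to a target attribute when enough of its distinct values appear in the reference data for that attribute, so columns can be matched even when their names carry no useful signal. The same reference values could plausibly be reused for later steps such as format transformation or repair, although that is not shown here.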


