Human-Centric Data Cleaning [Vision]

12/24/2017
by El Kindi Rezig, et al.

Data cleaning refers to the process of detecting and fixing errors in data. Human involvement is instrumental at several stages of this process, e.g., to identify and repair errors and to validate computed repairs. There is currently a plethora of data cleaning algorithms addressing a wide range of data errors (e.g., duplicates, violations of integrity constraints, and missing values). Many of these algorithms involve a human in the loop; however, the human's role is usually tightly coupled to the underlying cleaning algorithm. There is currently no end-to-end data cleaning framework that systematically involves humans in the cleaning pipeline regardless of the underlying cleaning algorithms. In this paper, we highlight key challenges that need to be addressed to realize such a framework. We present a design vision and discuss scenarios that motivate the need for a framework that judiciously assists humans in the cleaning process. Finally, we present directions for implementing such a framework.


research
03/06/2023

UniHCP: A Unified Model for Human-Centric Perceptions

Human-centric perceptions (e.g., pose estimation, human parsing, pedestr...
research
05/27/2022

The Internet of People: A human and data-centric paradigm for the Next Generation Internet

The cyber-physical convergence, the fast expansion of the Internet at it...
research
10/05/2022

Diverse End User Requirements

As part of our larger research effort to improve support for diverse end...
research
09/07/2022

Auto-TransRL: Autonomous Composition of Vision Pipelines for Robotic Perception

Creating a vision pipeline for different datasets to solve a computer vi...
research
03/24/2022

A Rationale-Centric Framework for Human-in-the-loop Machine Learning

We present a novel rationale-centric framework with human-in-the-loop – ...
research
03/30/2023

A declarative approach to data narration

This vision paper lays the preliminary foundations for Data Narrative Ma...
research
12/03/2017

Formalizing Interruptible Algorithms for Human over-the-loop Analytics

Traditional data mining algorithms are exceptional at seeing patterns in...
