Towards "all-inclusive" Data Preparation to ensure Data Quality

08/28/2023
by Valerie Restat, et al.

Data preparation, especially data cleaning, is essential for ensuring data quality and for improving the output of automated decision systems. Since no single tool covers all the required steps, a combination of tools, namely a data preparation pipeline, is needed. Such a process comes with a number of challenges. We outline these challenges and describe the tasks we intend to analyze in our future research to address them. We also introduce in detail a test data generator that we implemented as the basis for this future work.
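The abstract does not describe how the authors' test data generator or pipeline works. The following is a minimal, hypothetical Python sketch that assumes a common approach: inject known errors into clean data so that the clean version can serve as ground truth, then chain several cleaning steps into one pipeline. The names `generate_test_data`, `strip_whitespace`, `drop_incomplete` and `run_pipeline` and the two error types (missing values and whitespace padding) are illustrative assumptions, not the authors' tool.

```python
import random
from typing import Callable, Dict, List, Optional

# A record maps column names to string values; None marks a missing value.
Record = Dict[str, Optional[str]]
# A pipeline step is any function that takes records and returns records.
Step = Callable[[List[Record]], List[Record]]


def generate_test_data(clean: List[Record], error_rate: float, seed: int = 0) -> List[Record]:
    """Hypothetical test data generator: copy the clean records and corrupt
    each non-missing cell with probability `error_rate`, either by deleting
    the value (None) or by padding it with whitespace. The clean input is
    left untouched so it can serve as ground truth."""
    rng = random.Random(seed)  # seeded for reproducible test data
    dirty: List[Record] = []
    for record in clean:
        corrupted: Record = {}
        for column, value in record.items():
            if value is not None and rng.random() < error_rate:
                if rng.choice(["missing", "whitespace"]) == "missing":
                    corrupted[column] = None
                else:
                    corrupted[column] = f"  {value} "
            else:
                corrupted[column] = value
        dirty.append(corrupted)
    return dirty


def strip_whitespace(records: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: trim surrounding whitespace from strings."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
        for r in records
    ]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Hypothetical cleaning step: discard records with any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def run_pipeline(records: List[Record], steps: List[Step]) -> List[Record]:
    """Apply the steps in order; each step's output feeds the next one."""
    for step in steps:
        records = step(records)
    return records


if __name__ == "__main__":
    clean = [
        {"name": "Alice", "city": "Berlin"},
        {"name": "Bob", "city": "Hamburg"},
        {"name": "Carol", "city": "Munich"},
    ]
    dirty = generate_test_data(clean, error_rate=0.3, seed=42)
    cleaned = run_pipeline(dirty, [strip_whitespace, drop_incomplete])
    print(dirty)
    print(cleaned)
```

Modelling each step as a plain function mirrors the idea of combining several tools into one pipeline. Because the generator keeps the clean records, one could compare a pipeline's output against them to judge how well a given combination of steps repairs the injected errors.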
