Data Validation Infrastructure for R

12/20/2019
by Mark P. J. van der Loo, et al.

Checking data quality against domain knowledge is a common activity that pervades statistical analysis, from raw data to output. The R package 'validate' facilitates this task by capturing and applying expert knowledge in the form of validation rules: logical restrictions on variables, records, or data sets that the data should satisfy before they are considered valid input for further analysis. In the validate package, validation rules are objects of computation that can be manipulated, investigated, and confronted with data or versions of a data set. The results of a confrontation are then available for further investigation, summarization, or visualization. Validation rules can also be endowed with metadata and documentation, and they may be stored in or retrieved from external sources such as text files or tabular formats. This data validation infrastructure thus allows for the systematic, user-defined specification of data quality requirements that can be reused for various versions of a data set or by data correction algorithms that are parameterized by validation rules.
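
The workflow described above (define rules, confront data with them, summarize the result) can be sketched with the validate package's `validator()` and `confront()` functions. This is a minimal illustration, assuming the package is installed from CRAN and using R's built-in `women` data set (15 records with `height` and `weight`); the rule names and thresholds are hypothetical choices made for the example, not taken from the paper.

```r
# Assumes the 'validate' package is installed: install.packages("validate")
library(validate)

# Validation rules are named R expressions collected in a rule-set object.
# The first three are record-level rules (evaluated per row);
# mean_weight is a data-set-level rule (one result for the whole data set).
# Names and thresholds are hypothetical, chosen only for illustration.
rules <- validator(
  positive_height = height > 0,
  positive_weight = weight > 0,
  ratio_check     = weight / height > 1.5,
  mean_weight     = mean(weight) > 140
)

# Rules are ordinary objects that can be inspected before use.
length(rules)      # number of rules in the set
variables(rules)   # variables the rules refer to

# Confront the data with the rules; the result is itself an object.
cf <- confront(women, rules)

# One row per rule: items checked, passes, fails, missing values, errors.
summary(cf)
```

Because the rule set is separate from the data, the same `rules` object can be confronted with a later version of the data set unchanged. Rule sets can also be kept outside the script, for example in a text file read with `validator(.file = ...)`.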
