Detecting Quality Problems in Research Data: A Model-Driven Approach

07/22/2020
by Arno Kesper, et al.

Because scientific progress depends heavily on the quality of research data, the scientific community imposes strict requirements on data quality. A major challenge in data quality assurance is localising quality problems that are inherent to the data. Owing to rapid digitalisation in some scientific fields, especially the humanities, different database technologies and data formats may be tried out in quick succession to gain experience. We present a model-driven approach to analysing the quality of research data that abstracts from the underlying database technology. Based on the observation that many quality problems show up as anti-patterns, a data engineer formulates analysis patterns that are generic with respect to database format and technology. A domain expert chooses a pattern that has been adapted to a specific database technology and concretises it for a domain-specific database format. Data analysts then use the resulting concrete patterns to locate quality problems in their databases. As a proof of concept, we implemented tool support that realises this approach for XML databases. We evaluated the expressiveness and performance of our approach in the domain of cultural heritage, based on a qualitative study of quality problems occurring in cultural heritage data.
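The three-stage workflow described above (generic pattern, adaptation to a technology, concretisation for a format) can be illustrated with a minimal Python sketch. This is not the authors' tool or pattern language. The class names, the "missing mandatory field" anti-pattern, the `concretise` and `find_problems` helpers, and the sample `<collection>` format are all hypothetical, chosen only to show how a technology-independent pattern might be bound to a concrete XML format and then applied.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericPattern:
    """Technology-independent anti-pattern (hypothetical example):
    a record whose mandatory field is missing or empty."""
    name: str


@dataclass(frozen=True)
class ConcreteXmlPattern:
    """The generic pattern bound to one XML format via ElementTree paths."""
    name: str
    record_path: str  # path from the document root selecting the records
    field_path: str   # path, relative to a record, selecting the field


def concretise(generic: GenericPattern, record_path: str,
               field_path: str) -> ConcreteXmlPattern:
    # Domain-expert step: supply the format-specific element paths.
    return ConcreteXmlPattern(generic.name, record_path, field_path)


def find_problems(pattern: ConcreteXmlPattern, xml_text: str) -> list:
    # Data-analyst step: return the ids of records matching the anti-pattern.
    root = ET.fromstring(xml_text)
    hits = []
    for record in root.findall(pattern.record_path):
        field = record.find(pattern.field_path)
        if field is None or not (field.text or "").strip():
            hits.append(record.get("id"))
    return hits


# Invented sample data; not taken from the paper's evaluation.
SAMPLE = """<collection>
  <object id="o1"><title>Vase</title><date>1850</date></object>
  <object id="o2"><title>Bowl</title><date></date></object>
  <object id="o3"><title>Plate</title></object>
</collection>"""

missing_field = GenericPattern("missing mandatory field")
missing_date = concretise(missing_field, "./object", "date")
problems = find_problems(missing_date, SAMPLE)
print(problems)
```

In this sketch, only `concretise` knows about element names, so the same generic pattern could in principle be bound to a different format by passing different paths.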


