Enhanced Inversion of Schema Evolution with Provenance

11/24/2022
by Tanja Auge, et al.

Long-term data-driven studies have become indispensable in many areas of science. The formats, structures, and semantics of data often change over time as the data sets evolve. Studies spanning several decades, in particular, therefore have to cope with changing database schemas. Over time, the evolution of these databases produces a large number of schemas, which are costly and time-consuming to store and manage. However, for research data to be reproducible, each database version must be reconstructable with little effort, so that a previously published result can be validated and reproduced at any time. In many cases, though, such an evolution cannot be fully reconstructed. This article classifies the 15 most frequently used schema modification operators and defines the associated inverse for each operation. To avoid information loss, it furthermore defines which additional provenance information has to be stored. We define four classes dealing with dangling tuples, duplicates, and provenance-invariant operators; each class is presented by one representative. By using and extending the theory of schema mappings and their inverses for queries, data analysis, why-provenance, and schema evolution, we are able to combine data analysis applications with provenance under evolving database structures, enabling the reproducibility of scientific results over longer periods of time. While most of the inverses of schema mappings used for analysis or evolution are not exact but only quasi-inverses, adding provenance information enables us to reconstruct a sub-database of research data that is sufficient to guarantee reproducibility.
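To make the idea of a quasi-inverse and of stored provenance more concrete, here is a minimal sketch of the dangling-tuple problem. It is not the paper's formalism (the paper works with schema mappings and why-provenance). It is an illustration under stated assumptions: tables are lists of Python dicts, set semantics hold (no duplicate rows), the two joined tables share only the key attribute, and the operator names `join_smo` and `decompose_inverse` are hypothetical. A JOIN-style operator drops tuples without a join partner, so its DECOMPOSE-style inverse can only recover them if those dangling tuples were stored alongside the result.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]  # one tuple: attribute name -> value
Provenance = Dict[str, List[Row]]  # hypothetical side table of dangling tuples


def join_smo(left: List[Row], right: List[Row], key: str) -> Tuple[List[Row], Provenance]:
    """Hypothetical JOIN operator: inner join of two tables on `key`.

    Besides the joined table, it returns the provenance needed to invert
    the join, namely the dangling tuples of each side that found no partner.
    Assumes the two tables share only the `key` attribute.
    """
    right_by_key: Dict[object, List[Row]] = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)

    joined: List[Row] = []
    dangling_left: List[Row] = []
    matched_keys = set()
    for lrow in left:
        partners = right_by_key.get(lrow[key], [])
        if not partners:
            dangling_left.append(lrow)  # the inner join would silently lose this tuple
        for rrow in partners:
            joined.append({**lrow, **rrow})
            matched_keys.add(rrow[key])

    dangling_right = [rrow for rrow in right if rrow[key] not in matched_keys]
    return joined, {"left": dangling_left, "right": dangling_right}


def decompose_inverse(
    joined: List[Row],
    left_cols: List[str],
    right_cols: List[str],
    provenance: Optional[Provenance] = None,
) -> Tuple[List[Row], List[Row]]:
    """Hypothetical DECOMPOSE used as the inverse of join_smo.

    Without provenance it is only a quasi-inverse: dangling tuples are gone.
    With the stored dangling tuples it restores both input tables
    (under set semantics, i.e. no duplicate rows in the inputs).
    """

    def project(rows: List[Row], cols: List[str]) -> List[Row]:
        result: List[Row] = []
        for row in rows:
            t = {c: row[c] for c in cols}
            if t not in result:  # set semantics: drop duplicates created by the projection
                result.append(t)
        return result

    left = project(joined, left_cols)
    right = project(joined, right_cols)
    if provenance is not None:
        left.extend(provenance["left"])
        right.extend(provenance["right"])
    return left, right


if __name__ == "__main__":
    left = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
    right = [{"id": 1, "value": 10}, {"id": 3, "value": 30}]
    joined, prov = join_smo(left, right, "id")
    quasi = decompose_inverse(joined, ["id", "name"], ["id", "value"])
    exact = decompose_inverse(joined, ["id", "name"], ["id", "value"], prov)
    print(quasi)  # the tuples with id 2 and id 3 are missing
    print(exact == (left, right))  # True
```

The design choice here is deliberately crude: the whole dangling tuple is kept as provenance. Under bag semantics, the deduplicating projection would also lose duplicate rows, which would call for additional provenance; the abstract treats duplicates as a separate class.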



