An Integration-Oriented Ontology to Govern Evolution in Big Data Ecosystems

01/16/2018
by Sergi Nadal, et al.
Big Data architectures make it possible to flexibly store and process heterogeneous data from multiple sources in their original format. The structure of those data, commonly supplied by means of REST APIs, is continuously evolving, so data analysts need to adapt their analytical processes after each API release. This becomes more challenging when performing an integrated or historical analysis. To cope with such complexity, in this paper we present the Big Data Integration ontology, the core construct to govern the data integration process under schema evolution by systematically annotating it with information regarding the schema of the sources. We present a query rewriting algorithm that, using the annotated ontology, converts queries posed over the ontology into queries over the sources. To cope with syntactic evolution in the sources, we present an algorithm that semi-automatically adapts the ontology upon new releases. This guarantees that ontology-mediated queries correctly retrieve data from the most recent schema version and that historical queries remain correct. A functional and performance evaluation on real-world APIs validates our approach.
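
To make the two ideas in the abstract concrete, here is a deliberately simplified sketch. It is not the paper's algorithm or data model: the mapping structure, the functions `rewrite` and `add_release`, and the names `userId`, `lagRatio`, `uid`, `lag`, and `lag_ratio` are all hypothetical. It assumes the ontology can be reduced to a per-release dictionary from ontology features to source attribute names, and that a new release differs from the previous one only by attribute renames.

```python
from typing import Dict, List

# Hypothetical stand-in for the annotated ontology:
# release name -> {ontology feature -> source attribute name}.
Mappings = Dict[str, Dict[str, str]]


def rewrite(features: List[str], mappings: Mappings) -> Dict[str, List[str]]:
    """Rewrite a query over ontology features into one projection per release."""
    plan: Dict[str, List[str]] = {}
    for release, attr_of in mappings.items():
        # A release can answer the query only if it covers every requested feature.
        if all(feature in attr_of for feature in features):
            plan[release] = [attr_of[feature] for feature in features]
    return plan


def add_release(mappings: Mappings, previous: str, new: str,
                renamed: Dict[str, str]) -> None:
    """Register a new release by copying the previous mapping and applying renames."""
    mappings[new] = {
        feature: renamed.get(attribute, attribute)
        for feature, attribute in mappings[previous].items()
    }


# Release v1 exposes `lag`; release v2 renames it to `lag_ratio`.
mappings: Mappings = {"v1": {"userId": "uid", "lagRatio": "lag"}}
add_release(mappings, "v1", "v2", {"lag": "lag_ratio"})

# The same ontology-level query is answered against both schema versions.
plan = rewrite(["userId", "lagRatio"], mappings)
```

In this toy setting the analyst's query stays expressed in ontology terms, while the per-release projections in `plan` absorb the rename, which is the property that lets a historical analysis span several API versions.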


