A Polystore Architecture Using Knowledge Graphs to Support Queries on Heterogeneous Data Stores

Modern applications commonly need to manage datasets composed of heterogeneous data and schemas, which makes integrated access difficult. A single data store that manages heterogeneous data under a common data model is not effective in such a scenario, so the domain data ends up fragmented across the data stores that best fit its storage and access requirements (e.g., NoSQL, relational DBMS, or HDFS). Moreover, organizational workflows consume these fragments independently, and usually there is no explicit link among the fragments that could support an integrated view. The research challenge tackled by this work is to provide the means to query heterogeneous data residing in distinct data repositories that are not explicitly connected. We propose a federated database architecture that presents users with a single abstract global conceptual schema against which they write their queries, encapsulating data heterogeneity, location, and linkage by employing: (i) meta-models to represent the global conceptual schema, the local conceptual schemas of the remote data, and the mappings among them; and (ii) provenance to create explicit links among the consumed and generated data residing in separate datasets. We evaluated the architecture through its implementation as a polystore service, following a microservice architecture approach, in a scenario that simulates a real case in the Oil & Gas industry. We also compared the proposed architecture to a relational multidatabase system based on foreign data wrappers, measuring the user's cognitive load to write a query (or query complexity) and the query processing time. The results demonstrated that queries written for the proposed architecture are half as complex as those written for the relational multidatabase system, adding an excess of no more than 30
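The two mechanisms named in the abstract, schema mappings and provenance links, can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory stores, the `Well` and `Report` entities, and the names `Mapping`, `fetch`, and `linked` are all hypothetical, chosen only to show how a query phrased against a global schema might be resolved through mappings and then joined through provenance rather than through shared keys.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for two separate data stores.
RELATIONAL_WELLS = [
    {"well_id": 1, "well_name": "W-01"},
    {"well_id": 2, "well_name": "W-02"},
]
DOCUMENT_REPORTS = [
    {"_id": "r1", "summary": "pressure test"},
    {"_id": "r2", "summary": "core analysis"},
]
STORES = {"relational": RELATIONAL_WELLS, "document": DOCUMENT_REPORTS}


@dataclass
class Mapping:
    """Maps one global-schema entity to a local store and its field names."""
    store: str
    fields: dict  # global attribute name -> local field name


# Assumed global conceptual schema: entity name -> mapping to a local schema.
MAPPINGS = {
    "Well": Mapping("relational", {"id": "well_id", "name": "well_name"}),
    "Report": Mapping("document", {"id": "_id", "summary": "summary"}),
}

# Assumed provenance records: (consumed entity, id) -> (generated entity, id).
PROVENANCE = [
    (("Well", 1), ("Report", "r1")),
    (("Well", 2), ("Report", "r2")),
]


def fetch(entity):
    """Return the rows of a global entity, renamed to global attribute names."""
    mapping = MAPPINGS[entity]
    return [
        {global_name: row[local_name]
         for global_name, local_name in mapping.fields.items()}
        for row in STORES[mapping.store]
    ]


def linked(source_entity, target_entity):
    """Pair rows of two entities that are connected by a provenance record."""
    sources = {row["id"]: row for row in fetch(source_entity)}
    targets = {row["id"]: row for row in fetch(target_entity)}
    pairs = []
    for (s_entity, s_id), (t_entity, t_id) in PROVENANCE:
        if (s_entity, t_entity) != (source_entity, target_entity):
            continue
        if s_id in sources and t_id in targets:
            pairs.append((sources[s_id], targets[t_id]))
    return pairs


if __name__ == "__main__":
    for well, report in linked("Well", "Report"):
        print(well["name"], "->", report["summary"])
```

In this sketch the caller only names global entities and attributes; which store holds the data and which local field names it uses stay inside `MAPPINGS`, and the cross-store join relies solely on the provenance records.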


