QTrail-DB: A Query Processing Engine for Imperfect Databases with Evolving Qualities

03/12/2023
by Maha Asiri, et al.

Imperfect databases are very common in many applications for various reasons, ranging from data-entry errors, transmission or integration errors, and wrong instrument readings to faulty experimental setups that lead to incorrect results. Managing and querying imperfect databases is a very challenging problem, as it requires incorporating the data's qualities within the database engine. Even more challenging, the qualities are typically not static and may evolve over time. Unfortunately, most state-of-the-art techniques treat data quality as an offline task carried out outside the DBMS, in total isolation from the query processing engine. Hence, end-users receive query results with no indication of whether or not the results can be trusted for further analysis and decision making. In this paper, we propose the QTrail-DB system, which fundamentally extends standard DBMSs to support imperfect databases with evolving qualities. QTrail-DB introduces a new quality model based on the new concept of "Quality Trails", which captures the evolution of the data's qualities over time. QTrail-DB extends the relational data model to incorporate the quality trails within the database system. We propose a new query algebra, called "QTrail Algebra", that enables seamless and transparent propagation and derivation of the data's qualities within a query pipeline. As a result, a query's answer is automatically annotated with quality-related information at the tuple level. QTrail-DB's propagation model leverages the thoroughly studied propagation semantics from the DB provenance and lineage-tracking literature, so there is no need to develop a new query optimizer. QTrail-DB is developed within PostgreSQL and experimentally evaluated on real-world datasets to demonstrate its efficiency and practicality.
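The abstract does not spell out the data model or the algebra's rules, so the following is only a minimal Python sketch of the general idea under stated assumptions, not QTrail-DB's implementation. It assumes that a quality trail is a time-ordered list of (timestamp, quality) pairs with scores in [0, 1], that the names `QTuple`, `quality_at`, `combine_trails`, `select`, `project` and `join` are hypothetical, and that a join derives its output trail by taking the minimum of the two input qualities at each point in time (a pessimistic rule chosen for illustration, not taken from QTrail Algebra). It shows how tuple-level quality annotations can flow through a select-project-join pipeline while the operators keep their usual relational behavior.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a quality trail is a time-ordered list of
# (timestamp, quality) pairs; quality is assumed to be a score in [0, 1].
Trail = List[Tuple[int, float]]


def quality_at(trail: Trail, t: int) -> Optional[float]:
    """Return the quality in effect at time t (latest entry with timestamp <= t)."""
    times = [ts for ts, _ in trail]
    i = bisect_right(times, t)
    return trail[i - 1][1] if i > 0 else None


def combine_trails(a: Trail, b: Trail) -> Trail:
    """Merge two trails. At every change point of either input, the combined
    quality is the minimum of both inputs (an assumed, pessimistic rule)."""
    merged: Trail = []
    for t in sorted({ts for ts, _ in a} | {ts for ts, _ in b}):
        qa, qb = quality_at(a, t), quality_at(b, t)
        if qa is None or qb is None:
            continue  # undefined until both inputs have a quality value
        merged.append((t, min(qa, qb)))
    return merged


@dataclass
class QTuple:
    """A relational tuple annotated with its (hypothetical) quality trail."""
    values: Dict[str, object]
    trail: Trail = field(default_factory=list)


def select(rel: List[QTuple], pred: Callable[[Dict[str, object]], bool]) -> List[QTuple]:
    # Selection filters tuples; each surviving tuple keeps its trail unchanged.
    return [r for r in rel if pred(r.values)]


def project(rel: List[QTuple], cols: List[str]) -> List[QTuple]:
    # Bag-semantics projection: no duplicate elimination, trails pass through.
    return [QTuple({c: r.values[c] for c in cols}, r.trail) for r in rel]


def join(left: List[QTuple], right: List[QTuple], on: str) -> List[QTuple]:
    # Nested-loop equi-join; each output tuple derives a new combined trail.
    out = []
    for lt in left:
        for rt in right:
            if lt.values[on] == rt.values[on]:
                out.append(QTuple({**lt.values, **rt.values},
                                  combine_trails(lt.trail, rt.trail)))
    return out
```

For example, joining a sensor reading whose trail is `[(0, 0.9), (10, 0.4)]` with a location record whose trail is `[(5, 0.8)]` and projecting onto `room` and `temp` yields one tuple with trail `[(5, 0.8), (10, 0.4)]`. Selection and projection leave trails untouched because they never combine tuples; only the join has to derive a new trail. Keeping bag semantics in `project` sidesteps the separate question of how the trails of duplicate tuples should be merged.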


Related research

Optimizing Federated Queries Based on the Physical Design of a Data Lake (02/19/2020)
The optimization of query execution plans is known to be crucial for red...

T-DB: Toward Fully Functional Transparent Encrypted Databases in DBaaS Framework (08/28/2017)
Individuals and organizations tend to migrate their data to clouds, espe...

SIG-DB: leveraging homomorphic encryption to Securely Interrogate privately held Genomic DataBases (03/26/2018)
Genomic data are becoming increasingly valuable as we develop methods to...

Automatic Generation of a Hybrid Query Execution Engine (08/16/2018)
The ever-increasing need for fast data processing demands new methods fo...

Managing Variability in Relational Databases by VDBMS (11/25/2019)
Variability inherently exists in databases in various contexts which cre...

From DB-nets to Coloured Petri Nets with Priorities (Extended Version) (03/29/2019)
The recently introduced formalism of DB-nets has brought in a new concep...

Understanding Inconsistency in Azure Cosmos DB with TLA+ (10/24/2022)
Beyond implementation correctness of a distributed system, it is equally...