An epistemic approach to model uncertainty in data-graphs

09/29/2021
by Sergio Abriola, et al.

Graph databases are becoming widely successful as data models that can effectively represent and process complex relationships among various types of data. As with any other type of data repository, graph databases may suffer from errors and discrepancies with respect to the real-world data they intend to represent. In this work we explore the notion of probabilistic unclean graph databases, previously proposed for relational databases, in order to capture the idea that the observed (unclean) graph database is actually a noisy version of a clean one that correctly models the world but that we know only partially. As many factors may be involved in the observation, e.g., all different types of clerical errors or unintended transformations of the data, we assume a probabilistic model that describes the distribution over all possible ways in which the clean (uncertain) database could have been polluted. Based on this model we define two computational problems, data cleaning and probabilistic query answering, and study the complexity of each when the transformation of the database can be caused by either removing (subset) or adding (superset) nodes and edges.
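
To make the two problems concrete, here is a minimal toy sketch in Python. It is not the paper's formalism: it assumes a hypothetical subset noise model in which each edge of the clean graph is dropped independently with probability `p_drop`, a small explicit set of candidate clean graphs with prior probabilities, and edges encoded as `(source, label, target)` triples. All function names and numbers are illustrative assumptions.

```python
def subset_noise_likelihood(clean, observed, p_drop):
    """P(observed | clean) under the ASSUMED model: every clean edge is
    dropped independently with probability p_drop, and nothing is added."""
    if not observed <= clean:
        # Subset noise can only remove edges, never invent them.
        return 0.0
    kept = len(observed)
    dropped = len(clean) - kept
    return (1 - p_drop) ** kept * p_drop ** dropped


def posterior(candidates, observed, p_drop):
    """Bayes' rule over an explicit dict {candidate clean graph: prior}."""
    weights = {
        graph: prior * subset_noise_likelihood(graph, observed, p_drop)
        for graph, prior in candidates.items()
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate clean graph can explain the observation")
    return {graph: w / total for graph, w in weights.items()}


def most_likely_clean(post):
    """Toy 'data cleaning': the candidate with the highest posterior."""
    return max(post, key=post.get)


def query_probability(post, query):
    """Toy 'probabilistic query answering': posterior mass of the
    candidate clean graphs on which the Boolean query holds."""
    return sum(p for graph, p in post.items() if query(graph))


# Hypothetical example data (not from the paper).
observed = frozenset({("a", "knows", "b")})
g1 = observed
g2 = frozenset({("a", "knows", "b"), ("b", "knows", "c")})
candidates = {g1: 0.5, g2: 0.5}

post = posterior(candidates, observed, p_drop=0.2)
cleaned = most_likely_clean(post)
p_edge = query_probability(post, lambda g: ("b", "knows", "c") in g)
```

In this toy setting `cleaned` is the observed graph itself, and `p_edge` is the posterior probability that the unobserved edge `("b", "knows", "c")` was present in the clean graph. Enumerating candidates explicitly only works for tiny inputs, which is one reason the complexity of both problems is worth studying.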

research
04/05/2020

Learning Over Dirty Data Without Cleaning

Real-world datasets are dirty and contain many errors. Examples of these...
research
01/21/2018

A Formal Framework For Probabilistic Unclean Databases

Traditional modeling of inconsistency in database theory casts all possi...
research
04/03/2023

Data-graph repairs: the preferred approach

Repairing inconsistent knowledge bases is a task that has been assessed,...
research
09/01/2021

Quantum-Inspired Keyword Search on Multi-Model Databases

With the rising applications implemented in different domains, it is ine...
research
06/15/2022

On the complexity of finding set repairs for data-graphs

In the deeply interconnected world we live in, pieces of information lin...
research
06/17/2021

A probabilistic database approach to autoencoder-based data cleaning

Data quality problems are a large threat in data science. In this paper,...
research
04/11/2023

Probabilistic Reasoning at Scale: Trigger Graphs to the Rescue

The role of uncertainty in data management has become more prominent tha...