Discovery of Paradigm Dependencies

10/08/2017
by Jizhou Sun, et al.

Missing and incorrect values often cause serious consequences. To deal with these data quality problems, a commonly employed class of tools is dependency rules, such as Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs), and Editing Rules (ERs). The more expressive a dependency is, the better the quality of the data that can be obtained. To the best of our knowledge, all previous dependencies treat each attribute value as an indivisible whole. In many applications, however, part of a value may contain meaningful information, which indicates that more powerful dependency rules for handling data quality problems are possible. In this paper, we consider the discovery of dependencies whose left-hand side is part of a regular-expression-like paradigm, named Paradigm Dependencies (PDs). A PD states that if a string matches the paradigm, the element at a specified position determines the value of a certain other attribute. We propose a framework in which strings with similar coding rules but different lengths are clustered together and aligned vertically, so that PDs can be discovered directly from the alignment. The alignment problem is the key component of this framework and is proved to be NP-complete. We introduce a greedy algorithm that accomplishes the clustering and aligning tasks simultaneously. Because the greedy algorithm has high time complexity, several pruning strategies are proposed to reduce the running time. In the experimental study, three real datasets and several synthetic datasets are used to verify the effectiveness and efficiency of our methods.
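
To make the notion of a PD concrete, the following is a minimal sketch of how a single candidate PD might be checked against a table. It is not the discovery framework or the greedy algorithm from the paper. The function name `check_paradigm_dependency`, the toy rows, the attribute names `code` and `category`, and the use of a Python regular expression to stand in for a paradigm are all assumptions made for illustration.

```python
import re


def check_paradigm_dependency(rows, pattern, group, lhs, rhs):
    """Check one candidate PD (hypothetical helper, not from the paper).

    The candidate says: if row[lhs] matches `pattern`, then the element
    captured by `group` determines row[rhs].
    Returns the learned element-to-value mapping and the violating rows.
    """
    regex = re.compile(pattern)
    mapping = {}
    violations = []
    for row in rows:
        match = regex.fullmatch(row[lhs])
        if match is None:
            # The PD only constrains strings that match the paradigm.
            continue
        element = match.group(group)
        value = row[rhs]
        if element in mapping and mapping[element] != value:
            # Same element, different right-hand-side value: a violation.
            violations.append(row)
        else:
            mapping.setdefault(element, value)
    return mapping, violations


# Toy data (invented): a two-letter prefix is assumed to encode the category.
rows = [
    {"code": "EL-1001", "category": "electronics"},
    {"code": "EL-20457", "category": "electronics"},
    {"code": "FD-3", "category": "food"},
    {"code": "FD-88", "category": "grocery"},
    {"code": "misc", "category": "other"},
]

mapping, violations = check_paradigm_dependency(
    rows, r"([A-Z]{2})-(\d+)", group=1, lhs="code", rhs="category"
)
print(mapping)
print(violations)
```

In this sketch the codes have different lengths but share one coding rule, and only the prefix element is used on the left-hand side. The row `FD-88` is reported as a violation, which is the kind of inconsistency such a rule could flag for repair, while `misc` is ignored because it does not match the paradigm.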


