Discovering Reliable Approximate Functional Dependencies

05/25/2017
by Panagiotis Mandros, et al.

Given a database and a target attribute of interest, how can we tell whether there exists a functional, or approximately functional, dependence of the target on any set of other attributes in the data? How can we reliably, without bias to sample size or dimensionality, measure the strength of such a dependence? And how can we efficiently discover the optimal or α-approximate top-k dependencies? These are exactly the questions we answer in this paper. As we want to be agnostic about the form of the dependence, we adopt an information-theoretic approach and construct a reliable, bias-correcting score that can be computed efficiently. Moreover, we give an effective optimistic estimator of this score, by which, for the first time, we can mine approximate functional dependencies from data with guarantees of optimality. Empirical evaluation shows that the derived score achieves a good bias-variance trade-off, can be used within an efficient discovery algorithm, and indeed discovers meaningful dependencies. Most importantly, it remains reliable in the face of data sparsity.
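
The abstract gives no formula for the score, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the dependence is measured by the fraction of information, (H(Y) - H(Y|X)) / H(Y), and that the bias is corrected by subtracting the average score obtained after randomly permuting the target. The function names, the Monte Carlo approximation, and the parameter n_permutations are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def fraction_of_information(xs, ys):
    """Plug-in estimate of (H(Y) - H(Y|X)) / H(Y).

    A value of 1.0 means Y is a function of X in the sample.
    """
    h_y = entropy(ys)
    if h_y == 0.0:
        return 0.0  # constant target: there is nothing to explain
    # H(Y|X) = H(X, Y) - H(X)
    h_y_given_x = entropy(list(zip(xs, ys))) - entropy(xs)
    return (h_y - h_y_given_x) / h_y


def corrected_score(xs, ys, n_permutations=200, seed=0):
    """Plug-in score minus its average under random permutations of Y.

    Assumption: a Monte Carlo stand-in for a bias correction; the
    permutation count and seed are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    shuffled = list(ys)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real dependence of Y on X
        total += fraction_of_information(xs, shuffled)
    return fraction_of_information(xs, ys) - total / n_permutations


if __name__ == "__main__":
    rng = random.Random(1)
    target = [rng.randint(0, 1) for _ in range(100)]
    row_ids = list(range(100))  # a unique key: sparse, trivially "functional"
    print(fraction_of_information(row_ids, target))
    print(corrected_score(row_ids, target))
```

The usage example illustrates the sparsity problem the abstract refers to: a unique row identifier determines any target in the sample, so an uncorrected score rates it as a perfect dependence, while the permutation baseline is just as high and cancels it out.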

research
08/30/2019

Discovering Reliable Correlations in Categorical Data

In many scientific tasks we are interested in discovering whether there ...
research
05/16/2020

Extending Databases to Support Data Manipulation with Functional Dependencies: a Vision Paper

In the current paper, we propose to fuse together stored data (tables) a...
research
05/04/2019

Learning Functional Dependencies with Sparse Regression

We study the problem of discovering functional dependencies (FD) from a ...
research
01/26/2017

Identifying Consistent Statements about Numerical Data with Dispersion-Corrected Subgroup Discovery

Existing algorithms for subgroup discovery with numerical targets do not...
research
01/06/2021

Efficient Discovery of Approximate Order Dependencies

Order dependencies (ODs) capture relationships between ordered domains o...
research
09/14/2018

Discovering Reliable Dependencies from Data: Hardness and Improved Algorithms

The reliable fraction of information is an attractive score for quantify...
research
10/28/2019

Minimum Detectable Effect Size Computations for Cluster-Level Regression Discontinuity: Quadratic Functional Form and Beyond

Although Cattaneo, Titiunik, and Vazquez-Bare (2019) provides an ex-post...
