An Algorithm for the Discovery of Independence from Data

01/07/2021
by Miika Hannula, et al.

For years, independence has been considered an important concept in many disciplines. Nevertheless, we present the first research that investigates the problem of discovering independence from data. In its arguably simplest form, independence is a statement between two sets of columns expressing that for every two rows in a table there is also a row in the table that coincides with the first row on the first set of columns and with the second row on the second set of columns. We show that the problem of deciding whether there is an independence statement that holds on a given table is not only NP-complete but W[3]-complete in its arguably most natural parameter, namely its arity. We establish the first algorithm to discover all independence statements that hold on a given table. Experiments with benchmark data illustrate that our algorithm performs well within the limits established by our hardness results. In practice, it is often useful to determine the ratio with which independence statements hold on a given table. For that purpose, we show that our treatment of independence and the design of our algorithm enable us to extend our findings to approximate independence. In our final experiments, we provide some insight into the trade-off between run time and the approximation ratio. Naturally, the smaller the ratio, the more approximate independence statements hold, and the more time it takes to discover all of them. While this research provides first insights into the computational properties of discovering independence from data, we hope it initiates research into more sophisticated notions of independence, including embedded multivalued dependencies, as well as their context-specific and probabilistic variants.
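The definition above can be made concrete with a small sketch. It assumes a table is a Python list of dicts (one dict per row) and uses hypothetical helper names (`project`, `holds`, `independence_ratio`, `discover`) that do not come from the paper. `holds` checks the witness condition literally: X ⊥ Y holds exactly when, for every pair of rows t1, t2, the combination (t1[X], t2[Y]) occurs together in some row, which is the same as saying the projection on XY equals the product of the projections on X and on Y. The ratio |π_XY(r)| / (|π_X(r)| · |π_Y(r)|) is only an assumed, illustrative measure of approximate independence, not necessarily the one the authors define, and `discover` is a naive exponential baseline, not the authors' algorithm.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def project(row: Row, attrs: Sequence[str]) -> Tuple:
    """Values of `row` on the columns `attrs`, as a hashable tuple."""
    return tuple(row[a] for a in attrs)


def holds(table: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """X ⊥ Y: for all rows t1, t2 some row t has t[X] = t1[X] and t[Y] = t2[Y]."""
    # Every (X-value, Y-value) combination that actually occurs in a single row.
    witnesses = {(project(t, x), project(t, y)) for t in table}
    return all(
        (project(t1, x), project(t2, y)) in witnesses
        for t1 in table
        for t2 in table
    )


def independence_ratio(table: List[Row], x: Sequence[str], y: Sequence[str]) -> float:
    """Hypothetical measure |pi_XY(r)| / (|pi_X(r)| * |pi_Y(r)|); equals 1.0 iff X ⊥ Y holds."""
    if not table:
        return 1.0
    xs = {project(t, x) for t in table}
    ys = {project(t, y) for t in table}
    xys = {(project(t, x), project(t, y)) for t in table}
    return len(xys) / (len(xs) * len(ys))


def discover(table: List[Row], attributes: Sequence[str], min_ratio: float = 1.0):
    """Naive baseline: test every unordered pair of disjoint, non-empty column sets.

    Exponential in the number of columns; this is NOT the authors' algorithm.
    """
    attrs = list(attributes)
    found = []
    for k in range(1, len(attrs) + 1):
        for x in combinations(attrs, k):
            rest = [a for a in attrs if a not in x]
            for m in range(1, len(rest) + 1):
                for y in combinations(rest, m):
                    if x > y:  # X ⊥ Y and Y ⊥ X are the same statement
                        continue
                    ratio = independence_ratio(table, x, y)
                    if ratio >= min_ratio:
                        found.append((x, y, ratio))
    return found


# Small made-up example table.
table = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 1, "B": "y", "C": "p"},
    {"A": 2, "B": "x", "C": "q"},
    {"A": 2, "B": "y", "C": "q"},
]

print(holds(table, ["A"], ["B"]))  # True
print(holds(table, ["A"], ["C"]))  # False: no row has A = 1 and C = q
print(discover(table, ["A", "B", "C"]))
# [(('A',), ('B',), 1.0), (('B',), ('C',), 1.0), (('A', 'C'), ('B',), 1.0)]
```

Lowering `min_ratio` admits more statements (with `min_ratio=0.5` all six candidate pairs are reported here), which mirrors the trade-off described in the abstract: a smaller ratio means more approximate statements hold and more of them must be found.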

research
04/18/2022

Table Enrichment System for Machine Learning

Data scientists are constantly facing the problem of how to improve pred...
research
09/08/2018

Typed Table Transformations

Spreadsheet tables are often labeled, and these labels effectively const...
research
05/09/2012

Logical Inference Algorithms and Matrix Representations for Probabilistic Conditional Independence

Logical inference algorithms for conditional independence (CI) statement...
research
01/30/2018

A Rational Distributed Process-level Account of Independence Judgment

It is inconceivable how chaotic the world would look to humans, faced wi...
research
04/29/2021

Revising Johnson's table for the 21st century

What does it mean today to study a problem from a computational point of...
research
07/11/2012

Stable Independance and Complexity of Representation

The representation of independence relations generally builds upon the w...
research
07/25/2019

Computational Phenotype Discovery via Probabilistic Independence

Computational Phenotype Discovery research has taken various pragmatic a...