An Algorithm for the Discovery of Independence from Data

by Miika Hannula, et al.
Helsingin yliopisto
The University of Auckland

For years, independence has been considered an important concept in many disciplines. Nevertheless, we present the first research that investigates the discovery problem of independence in data. In its arguably simplest form, independence is a statement between two sets of columns expressing that for every two rows in a table there is also a row in the table that coincides with the first row on the first set of columns and with the second row on the second set of columns. We show that the problem of deciding whether there is an independence statement that holds on a given table is not only NP-complete but W[3]-complete in its arguably most natural parameter, namely its arity. We establish the first algorithm to discover all independence statements that hold on a given table. We illustrate in experiments with benchmark data that our algorithm performs well within the limits established by our hardness results. In practice, it is often useful to determine the ratio with which independence statements hold on a given table. For that purpose, we show that our treatment of independence and the design of our algorithm enable us to extend our findings to approximate independence. In our final experiments, we provide some insight into the trade-off between run time and the approximation ratio. Naturally, the smaller the ratio, the more approximate independence statements hold, and the more time it takes to discover all of them. While this research establishes first insights into the computational properties of discovering independence from data, we hope to initiate research into more sophisticated notions of independence, including embedded multivalued dependencies, as well as their context-specific and probabilistic variants.
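The definition of independence given in the abstract can be checked directly on a small table. The following is a minimal sketch, not the authors' discovery algorithm: it only tests a single, given statement between two column sets. The function names (`project`, `holds_independence`, `independence_ratio`), the representation of rows as dictionaries, and the sample table are all assumptions made for illustration. In particular, the ratio computed here (distinct combined values divided by the number of combinations of distinct values on each side) is one plausible reading of "the ratio with which independence statements hold"; the abstract does not define it.

```python
from itertools import product


def project(rows, cols):
    """Return the set of distinct value tuples of `rows` restricted to `cols`."""
    return {tuple(row[c] for c in cols) for row in rows}


def holds_independence(rows, xs, ys):
    """Check the statement 'xs is independent of ys' on the table `rows`.

    For every two rows r1, r2 there must be a row that agrees with r1 on xs
    and with r2 on ys. Equivalently, every combination of an xs-value and a
    ys-value that occur somewhere must occur together in some row.
    """
    x_vals = project(rows, xs)
    y_vals = project(rows, ys)
    xy_vals = project(rows, list(xs) + list(ys))
    return all(x + y in xy_vals for x, y in product(x_vals, y_vals))


def independence_ratio(rows, xs, ys):
    """Assumed ratio: observed combinations over all possible combinations.

    A value of 1.0 means the statement holds exactly; smaller values mean
    more combinations are missing. This definition is an illustration only.
    """
    if not rows:
        return 1.0
    observed = len(project(rows, list(xs) + list(ys)))
    possible = len(project(rows, xs)) * len(project(rows, ys))
    return observed / possible


# Hypothetical sample table: column "c" is determined by column "a".
table = [
    {"a": 1, "b": "x", "c": 0},
    {"a": 1, "b": "y", "c": 0},
    {"a": 2, "b": "x", "c": 1},
    {"a": 2, "b": "y", "c": 1},
]

a_indep_b = holds_independence(table, ["a"], ["b"])
a_indep_c = holds_independence(table, ["a"], ["c"])
a_c_ratio = independence_ratio(table, ["a"], ["c"])
```

On this table, every value of `a` appears with every value of `b`, so the statement between `a` and `b` holds. The values of `a` and `c` only appear in two of the four possible combinations, so that statement fails exactly but would be accepted under this sketch's ratio at any threshold of 0.5 or lower, which matches the abstract's remark that a smaller ratio admits more approximate statements.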




Table Enrichment System for Machine Learning

Data scientists are constantly facing the problem of how to improve pred...

Typed Table Transformations

Spreadsheet tables are often labeled, and these labels effectively const...

Logical Inference Algorithms and Matrix Representations for Probabilistic Conditional Independence

Logical inference algorithms for conditional independence (CI) statement...

A Rational Distributed Process-level Account of Independence Judgment

It is inconceivable how chaotic the world would look to humans, faced wi...

Revising Johnson's table for the 21st century

What does it mean today to study a problem from a computational point of...

Stable Independance and Complexity of Representation

The representation of independence relations generally builds upon the w...

Computational Phenotype Discovery via Probabilistic Independence

Computational Phenotype Discovery research has taken various pragmatic a...