Mining Approximate Acyclic Schemes from Relations

11/29/2019
by Batya Kenig, et al.

Acyclic schemes have numerous applications in databases and in machine learning, such as improved design, more efficient storage, and increased performance for queries and machine learning algorithms. Multivalued dependencies (MVDs) are the building blocks of acyclic schemes. Discovering MVDs and acyclic schemes from data is more challenging than discovering other forms of data dependencies, such as functional dependencies, because these dependencies do not hold on subsets of the data, and because they are very sensitive to noise in the data; for example, a single wrong or missing tuple may invalidate the schema. In this paper we present Maimon, a system for discovering approximate acyclic schemes and MVDs from data. We give a principled definition of approximation using notions from information theory, and then describe the two components of Maimon: mining approximate MVDs, and reconstructing acyclic schemes from the approximate MVDs. We conduct an experimental evaluation of Maimon on 20 real-world datasets, and show that it scales to 1M rows and 30 columns.
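The abstract does not spell out its information-theoretic definition of approximation. As an illustration only, the sketch below uses one standard characterization: an MVD X ->> Y | Z holds exactly on a relation when the conditional mutual information I(Y;Z|X) of the relation's empirical distribution is zero, so a small positive value can be read as "approximately holds". Treating this quantity as the approximation measure is an assumption here, and the names `entropy` and `mvd_degree` are hypothetical, not taken from Maimon.

```python
from collections import Counter
from math import log2


def entropy(rows, cols):
    """Empirical Shannon entropy (in bits) of rows projected onto cols."""
    n = len(rows)
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def mvd_degree(rows, x, y, z):
    """Conditional mutual information I(Y;Z|X) of the empirical distribution.

    Computed as H(XY) + H(XZ) - H(XYZ) - H(X); x, y, z are lists of
    column indices. Assumption: this is used only as an illustrative
    measure of how far the MVD X ->> Y | Z is from holding.
    """
    return (entropy(rows, x + y) + entropy(rows, x + z)
            - entropy(rows, x + y + z) - entropy(rows, x))


# Columns: 0 = X, 1 = Y, 2 = Z. For X = 0, every Y value is paired
# with every Z value, so X ->> Y | Z holds exactly.
exact = [(0, "a", 1), (0, "a", 2), (0, "b", 1), (0, "b", 2)]
# Dropping a single tuple breaks the cross product, and the MVD with it.
noisy = exact[:-1]

exact_degree = mvd_degree(exact, [0], [1], [2])
noisy_degree = mvd_degree(noisy, [0], [1], [2])
```

On the complete relation the measure is zero up to floating-point error, while removing one tuple makes it strictly positive, which mirrors the abstract's point that a single missing tuple can invalidate an exact dependency.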

