Inference of Common Multidimensional Equally-Distributed Attributes

04/20/2021
by Alejandro Alvarez-Ayllon, et al.

Given two relations containing multiple measurements, possibly with uncertainties, our objective is to find which sets of attributes from the first have a corresponding set in the second, using only a sample of the data. This approach could be used even when the associated metadata is damaged, missing or incomplete, or when the data volume is too large for exact methods. The problem is similar to the search for Inclusion Dependencies (INDs), a type of rule over two relations asserting that, for a set of attributes X from the first, every combination of values also appears in a set of attributes Y from the second. Existing INDs can be found by exploiting a partial order relation called specialization. However, this relation is based on set theory and requires the values to be directly comparable. Statistical tests are an intuitive replacement, but how they would affect the underlying assumptions has not been studied. In this paper we formally review the effect that a statistical approach has on the inference rules applied to IND discovery. Our results confirm the intuition that statistical tests can be used, but not in a directly equivalent manner. We provide a workable alternative based on a "hierarchy of null hypotheses", allowing for the automatic discovery of multidimensional, equally distributed sets of attributes.
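To make the contrast concrete, here is a minimal sketch, not the authors' method. It compares the exact, set-based IND check with a statistical stand-in that asks whether two numeric attributes are equally distributed. All function names are hypothetical. The statistical side swaps in a univariate two-sample Kolmogorov-Smirnov test, using the asymptotic Kolmogorov approximation with a common small-sample correction. It covers only the unary case and does not implement the paper's multidimensional "hierarchy of null hypotheses".

```python
import math


def is_ind(r, s, x, y):
    """Exact IND R[X] ⊆ S[Y]: every projected tuple of r on columns x appears in s on columns y."""
    left = {tuple(row[k] for k in x) for row in r}
    right = {tuple(row[k] for k in y) for row in s}
    return left <= right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past every value <= x in both samples, so ties are handled jointly.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value from the Kolmogorov distribution (approximation, not exact)."""
    ne = n * m / (n + m)
    sq = math.sqrt(ne)
    lam = (sq + 0.12 + 0.11 / sq) * d
    if lam < 0.2:
        # The series converges poorly here, and the p-value is effectively 1.
        return 1.0
    total = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return min(max(total, 0.0), 1.0)


def equally_distributed(a, b, alpha=0.05):
    """Hypothetical statistical analogue of a unary IND: fail to reject H0 'same distribution'."""
    return ks_pvalue(ks_statistic(a, b), len(a), len(b)) >= alpha


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (3, "a")]
    s = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]
    print(is_ind(r, s, [0, 1], [0, 1]))  # True
    print(is_ind(s, r, [0, 1], [0, 1]))  # False: (4, "c") is missing from r
    print(equally_distributed(list(range(100)), [v + 0.5 for v in range(100)]))  # True
    print(equally_distributed(list(range(100)), [v + 50 for v in range(100)]))   # False
```

The exact check needs identical, directly comparable values. Its result is also preserved under projection: if `is_ind(r, s, [0, 1], [0, 1])` holds, so does `is_ind(r, s, [0], [0])`. This is the specialization property the abstract refers to. The statistical check instead tolerates samples whose values never coincide exactly. Its answer is a test outcome at level `alpha` rather than a set-theoretic fact, which is why inference rules built on specialization do not carry over directly.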
