Delta-Closure Structure for Studying Data Distribution
In this paper, we revisit pattern mining and study the distribution underlying a binary dataset using the closure structure, which is based on passkeys, i.e., minimum generators in equivalence classes robust to noise. We introduce Δ-closedness, a generalization of the closure operator, where Δ measures how much a closed set differs from its upper neighbors in the partial order induced by closure. A Δ-class of equivalence has minimum and maximum elements and allows us to characterize the distribution underlying the data. Moreover, the set of Δ-classes of equivalence can be partitioned into the so-called Δ-closure structure. In particular, a Δ-class of equivalence at a high level reveals correlations among many attributes, and these correlations are supported by more observations when Δ is large. In the experiments, we study the Δ-closure structure of several real-world datasets and show that this structure is very stable for large Δ and does not substantially depend on the data sampling used for the analysis.
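To make the notions of closure and Δ concrete, here is a minimal Python sketch on a toy binary dataset. It is an illustration under stated assumptions, not the paper's algorithm: it assumes that the Δ of an attribute set can be read as the smallest drop in support when any single further attribute is added, and the helper names (`extent`, `closure`, `delta`) and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Set

# Toy binary dataset (hypothetical): each row is the set of attributes it has.
DATA: List[FrozenSet[str]] = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ad"),
    frozenset("c"),
]


def extent(data: List[FrozenSet[str]], itemset: FrozenSet[str]) -> Set[int]:
    """Indices of the rows that contain every attribute of `itemset`."""
    return {i for i, row in enumerate(data) if itemset <= row}


def closure(data: List[FrozenSet[str]], itemset: FrozenSet[str]) -> FrozenSet[str]:
    """Largest attribute set shared by all rows that contain `itemset`."""
    rows = extent(data, itemset)
    if not rows:
        # No supporting row: by convention the closure is the full attribute set.
        return frozenset().union(*data)
    common = set(data[next(iter(rows))])
    for i in rows:
        common &= data[i]
    return frozenset(common)


def delta(data: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Assumed reading of Δ: the smallest loss of support caused by
    adding one attribute that is not yet in `itemset`."""
    all_items = frozenset().union(*data)
    base = len(extent(data, itemset))
    drops = [
        base - len(extent(data, itemset | {a}))
        for a in all_items - itemset
    ]
    return min(drops) if drops else base


if __name__ == "__main__":
    for items in ("b", "ab", "abc"):
        s = frozenset(items)
        print(sorted(s), "closure:", sorted(closure(DATA, s)), "delta:", delta(DATA, s))
```

Under this reading, a set with Δ = 0 is not closed (some attribute can be added without losing any observation), Δ ≥ 1 coincides with ordinary closedness, and larger values of Δ single out closed sets that are separated from their upper neighbors by more observations.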