New advances in enumerative biclustering algorithms with online partitioning

03/07/2020
by   Rosana Veroneze, et al.

This paper further extends RIn-Close_CVC, a biclustering algorithm that performs an efficient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. By avoiding a priori partitioning and itemization of the dataset, RIn-Close_CVC implements an online partitioning, which is shown here to lead to more informative biclustering results. The improved algorithm, called RIn-Close_CVC3, keeps the attractive properties of RIn-Close_CVC, as formally proved here, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime; the additional ability to handle datasets with missing values; and the additional ability to operate with attributes characterized by distinct distributions or even mixed data types. The experimental results cover synthetic and real-world datasets, which are used to perform scalability and sensitivity analyses. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining.
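
To make the notion of a bicluster with constant values on columns concrete, the following Python sketch checks whether a chosen set of rows and columns forms one. It is an illustration only, not the RIn-Close_CVC3 algorithm: it performs no enumeration and no online partitioning. The function name `is_cvc_bicluster`, the tolerance parameter `eps` (the largest allowed spread of values within a column) and the rule that a missing cell, encoded as `None` or NaN, cannot belong to a bicluster are all assumptions made for this example, since the abstract does not spell them out.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_cvc_bicluster(data, rows, cols, eps):
    """Check that, in every selected column, the values of the selected
    rows differ by at most eps (hypothetical tolerance parameter)."""
    for c in cols:
        values = [data[r][c] for r in rows]
        # Assumption: a missing cell disqualifies the candidate bicluster.
        if any(is_missing(v) for v in values):
            return False
        # Column-wise coherence: spread of the column must not exceed eps.
        if max(values) - min(values) > eps:
            return False
    return True


# Small made-up dataset: rows are objects, columns are numerical attributes.
data = [
    [1.0, 5.0, 9.0],
    [1.1, 5.2, 2.0],
    [0.9, 4.9, 7.0],
    [3.0, 5.1, None],
]

# Rows 0-2 agree on columns 0 and 1 within eps = 0.5.
print(is_cvc_bicluster(data, [0, 1, 2], [0, 1], eps=0.5))
# Adding row 3 breaks coherence in column 0.
print(is_cvc_bicluster(data, [0, 1, 2, 3], [0, 1], eps=0.5))
```

A check of this kind only validates one candidate. An enumerative algorithm additionally has to list every maximal such submatrix exactly once, which is the property the abstract refers to as complete, correct and non-redundant.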


