A Family of Mixture Models for Biclustering

09/10/2020
by Wangshu Tu, et al.

Biclustering clusters observations and variables simultaneously when no group structure is known a priori, and it is increasingly used in bioinformatics, text analytics, and other fields. Biclustering was previously introduced in a model-based clustering framework using a structure similar to a mixture of factor analyzers. In such models, the observed variables 𝐗 are modelled through a latent variable 𝐔 that is assumed to follow N(0, 𝐈). Variables are clustered by constraining the entries of the factor loading matrix to be 0 or 1, which results in block-diagonal covariance matrices. However, this approach is overly restrictive because the off-diagonal elements within the blocks of the covariance matrices can only be 1, which can lead to an unsatisfactory model fit on complex data. Here, the latent variable 𝐔 is instead assumed to follow N(0, 𝐓), where 𝐓 is a diagonal matrix. This ensures that the off-diagonal terms in the blocks of the covariance matrices are non-zero but no longer restricted to be 1, which leads to a superior model fit on complex data. A family of models is developed by imposing constraints on the components of the covariance matrix. For parameter estimation, an alternating expectation conditional maximization (AECM) algorithm is used. Finally, the proposed method is illustrated using simulated and real datasets.
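To make the covariance restriction concrete, here is a minimal sketch of a single mixture component's covariance. It assumes the factor-analyzer form Σ = Λ𝐓Λᵀ + Ψ, with a binary loading matrix Λ in which each variable loads on exactly one variable cluster, and a diagonal error covariance Ψ. These choices, and the function names below, are illustrative assumptions, not the authors' code. Setting 𝐓 = 𝐈 reproduces the restriction described above (every within-block off-diagonal entry equals 1), while a general diagonal 𝐓 gives each block its own off-diagonal value t_k.

```python
# Minimal sketch (assumption): one component's covariance in a
# factor-analyzer-style biclustering model, Sigma = L T L^T + Psi, where
#   L   is a binary p x q loading matrix (row j has a single 1 in the
#       column of variable j's cluster),
#   T   is a q x q diagonal latent covariance (T = I in the earlier model),
#   Psi is a p x p diagonal error covariance.
# Function names are hypothetical, chosen for illustration only.

def loading_matrix(labels, q):
    """Binary p x q matrix: L[j][k] = 1 iff variable j is in variable-cluster k."""
    return [[1.0 if labels[j] == k else 0.0 for k in range(q)]
            for j in range(len(labels))]


def component_covariance(labels, t_diag, psi_diag):
    """Return Sigma = L diag(t) L^T + diag(psi) as a nested list."""
    q = len(t_diag)
    p = len(labels)
    L = loading_matrix(labels, q)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            # (L T L^T)_{ij} = sum_k L[i][k] * t_k * L[j][k]; this is t_k when
            # variables i and j share cluster k, and 0 otherwise.
            sigma[i][j] = sum(L[i][k] * t_diag[k] * L[j][k] for k in range(q))
        # Add the diagonal error variance.
        sigma[i][i] += psi_diag[i]
    return sigma


# Five variables split into two variable clusters: {0, 1} and {2, 3, 4}.
labels = [0, 0, 1, 1, 1]
psi = [0.5] * 5
sigma_identity = component_covariance(labels, [1.0, 1.0], psi)  # T = I
sigma_diag = component_covariance(labels, [2.5, 0.3], psi)      # T diagonal
print(sigma_identity[0][1], sigma_diag[0][1], sigma_diag[2][4])  # 1.0 2.5 0.3
```

With 𝐓 = 𝐈 every within-block covariance is forced to 1, whereas the diagonal 𝐓 lets the first block carry 2.5 and the second 0.3. Entries across blocks stay 0 in both cases, so Σ remains block diagonal up to a reordering of the variables. The constrained family described in the abstract would additionally tie or free 𝐓 and Ψ across components, which this sketch does not attempt.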

research
02/08/2023

Estimation of Gaussian Bi-Clusters with General Block-Diagonal Covariance Matrix and Applications

Bi-clustering is a technique that allows for the simultaneous clustering...
research
11/21/2017

Model-based Clustering with Sparse Covariance Matrices

Finite Gaussian mixture models are widely used for model-based clusterin...
research
01/10/2013

Cross-covariance modelling via DAGs with hidden variables

DAG models with hidden variables present many difficulties that are not ...
research
02/21/2008

Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables

Clustering analysis is one of the most widely used statistical tools in ...
research
01/06/2021

Logistic Normal Multinomial Factor Analyzers for Clustering Microbiome Data

The human microbiome plays an important role in human health and disease...
research
01/14/2020

Sparse Covariance Estimation in Logit Mixture Models

This paper introduces a new data-driven methodology for estimating spars...
research
01/22/2018

A tractable Multi-Partitions Clustering

In the framework of model-based clustering, a model allowing several lat...