Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

10/21/2015
by Tomoki Tokuda, et al.

We propose a novel method for multiple clustering that assumes a co-clustering structure (partitions of both the rows and the columns of the data matrix) in each view. The method is applicable to high-dimensional data. It is based on a nonparametric Bayesian approach in which the number of views and the numbers of feature and subject clusters are inferred in a data-driven manner. We simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions, in each cluster block. This makes the method applicable to datasets consisting of both numerical and categorical variables, as biomedical data typically do. Clustering solutions are obtained by variational inference with a mean-field approximation. We apply the proposed method to synthetic and real data and show that it outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply the method to a depression dataset for which no ground-truth cluster structure is available, and draw useful inferences about possible clustering structures of the data.
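
The abstract states that each cluster block may follow its own distribution family (Gaussian, Poisson, or multinomial). A minimal sketch of what this means for the likelihood follows. It assumes fixed, known block parameters and that entries are conditionally independent given the partition. The function names, the dictionary layout, and the use of a single-trial (categorical) multinomial for categorical features are illustrative assumptions, not the authors' implementation. The paper's variational inference over partitions and parameters is not shown.

```python
import math
from typing import Any, Dict, List, Sequence

# Hypothetical helpers: each returns the log-likelihood of one cluster block
# of observations under a single distribution family with fixed parameters.


def gaussian_loglik(xs: Sequence[float], mean: float, var: float) -> float:
    """Sum of Normal(mean, var) log-densities over the block."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        for x in xs
    )


def poisson_loglik(xs: Sequence[int], rate: float) -> float:
    """Sum of Poisson(rate) log-probabilities; lgamma(x + 1) equals log(x!)."""
    return sum(x * math.log(rate) - rate - math.lgamma(x + 1) for x in xs)


def categorical_loglik(xs: Sequence[int], probs: Sequence[float]) -> float:
    """Sum of log-probabilities of category indices under `probs`."""
    return sum(math.log(probs[x]) for x in xs)


def block_loglik(block: Dict[str, Any]) -> float:
    """Dispatch on the block's (hypothetical) 'family' key."""
    family = block["family"]
    xs = block["data"]
    params = block["params"]
    if family == "gaussian":
        return gaussian_loglik(xs, params["mean"], params["var"])
    if family == "poisson":
        return poisson_loglik(xs, params["rate"])
    if family == "categorical":
        return categorical_loglik(xs, params["probs"])
    raise ValueError(f"unknown family: {family}")


# Three illustrative cluster blocks, each holding features of a different type.
blocks: List[Dict[str, Any]] = [
    {"family": "gaussian", "data": [0.1, -0.3, 0.5], "params": {"mean": 0.0, "var": 1.0}},
    {"family": "poisson", "data": [0, 2, 1], "params": {"rate": 1.5}},
    {"family": "categorical", "data": [0, 1, 1], "params": {"probs": [0.2, 0.8]}},
]

# Under the conditional-independence assumption of this sketch, the total
# log-likelihood is the sum of the per-block terms.
total = sum(block_loglik(b) for b in blocks)
print(round(total, 4))
```

Because each family contributes an additive term, supporting another variable type only requires one more per-block log-likelihood function. In a mean-field variational treatment, these terms would typically be replaced by their expectations under the approximate posterior.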
