A Bayesian Method for Joint Clustering of Vectorial Data and Network Data

by Yunchuan Kong, et al.

We present a new model-based integrative method for clustering objects given both vectorial data, which describe the features of each object, and network data, which indicate the similarity of connected objects. The proposed general model clusters the two types of data simultaneously within one integrative probabilistic model, whereas traditional methods can handle only one data type or depend on transforming one data type into the other. Bayesian inference of the clustering is conducted with a Markov chain Monte Carlo algorithm. A special case of the general model, combining the Gaussian mixture model and the stochastic block model, is studied extensively. We use both synthetic and real data to evaluate the new method and compare it with alternative methods. The results show that our simultaneous clustering method performs much better than these alternatives. This improvement is due to the power of the model-based probabilistic approach for efficiently integrating information.
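To make the idea of combining a Gaussian mixture model with a stochastic block model concrete, the following minimal sketch computes a joint log-likelihood for one shared label vector. It is an illustration under stated assumptions, not the authors' specification: the function name `joint_log_likelihood`, the spherical covariance `sigma2`, the Bernoulli edge model with block matrix `P`, and the conditional independence of features and edges given the labels are all assumptions made here for simplicity.

```python
import numpy as np

def joint_log_likelihood(X, A, z, means, sigma2, P):
    """Log-likelihood of features X and adjacency A given shared labels z.

    Assumptions (hypothetical, for illustration only):
      - X[i] ~ Normal(means[z[i]], sigma2 * I)          (spherical Gaussian mixture)
      - A[i, j] ~ Bernoulli(P[z[i], z[j]]) for i < j    (undirected SBM, no self-loops)
      - X and A are independent given z
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    z = np.asarray(z, dtype=int)
    n, d = X.shape

    # Gaussian part: sum of spherical normal log-densities.
    diff = X - means[z]
    ll_gauss = -0.5 * np.sum(diff ** 2) / sigma2 \
               - 0.5 * n * d * np.log(2.0 * np.pi * sigma2)

    # Network part: Bernoulli log-likelihood over each unordered pair once.
    iu, ju = np.triu_indices(n, k=1)
    p = P[z[iu], z[ju]]
    a = A[iu, ju]
    ll_sbm = np.sum(a * np.log(p) + (1.0 - a) * np.log1p(-p))

    return ll_gauss + ll_sbm

# Toy data: two well-separated groups that are also densely connected internally.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
P = np.array([[0.9, 0.1], [0.1, 0.9]])

ll_good = joint_log_likelihood(X, A, [0, 0, 1, 1], means, 1.0, P)
ll_bad = joint_log_likelihood(X, A, [0, 1, 0, 1], means, 1.0, P)
```

On this toy input, the labeling that agrees with both the features and the edges receives the higher joint log-likelihood. A sampler such as the Markov chain Monte Carlo algorithm mentioned above would explore label vectors using a quantity of this kind together with priors, which this sketch omits.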


A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data

High-dimensional data of discrete and skewed nature is commonly encounte...

Bayesian Clustering of Shapes of Curves

Unsupervised clustering of curves according to their shapes is an import...

Bayesian Heterogeneity Pursuit Regression Models for Spatially Dependent Data

Most existing spatial clustering literatures discussed the cluster algor...

Distributed Bayesian clustering using finite mixture of mixtures

In many modern applications, there is interest in analyzing enormous dat...

An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees

Real-world networks often come with side information that can help to im...

Model-based Clustering

Mixture models extend the toolbox of clustering methods available to the...

Model-Based Longitudinal Clustering with Varying Cluster Assignments

It is often of interest to perform clustering on longitudinal data, yet ...