Efficient mixture model for clustering of sparse high dimensional binary data

07/11/2017
by Marek Śmieja, et al.

In this paper we propose SparseMix, a mixture model for clustering sparse, high-dimensional binary data that connects model-based and centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on the EM algorithm, SparseMix: (1) is designed specifically for processing sparse data, (2) can be efficiently realized by an on-line Hartigan optimization algorithm, and (3) is able to automatically reduce unnecessary clusters. We perform extensive experimental studies on various types of data, which confirm that SparseMix builds partitions with higher compatibility with the reference grouping than related methods. Moreover, the constructed representatives often better reveal the internal structure of the data.
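
To make the idea of an on-line Hartigan optimization over sparse binary data more concrete, here is a minimal sketch. It is not the SparseMix cost function, which the abstract does not state. As a stand-in assumption it uses the classification log-likelihood of an independent Bernoulli model per cluster. The names `cluster_cost`, `total_cost`, and `hartigan_binary` are hypothetical, each point is stored as the set of indices of its non-zero bits, and the cost is recomputed from scratch for every candidate move, which keeps the code short but is far slower than an incremental update would be.

```python
import math
from collections import Counter


def cluster_cost(members, n_total):
    """Coding cost (in nats) of one cluster under independent Bernoulli coordinates.

    `members` is a list of points, each a set of indices of its non-zero bits.
    Coordinates that are zero for every member contribute nothing, so only
    non-zero entries are visited. This cost is an assumption for illustration.
    """
    n_k = len(members)
    if n_k == 0:
        return 0.0
    ones = Counter(i for p in members for i in p)
    entropy = 0.0
    for count in ones.values():
        q = count / n_k
        if q < 1.0:  # a coordinate set in every member has zero entropy
            entropy -= q * math.log(q) + (1 - q) * math.log(1 - q)
    # Per-point cost: coordinate entropy plus the cost of naming the cluster.
    return n_k * (entropy - math.log(n_k / n_total))


def total_cost(points, labels, k):
    n = len(points)
    return sum(
        cluster_cost([p for p, l in zip(points, labels) if l == j], n)
        for j in range(k)
    )


def hartigan_binary(points, k, labels, max_sweeps=20):
    """Hartigan-style pass: move one point at a time if that lowers the total cost."""
    labels = list(labels)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(points)):
            current = labels[i]
            best_j, best_cost = current, total_cost(points, labels, k)
            for j in range(k):
                if j == current:
                    continue
                labels[i] = j  # tentatively reassign point i
                cost = total_cost(points, labels, k)
                if cost < best_cost - 1e-9:
                    best_j, best_cost = j, cost
            labels[i] = best_j
            moved = moved or best_j != current
        if not moved:
            break
    # Representative of each surviving cluster: bits set in a majority of members.
    reps = {}
    for j in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == j]
        ones = Counter(i for p in members for i in p)
        reps[j] = {i for i, c in ones.items() if 2 * c > len(members)}
    return labels, reps


points = [{0, 1}, {0, 1}, {0, 1}, {2, 3}, {2, 3}, {2, 3}]
labels, reps = hartigan_binary(points, k=2, labels=[0, 1, 0, 1, 0, 1])
print(labels, reps)
```

In this toy run the deliberately mixed initial labelling is untangled into the two obvious groups. A cluster that loses all of its points gets no entry in `reps`, which loosely mirrors the automatic removal of unnecessary clusters described above.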

