Model based clustering of multinomial count data

07/28/2022
by   Panagiotis Papastamoulis, et al.

We consider the problem of inferring an unknown number of clusters in replicated multinomial data. From a model-based clustering point of view, this task can be addressed by estimating finite mixtures of multinomial distributions, with or without covariates. Both maximum likelihood (ML) and Bayesian estimation are considered. Under the ML approach, we provide an Expectation–Maximization (EM) algorithm which exploits a careful initialization procedure combined with a ridge-stabilized implementation of the Newton–Raphson method in the M-step. Under the Bayesian setup, a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm embedded within a prior parallel tempering scheme is devised. The number of clusters is selected according to the Integrated Completed Likelihood criterion in the ML approach, and by estimating the number of non-empty components in overfitting mixture models in the Bayesian case. Our method is illustrated on simulated data and applied to two real datasets. An R package is available at https://github.com/mqbssppe/multinomialLogitMix.
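
To make the ML route in the abstract concrete, here is a minimal Python sketch of EM for a finite mixture of multinomials in the simplest setting, without covariates. It is an illustration under stated assumptions, not the authors' implementation and not the API of the multinomialLogitMix package. Without covariates the M-step has a closed form, so the ridge-stabilized Newton–Raphson step mentioned above is not needed here. The function names, the deterministic initialization from evenly spaced rows, the small smoothing constant `eps`, and the ICL convention (complete-data log-likelihood at the MAP labels minus half the number of free parameters times log n, larger is better, multinomial coefficient omitted) are all assumptions made for this example.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def component_log_terms(x, weights, probs):
    """log(w_k) + sum_j x_j * log(theta_kj) for every component k.

    The multinomial coefficient is omitted: it does not depend on the
    parameters or on the number of components.
    """
    return [
        math.log(w) + sum(c * math.log(p) for c, p in zip(x, theta))
        for w, theta in zip(weights, probs)
    ]


def fit_multinomial_mixture(counts, n_clusters, n_iter=100, eps=1e-8):
    """EM for a multinomial mixture without covariates (illustrative only)."""
    n, n_cat = len(counts), len(counts[0])
    weights = [1.0 / n_clusters] * n_clusters
    # Assumed initialization (not the paper's procedure): Laplace-smoothed
    # proportions of evenly spaced rows of the data.
    probs = []
    for k in range(n_clusters):
        row = counts[(k * n) // n_clusters]
        total = sum(row) + n_cat
        probs.append([(c + 1.0) / total for c in row])
    for _ in range(n_iter):
        # E-step: posterior membership probabilities for each observation.
        resp = []
        for x in counts:
            terms = component_log_terms(x, weights, probs)
            norm = log_sum_exp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: closed-form updates; eps keeps every parameter positive.
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + eps) / (n + n_clusters * eps)
            mass = [sum(r[k] * x[j] for r, x in zip(resp, counts))
                    for j in range(n_cat)]
            total = sum(mass) + n_cat * eps
            probs[k] = [(m + eps) / total for m in mass]
    return weights, probs


def map_labels(counts, weights, probs):
    """Assign each observation to its most probable component."""
    labels = []
    for x in counts:
        terms = component_log_terms(x, weights, probs)
        labels.append(terms.index(max(terms)))
    return labels


def icl(counts, weights, probs):
    """ICL under the convention assumed above (larger is better)."""
    n, n_cat, k = len(counts), len(counts[0]), len(weights)
    complete_loglik = sum(max(component_log_terms(x, weights, probs))
                          for x in counts)
    n_free = (k - 1) + k * (n_cat - 1)
    return complete_loglik - 0.5 * n_free * math.log(n)


# Toy replicated counts over three categories (made-up data).
toy = [[18, 1, 1], [17, 2, 1], [19, 1, 0], [16, 2, 2], [18, 2, 0],
       [1, 1, 18], [2, 1, 17], [0, 1, 19], [2, 2, 16], [0, 2, 18]]
for k in (1, 2, 3):
    w, p = fit_multinomial_mixture(toy, k)
    print(k, round(icl(toy, w, p), 2))
```

The loop at the end fits the model for several candidate numbers of clusters and prints the criterion for each, so the preferred model is the one with the largest value under this convention. With covariates, the mixing weights or category probabilities would depend on a multinomial logit regression, and the M-step would require an iterative optimizer instead of the closed-form updates shown here.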

research
06/02/2019

Clustering Multivariate Data using Factor Analytic Bayesian Mixtures with an Unknown Number of Components

Recent work on overfitting Bayesian mixtures of distributions offers a p...
research
09/24/2014

Unsupervised learning of regression mixture models with unknown number of components

Regression mixture models are widely studied in statistics, machine lear...
research
08/04/2015

Bayesian mixtures of spatial spline regressions

This work relates the framework of model-based clustering for spatial fu...
research
04/29/2022

greed: An R Package for Model-Based Clustering by Greedy Maximization of the Integrated Classification Likelihood

The greed package implements the general and flexible framework of arXiv...
research
10/12/2018

Fast approximate inference for variable selection in Dirichlet process mixtures, with an application to pan-cancer proteomics

The Dirichlet Process (DP) mixture model has become a popular choice for...
research
06/01/2018

Model-based clustering for populations of networks

We propose a model-based clustering method for populations of networks t...
research
08/28/2023

Exploring the likelihood surface in multivariate Gaussian mixtures using Hamiltonian Monte Carlo

Multimodality of the likelihood in Gaussian mixtures is a well-known pro...
