Exact log-likelihood for clustering parameterised models and normally distributed data

08/10/2020
by Anthony J. Webster, et al.

The log-likelihood for clustering multivariate normal distributions is calculated for a model with equal means in each cluster. The result has terms that penalise poor fits and model complexity, and it determines both the number and the composition of clusters. The procedure is equivalent to exactly calculating the Bayesian Information Criterion (BIC), and can produce results similar to, but less subjective than, the ad-hoc "elbow criterion". An intended application is the clustering of fitted models whose maximum likelihood estimates (MLEs) are normally distributed. Fitted models are often more familiar and interpretable than directly clustered data, can build in prior knowledge, can adjust for known confounders, and can use marginalisation to emphasise parameters of interest. The overall approach is equivalent to a multi-layer clustering algorithm that characterises features through the normally distributed MLE parameters of a fitted model, and then clusters the normal distributions. Alternatively, the results can be applied directly to the means and covariances of (possibly labelled) data.
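As a rough illustration of the idea, the sketch below scores a candidate partition of fitted estimates using the standard textbook BIC, k ln(n) - 2 ln L, under the equal-means-per-cluster model. It is not the exact expression derived in the paper. The function names (`pooled_mean`, `log_normal_pdf`, `partition_bic`), the choice of n as the number of clustered estimates, and the parameter count (one mean vector per cluster) are all assumptions made for this example.

```python
import numpy as np

def pooled_mean(means, covs):
    """Precision-weighted mean: the MLE of a mean shared by several
    normal estimates with known covariances."""
    precisions = [np.linalg.inv(c) for c in covs]
    total_precision = sum(precisions)
    weighted_sum = sum(p @ m for p, m in zip(precisions, means))
    return np.linalg.solve(total_precision, weighted_sum)

def log_normal_pdf(x, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + d @ np.linalg.solve(cov, d))

def partition_bic(means, covs, labels):
    """Textbook BIC for a partition (hypothetical helper, not the paper's
    exact result). Lower values indicate a preferred partition."""
    means = np.asarray(means, dtype=float)   # shape (n, p)
    covs = np.asarray(covs, dtype=float)     # shape (n, p, p)
    labels = np.asarray(labels)
    n, p = means.shape
    clusters = np.unique(labels)
    loglik = 0.0
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        shared = pooled_mean(means[idx], covs[idx])
        loglik += sum(log_normal_pdf(means[i], shared, covs[i]) for i in idx)
    n_params = len(clusters) * p             # assumed: one mean per cluster
    return n_params * np.log(n) - 2.0 * loglik

# Four one-dimensional estimates forming two well-separated groups.
means = [[0.0], [0.1], [5.0], [5.1]]
covs = [[[0.01]]] * 4
for labels in ([0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 2, 3]):
    print(labels, partition_bic(means, covs, labels))
```

In this toy case the single-cluster partition is penalised for its poor fit and the four-cluster partition for its complexity, so the two-cluster partition receives the lowest score. Searching over partitions is left out of the sketch; only the scoring of a given partition is shown.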

