Exact log-likelihood for clustering parameterised models and normally distributed data

by Anthony J. Webster, et al.

Taking a model with equal means in each cluster, the log-likelihood for clustering multivariate normal distributions is calculated. The result has terms that penalise poor fits and model complexity, and it determines both the number and the composition of clusters. The procedure is equivalent to exactly calculating the Bayesian Information Criterion (BIC), and can produce results that are similar to, but less subjective than, those of the ad-hoc "elbow criterion". An intended application is the clustering of fitted models, whose maximum likelihood estimates (MLEs) are normally distributed. Fitted models are often more familiar and interpretable than directly clustered data; they can build in prior knowledge, adjust for known confounders, and use marginalisation to emphasise parameters of interest. The overall approach is equivalent to a multi-layer clustering algorithm that characterises features through the normally distributed MLE parameters of a fitted model, and then clusters the normal distributions. Alternatively, the results can be applied directly to the means and covariances of (possibly labelled) data.
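As a rough illustration of the idea of scoring a candidate clustering of normally distributed estimates, the sketch below computes the standard (approximate) BIC for a partition in which every member of a cluster shares one mean. It is not the exact log-likelihood derived in the paper. It assumes each item is summarised by an estimate and a known covariance, that the shared cluster mean is taken as the precision-weighted average of its members, and that the sample size entering the penalty is the number of items. The function names `pooled_mean`, `gaussian_logpdf`, and `partition_bic` are hypothetical.

```python
import numpy as np

def pooled_mean(means, covs):
    """Precision-weighted mean: the MLE of a mean shared by all members."""
    precisions = [np.linalg.inv(c) for c in covs]
    total_precision = sum(precisions)
    weighted_sum = sum(P @ m for P, m in zip(precisions, means))
    return np.linalg.solve(total_precision, weighted_sum)

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def partition_bic(means, covs, labels):
    """Standard BIC for a partition; lower is better.

    Assumption (not from the paper): one free p-vector per cluster,
    and n = number of items in the ln(n) penalty.
    """
    means = np.asarray(means, dtype=float)
    labels = np.asarray(labels)
    n, p = means.shape
    clusters = np.unique(labels)
    loglik = 0.0
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        m = pooled_mean(means[idx], [covs[i] for i in idx])
        loglik += sum(gaussian_logpdf(means[i], m, covs[i]) for i in idx)
    n_params = len(clusters) * p
    return n_params * np.log(n) - 2.0 * loglik

# Illustrative, made-up estimates: two near the origin, two near (5, 5).
example_means = [[0.0, 0.0], [0.2, -0.1], [5.0, 5.0], [5.1, 4.8]]
example_covs = [0.1 * np.eye(2) for _ in example_means]

bic_one = partition_bic(example_means, example_covs, [0, 0, 0, 0])
bic_two = partition_bic(example_means, example_covs, [0, 0, 1, 1])
bic_four = partition_bic(example_means, example_covs, [0, 1, 2, 3])
```

Comparing `bic_one`, `bic_two`, and `bic_four` shows the trade-off the abstract describes: merging distant estimates is penalised through the fit term, while splitting into many clusters is penalised through the parameter count.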




