Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

07/08/2020
by Siva Rajesh Kasa, et al.

We study two practically important cases of model-based clustering using Gaussian Mixture Models: (1) under model misspecification and (2) on high-dimensional data, in light of recent advances in Gradient Descent (GD) based optimization using Automatic Differentiation (AD). Our simulation studies show that under misspecification EM achieves better clustering performance than GD, as measured by the Adjusted Rand Index, whereas on high-dimensional data GD outperforms EM. We observe that both EM and GD yield many solutions with high likelihood but poor cluster interpretation. To address this problem, we design a new penalty term for the likelihood based on the Kullback-Leibler divergence between pairs of fitted components. Closed-form expressions for the gradients of this penalized likelihood are difficult to derive, but AD computes them effortlessly, illustrating the advantage of AD-based optimization. Extensions of this penalty to high-dimensional data and to model selection are discussed. Numerical experiments on synthetic and real datasets demonstrate the efficacy of clustering with the proposed penalized likelihood approach.
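The abstract does not state the exact form of the penalty, so the following is only a minimal sketch of the quantity it is built on: the Kullback-Leibler divergence between pairs of fitted Gaussian components, computed with the standard closed-form expression for two multivariate Gaussians. The function names `gaussian_kl` and `pairwise_kl` and the toy parameters are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for multivariate Gaussians."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    # Solve linear systems rather than forming an explicit inverse of cov1.
    trace_term = np.trace(np.linalg.solve(cov1, cov0))
    maha_term = diff @ np.linalg.solve(cov1, diff)
    # slogdet returns (sign, log|det|); covariances are assumed positive definite.
    logdet0 = np.linalg.slogdet(cov0)[1]
    logdet1 = np.linalg.slogdet(cov1)[1]
    return 0.5 * (trace_term + maha_term - d + logdet1 - logdet0)


def pairwise_kl(means, covs):
    """Matrix K with K[i, j] = KL(component i || component j); diagonal is zero."""
    k = len(means)
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                out[i, j] = gaussian_kl(means[i], covs[i], means[j], covs[j])
    return out


# Hypothetical two-component example in two dimensions.
means = [np.zeros(2), np.array([1.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
kl_matrix = pairwise_kl(means, covs)
```

How these pairwise divergences are combined into a penalty, and how the penalized likelihood is then optimized, is described in the full text; in an AD framework the same expression could be written with differentiable array operations so that its gradients need not be derived by hand.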

research
08/23/2022

Multinomial Cluster-Weighted Models for High-Dimensional Data

Modeling of high-dimensional data is very important to categorize differ...
research
03/12/2019

Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions

Robust clustering of high-dimensional data is an important topic because...
research
09/29/2022

Likelihood adjusted semidefinite programs for clustering heterogeneous data

Clustering is a widely deployed unsupervised learning tool. Model-based ...
research
10/24/2020

Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation

Copulas provide a modular parameterization of multivariate distributions...
research
08/25/2018

Relaxing the Identically Distributed Assumption in Gaussian Co-Clustering for High Dimensional Data

A co-clustering model for continuous data that relaxes the identically d...
research
10/06/2022

Probabilistic partition of unity networks for high-dimensional regression problems

We explore the probabilistic partition of unity network (PPOU-Net) model...
