Regularization and Global Optimization in Model-Based Clustering

02/05/2023
by Raphael Araujo Sampaio, et al.

Due to their conceptual simplicity, k-means algorithm variants have been extensively used for unsupervised cluster analysis. However, one main shortcoming of these algorithms is that they essentially fit a mixture of identical spherical Gaussians to data that vastly deviates from such a distribution. In comparison, general Gaussian Mixture Models (GMMs) can fit richer structures but require estimating a quadratic number of parameters per cluster to represent the covariance matrices. This poses two main issues: (i) the underlying optimization problems are challenging due to their larger number of local minima, and (ii) their solutions can overfit the data. In this work, we design search strategies that circumvent both issues. We develop efficient global optimization algorithms for general GMMs, and we combine these algorithms with regularization strategies that avoid overfitting. Through extensive computational analyses, we observe that neither global optimization nor regularization in isolation substantially improves cluster recovery. However, combining these techniques yields a level of performance not previously achieved by k-means algorithm variants and uncovers vastly different cluster structures. These results shed new light on the relative standing of GMM and k-means methods and suggest that general GMMs should be used more frequently for data exploration. To facilitate such applications, we provide open-source code as well as Julia packages ("UnsupervisedClustering.jl" and "RegularizedCovarianceMatrices.jl") implementing the proposed techniques.
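
To make the two issues in the abstract concrete, the following minimal Python sketch counts the free covariance parameters per cluster and shows how a simple regularizer repairs a degenerate covariance estimate. The abstract does not say which regularization the authors use, so the shrinkage toward a scaled identity matrix shown here is an assumption chosen for illustration, and it does not reproduce the API of the Julia packages named above. All function names (`covariance_parameters`, `empirical_covariance`, `shrink_covariance`, `determinant_2x2`) are hypothetical.

```python
def covariance_parameters(d: int) -> int:
    """Free parameters in a symmetric d x d covariance matrix: d(d+1)/2."""
    return d * (d + 1) // 2


def empirical_covariance(points):
    """Maximum-likelihood covariance (divides by n) of a list of d-dimensional points."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    return [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n
         for j in range(d)]
        for i in range(d)
    ]


def shrink_covariance(cov, alpha):
    """Illustrative shrinkage (an assumption, not the paper's stated method):
    (1 - alpha) * cov + alpha * mu * I, where mu is the average variance."""
    d = len(cov)
    mu = sum(cov[i][i] for i in range(d)) / d
    return [
        [(1 - alpha) * cov[i][j] + (alpha * mu if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]


def determinant_2x2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Three collinear points: the empirical covariance is singular, so a
# Gaussian fitted to this cluster alone would have a degenerate density.
cluster = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
cov = empirical_covariance(cluster)
regularized = shrink_covariance(cov, alpha=0.5)

print("covariance parameters per cluster for d = 10:", covariance_parameters(10))
print("determinant before shrinkage:", determinant_2x2(cov))
print("determinant after shrinkage: ", determinant_2x2(regularized))
```

For d = 10, a full covariance matrix has 55 free parameters per cluster, whereas a spherical model needs a single variance. In this toy case the shrinkage step turns a singular estimate into an invertible one, which is the kind of safeguard a regularizer provides when a cluster has few or nearly collinear points.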


