An Experimental Comparison of Several Clustering and Initialization Methods

01/30/2013
by Marina Meila, et al.

We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison between three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner-take-all version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic). We find that the EM algorithm significantly outperforms the other methods, and we proceed to investigate the effect of various initialization schemes on the final solution produced by the EM algorithm. The initializations that we consider are (1) parameters sampled from an uninformative prior, (2) random perturbations of the marginal distribution of the data, and (3) the output of hierarchical agglomerative clustering. Although these initialization methods are substantially different, they lead to learned models that are strikingly similar in quality.
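The difference between EM and its winner-take-all variant can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: variables are binary (the paper uses general discrete variables), the pseudo-count `alpha` and the uniform random start in `init_params` are illustrative choices rather than any of the three initialization schemes studied, and all function names are hypothetical. The only difference between the two algorithms here is the `hard` flag in the E-step.

```python
import math
import random

def init_params(n_clusters, n_vars, rng):
    """Assumed random start: uniform mixing weights, Bernoulli parameters near 0.5."""
    weights = [1.0 / n_clusters] * n_clusters
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_vars)]
             for _ in range(n_clusters)]
    return weights, theta

def log_joint(x, weights, theta):
    """log p(cluster=k, x) for each k; variables are independent given the cluster."""
    out = []
    for w, th in zip(weights, theta):
        lp = math.log(w)
        for xi, t in zip(x, th):
            lp += math.log(t if xi else 1.0 - t)
        out.append(lp)
    return out

def responsibilities(x, weights, theta, hard):
    """E-step for one case: posterior over clusters (soft) or one-hot argmax (hard)."""
    lj = log_joint(x, weights, theta)
    m = max(lj)
    if hard:  # winner-take-all assignment
        k = lj.index(m)
        return [1.0 if j == k else 0.0 for j in range(len(lj))]
    p = [math.exp(v - m) for v in lj]
    s = sum(p)
    return [v / s for v in p]

def log_likelihood(data, weights, theta):
    """Marginal log-likelihood of the data, summing out the hidden cluster."""
    total = 0.0
    for x in data:
        lj = log_joint(x, weights, theta)
        m = max(lj)
        total += m + math.log(sum(math.exp(v - m) for v in lj))
    return total

def fit(data, n_clusters, hard=False, n_iter=50, seed=0, alpha=1.0):
    """Run soft EM (hard=False) or winner-take-all EM (hard=True)."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    weights, theta = init_params(n_clusters, n_vars, rng)
    for _ in range(n_iter):
        resp = [responsibilities(x, weights, theta, hard) for x in data]
        # M-step: smoothed (pseudo-count alpha) estimates from expected counts.
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            weights[k] = (nk + alpha) / (len(data) + alpha * n_clusters)
            for i in range(n_vars):
                on = sum(r[k] * x[i] for r, x in zip(resp, data))
                theta[k][i] = (on + alpha) / (nk + 2.0 * alpha)
    return weights, theta

# Small synthetic data set drawn from two well-separated clusters.
rng = random.Random(1)
true_theta = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
data = [[int(rng.random() < t) for t in true_theta[rng.randrange(2)]]
        for _ in range(200)]
for hard in (False, True):
    w, th = fit(data, n_clusters=2, hard=hard)
    print("hard" if hard else "soft", round(log_likelihood(data, w, th), 2))
```

Comparing the printed log-likelihoods on held-out data, rather than on the training data used here, would be closer in spirit to an experimental comparison of model quality.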


