Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

04/28/2013
by M. Emre Celebi, et al.

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. Many of these methods, however, have superlinear complexity in the number of data points, making them impractical for large data sets. On the other hand, linear methods are often random and/or order-sensitive, which renders their results unrepeatable. Recently, Su and Dy proposed two highly successful hierarchical initialization methods named Var-Part and PCA-Part that are not only linear, but also deterministic (non-random) and order-invariant. In this paper, we propose a discriminant analysis based approach that addresses a common deficiency of these two methods. Experiments on a large and diverse collection of data sets from the UCI Machine Learning Repository demonstrate that Var-Part and PCA-Part are highly competitive with one of the best random initialization methods to date, i.e., k-means++, and that the proposed approach significantly improves the performance of both hierarchical methods.
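The abstract does not spell out how Var-Part works, so the following is only a minimal sketch based on the procedure as it is commonly described for Su and Dy's method: start with all points in one cluster, repeatedly take the cluster with the largest sum of squared errors (SSE), and split it at the mean of its highest-variance attribute until K clusters exist. The function names (`centroid`, `sse`, `var_part_init`) and the tie-breaking rule are assumptions made for illustration. It does not implement the discriminant analysis based approach proposed in the paper.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sse(points: List[Point]) -> float:
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))


def var_part_init(data: List[Point], k: int) -> List[List[float]]:
    """Return k initial centers by repeatedly bisecting the worst cluster.

    Hypothetical helper: a sketch of a Var-Part-style initializer,
    not the authors' reference implementation.
    """
    clusters = [list(data)]
    while len(clusters) < k:
        # Pick the cluster with the largest SSE (the first one wins ties).
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        points = clusters.pop(worst)
        c = centroid(points)
        # Per-attribute sum of squared deviations (proportional to variance).
        spread = [sum((p[d] - c[d]) ** 2 for p in points) for d in range(len(c))]
        axis = max(range(len(c)), key=lambda d: spread[d])
        # Split at the mean of the chosen attribute.
        left = [p for p in points if p[axis] <= c[axis]]
        right = [p for p in points if p[axis] > c[axis]]
        if not left or not right:
            # Every remaining cluster holds identical points; no split possible.
            raise ValueError("fewer than k distinct points")
        clusters.extend([left, right])
    return [centroid(cluster) for cluster in clusters]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(var_part_init(data, 2))
```

Because no random numbers are involved, repeated calls on the same data return the same centers, which can then seed a standard k-means run. Each split costs one pass over the points of the chosen cluster; this sketch recomputes SSE values on every iteration for brevity rather than caching them.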


research
09/12/2014

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...
research
09/10/2012

A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

K-means is undoubtedly the most widely used partitional clustering algor...
research
04/19/2023

CKmeans and FCKmeans : Two Deterministic Initialization Procedures For Kmeans Algorithm Using Crowding Distance

This paper presents two novel deterministic initialization procedures fo...
research
06/02/2021

Band Depth based initialization of k-Means for functional data clustering

The k-Means algorithm is one of the most popular choices for clustering ...
research
11/27/2019

Adaptive Initialization Method for K-means Algorithm

The K-means algorithm is a widely used clustering algorithm that offers ...
research
08/26/2019

An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

K-Means is one of the most used algorithms for data clustering and the u...
research
10/24/2019

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Speaker diarization based on bottom-up clustering of speech segments by ...
