Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

09/12/2014
by M. Emre Celebi, et al.

Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
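
The multiple-run practice mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes plain batch k-means, random selection of k data points as the initial centers, and the sum of squared errors (SSE) as the quality measure. The helper names `sse`, `lloyd`, and `best_of_restarts` are hypothetical and chosen only for this illustration.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def lloyd(points, centers, max_iter=100):
    """Plain batch k-means iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
            clusters[j].append(x)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def best_of_restarts(points, k, runs, seed=0):
    """Run k-means `runs` times from random initial centers (assumed
    strategy: k distinct data points drawn at random) and keep the run
    with the lowest SSE. The total cost grows linearly with `runs`."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        init = rng.sample(points, k)
        final = lloyd(points, init)
        results.append((sse(points, final), final))
    best = min(results, key=lambda r: r[0])
    return best, [r[0] for r in results]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
            (9.0, 0.0), (9.2, 0.1), (8.9, 0.3)]
    (best_sse, best_centers), all_sse = best_of_restarts(data, k=3, runs=10)
    print("SSE per run:", all_sse)
    print("best SSE:", best_sse)
```

Because every restart repeats the full clustering, ten runs cost roughly ten times as much as one. A deterministic initialization method needs only a single run, which is the motivation the abstract gives for studying such methods.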


