Effective Deterministic Initialization for k-Means-Like Methods via Local Density Peaks Searching

11/21/2016
by Fengfu Li, et al.

The k-means clustering algorithm is popular but has four main drawbacks: 1) the number of clusters, k, must be provided by the user in advance, 2) it easily converges to local minima when the initial centers are selected randomly, 3) it is sensitive to outliers, and 4) it can only deal with well-separated hyperspherical clusters. In this paper, we propose a Local Density Peaks Searching (LDPS) initialization framework to address these issues. The LDPS framework has two basic components: the local density, which characterizes the density distribution of a data set, and the local distinctiveness index (LDI), which we introduce to characterize how distinctive a data point is compared with its neighbors. Based on these two components, we search for the local density peaks, which are characterized by high local densities and high LDIs, to address drawbacks 1) and 2). Moreover, we detect outliers, which are characterized by low local densities but high LDIs, and exclude them before clustering begins, which addresses drawback 3). Finally, to address drawback 4), we apply the LDPS initialization framework to k-medoids, a variant of k-means that chooses data samples as centers, using diverse similarity measures other than the Euclidean distance. Combining the LDPS initialization framework with k-means and k-medoids, we obtain two novel clustering methods called LDPS-means and LDPS-medoids, respectively. Experiments on synthetic data sets verify the effectiveness of the proposed methods, especially when the ground-truth number of clusters k is large. Further, experiments on several real-world data sets (Handwritten Pendigits, Coil-20, Coil-100 and the Olivetti Face Database) show that our methods outperform analogous approaches in both estimating k and unsupervised object categorization.
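The selection rule described in the abstract (peaks have high local density and high LDI; outliers have low local density but high LDI) can be illustrated with a minimal sketch. The abstract does not give the formulas for local density or LDI, so everything concrete below is an assumption made for illustration: the truncated Gaussian kernel, the stand-in distinctiveness score (distance to the nearest denser point, clipped to the neighborhood radius), the function names, and the thresholds `density_min` and `ldi_min`. It is not the authors' definition.

```python
import numpy as np

def local_density(X, radius):
    """Assumed form: Gaussian kernel summed over neighbors within `radius`."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    w = np.exp(-(d / radius) ** 2)
    w[d > radius] = 0.0
    return w.sum(axis=1) - 1.0, d  # subtract each point's own contribution

def distinctiveness(rho, d, radius):
    """Stand-in for the LDI (an assumption, not the paper's formula):
    distance to the nearest denser point, clipped to `radius`, scaled to [0, 1]."""
    n = len(rho)
    idx = np.arange(n)
    ldi = np.ones(n)
    for i in range(n):
        # Ties in density are broken by index so the rule stays deterministic.
        denser = (rho > rho[i]) | ((rho == rho[i]) & (idx < i))
        if denser.any():
            ldi[i] = min(d[i, denser].min(), radius) / radius
    return ldi

def ldps_like_init(X, radius, density_min, ldi_min=0.9):
    """Return (peak indices, outlier indices) under the assumed scores."""
    rho, d = local_density(X, radius)
    ldi = distinctiveness(rho, d, radius)
    distinctive = ldi >= ldi_min
    peaks = np.flatnonzero(distinctive & (rho >= density_min))
    outliers = np.flatnonzero(distinctive & (rho < density_min))
    return peaks, outliers

# Toy data: two 3x3 grids of points plus one isolated point.
grid = np.array([[x, y] for x in (-0.5, 0.0, 0.5) for y in (-0.5, 0.0, 0.5)])
X = np.vstack([grid, grid + 10.0, [[5.0, -20.0]]])

peaks, outliers = ldps_like_init(X, radius=2.0, density_min=1.0)
k_estimate = len(peaks)     # number of peaks serves as the estimate of k
initial_centers = X[peaks]  # deterministic centers for k-means or k-medoids
```

Because no random sampling is involved, repeated runs on the same data give the same centers, which is the property the title refers to as deterministic initialization. The detected outliers would be removed from `X` before the centers are handed to a k-means or k-medoids routine.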


research
11/19/2018

An efficient density-based clustering algorithm using reverse nearest neighbour

Density-based clustering is the task of discovering high-density regions...
research
01/31/2023

Archetypal Analysis++: Rethinking the Initialization Strategy

Archetypal analysis is a matrix factorization method with convexity cons...
research
10/18/2022

An enhanced method of initial cluster center selection for K-means algorithm

Clustering is one of the widely used techniques to find out patterns fro...
research
02/14/2023

Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

K-Means algorithm is a popular clustering method. However, it has two li...
research
01/13/2022

A Geometric Approach to k-means

k-means clustering is a fundamental problem in various disciplines. This...
research
10/14/2019

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

As one of the most ubiquitously applied unsupervised learning methods, c...
research
09/12/2014

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...
