Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

02/14/2023
by Dong Li, et al.

The K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it easily gets stuck in spurious local minima, and 2) the number of clusters k has to be given a priori. To address these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select an appropriate number of multi-prototypes for data with arbitrary shapes. A theoretical proof guarantees that the multi-prototypes selected by MPS achieve a constant-factor approximation to the optimal cost of the K-Means problem. Then, a merging technique called convex merging (CM) merges the multi-prototypes to reach a better local minimum without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with the K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm that escapes the undesirable local minima of the K-Means problem without requiring k in advance. Experiments on synthetic and real-world data sets verify the effectiveness of the proposed algorithm.
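
To make the first limitation concrete, the following is a minimal sketch of the standard Lloyd iteration for K-Means, run from two different initializations on a toy data set. It is not the MCKM, MPS, or CM procedure from the paper; the function names (`sq_dist`, `lloyd`), the six 1-D points, and both sets of initial centers are hypothetical choices made only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def lloyd(points, init_centers, max_iter=100):
    """Plain Lloyd iteration; returns (centers, labels, cost)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Mean of the assigned points, coordinate by coordinate.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(old)
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    labels = assign(points, centers)
    cost = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, cost


# Hypothetical toy data: three well-separated pairs on a line, k = 3.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# One center per pair: converges to the intended clustering.
_, _, good_cost = lloyd(points, [(0.5,), (10.5,), (20.5,)])

# Two centers in the first pair, one between the other two pairs:
# the iteration converges, but to a much worse local minimum.
_, _, bad_cost = lloyd(points, [(0.0,), (1.0,), (15.0,)])

print(good_cost, bad_cost)  # 1.5 101.0
```

Both runs stop because the centers no longer move, yet the second one splits one true cluster and fuses the other two. This is the kind of spurious local minimum that the abstract says MPS and CM are designed to escape.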


research
09/06/2011

An Automatic Clustering Technique for Optimal Clusters

This paper proposes a simple, automatic and efficient clustering algorit...
research
01/01/2019

Clustering with Distributed Data

We consider K-means clustering in networked environments (e.g., internet...
research
11/21/2016

Effective Deterministic Initialization for k-Means-Like Methods via Local Density Peaks Searching

The k-means clustering algorithm is popular but has the following main d...
research
02/24/2020

Clustering and Classification with Non-Existence Attributes: A Sentenced Discrepancy Measure Based Technique

For some or all of the data instances a number of independent-world clus...
research
12/11/2014

A Novel Adaptive Possibilistic Clustering Algorithm

In this paper a novel possibilistic c-means clustering algorithm, called...
research
01/22/2018

An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

DBSCAN is a typically used clustering algorithm due to its clustering ab...
research
08/25/2015

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...
