Scalable Initialization Methods for Large-Scale Clustering

07/23/2020
by   Joonas Hämäläinen, et al.
0

In this work, two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means|| type of an initialization strategy. The second proposal also utilizes multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which make them suitable for initializing large-scale problems. In the experiments, comparison of the proposed methods to the K-means++ and K-means|| methods is conducted using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state-of-the-art. We also observe that the currently most popular K-means++ initialization behaves like the random one in the very high-dimensional cases.

READ FULL TEXT
research
05/30/2016

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes wi...
research
01/10/2020

Probabilistic K-means Clustering via Nonlinear Programming

K-means is a classical clustering algorithm with wide applications. Howe...
research
01/30/2013

An Experimental Comparison of Several Clustering and Initialization Methods

We examine methods for clustering in high dimensions. In the first part ...
research
09/28/2019

A Note On k-Means Probabilistic Poverty

It is proven, by example, that the version of k-means with random initia...
research
01/02/2011

Improving the Performance of K-Means for Color Quantization

Color quantization is an important operation with many applications in g...
research
06/02/2021

Band Depth based initialization of k-Means for functional data clustering

The k-Means algorithm is one of the most popular choices for clustering ...
research
04/19/2023

CKmeans and FCKmeans : Two Deterministic Initialization Procedures For Kmeans Algorithm Using Crowding Distance

This paper presents two novel deterministic initialization procedures fo...

Please sign up or login with your details

Forgot password? Click here to reset