
Scalable Initialization Methods for Large-Scale Clustering

07/23/2020
by Joonas Hämäläinen, et al.
Jyväskylän yliopisto

In this work, two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means|| type of initialization strategy. The second proposal additionally uses multiple lower-dimensional subspaces produced by random projection for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means|| methods on an extensive set of reference and synthetic large-scale datasets. For the synthetic datasets, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art. We also observe that K-means++, currently the most popular initialization, behaves like random initialization in very high-dimensional cases.
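To make the divide-and-conquer idea concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm. As a simplifying assumption it runs standard K-means++ (D² sampling) seeding on each random part instead of K-means|| oversampling, and it omits the random-projection step of the second proposal. The names `sq_dist`, `kmeanspp_seeds`, and `divide_and_conquer_init` are hypothetical and chosen only for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng, weights=None):
    """K-means++ style D^2 seeding, with optional per-point weights."""
    if weights is None:
        weights = [1.0] * len(points)
    indices = range(len(points))
    first = rng.choices(indices, weights=weights, k=1)[0]
    seeds = [points[first]]
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:
            # Every remaining point coincides with a seed: fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            idx = rng.choices(indices, weights=probs, k=1)[0]
        seeds.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for p, d in zip(points, d2)]
    return seeds


def divide_and_conquer_init(points, k, n_parts, rng):
    """Assumed scheme: seed each random part, then reduce the pooled seeds to k."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    parts = [shuffled[i::n_parts] for i in range(n_parts)]

    # Each part is independent, so this loop could run in parallel.
    candidates = []
    for part in parts:
        candidates.extend(kmeanspp_seeds(part, min(k, len(part)), rng))

    # Weight every candidate by the number of data points closest to it.
    weights = [0.0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)),
                      key=lambda j: sq_dist(p, candidates[j]))
        weights[nearest] += 1.0

    # Reduce the candidate pool to k initial prototypes.
    return kmeanspp_seeds(candidates, k, rng, weights)
```

A call such as `divide_and_conquer_init(points, 3, 4, random.Random(0))` returns three initial prototypes drawn from `points`. The per-part loop is the step that a parallel implementation would distribute across workers.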

05/30/2016

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes wi...
01/10/2020

Probabilistic K-means Clustering via Nonlinear Programming

K-means is a classical clustering algorithm with wide applications. Howe...
01/30/2013

An Experimental Comparison of Several Clustering and Initialization Methods

We examine methods for clustering in high dimensions. In the first part ...
09/28/2019

A Note On k-Means Probabilistic Poverty

It is proven, by example, that the version of k-means with random initia...
01/02/2011

Improving the Performance of K-Means for Color Quantization

Color quantization is an important operation with many applications in g...
06/02/2021

Band Depth based initialization of k-Means for functional data clustering

The k-Means algorithm is one of the most popular choices for clustering ...
04/19/2023

CKmeans and FCKmeans : Two Deterministic Initialization Procedures For Kmeans Algorithm Using Crowding Distance

This paper presents two novel deterministic initialization procedures fo...

Code Repositories

M_Spheres_Dataset_Generator

MATLAB implementation of the M-Spheres Dataset Generator.



Scalable-K-means

Parallel MATLAB implementations of the K-means clustering methods from the paper Hämäläinen et al. "Scalable Initialization Methods for Large-Scale Clustering".

