A sampling-based approach for efficient clustering in large datasets

12/29/2021
by Georgios Exarchakis, et al.

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating the distances between each data point and only a subset of the cluster centres. It is substantially more efficient than k-means because it does not require an all-to-all comparison of data points and clusters. We show that the optimal solutions of our approximation are the same as those of the exact method. However, our approach extracts these clusters considerably more efficiently than the state of the art. We compare our approximation with exact k-means and with alternative approximation approaches on a series of standardised clustering tasks. For the evaluation, we consider the algorithmic complexity, including the number of operations to convergence, and the stability of the results.
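The abstract does not spell out how the subset of cluster centres is chosen, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical scheme in which each data point compares itself against its currently assigned centre plus a few randomly sampled centres. The names `subset_kmeans`, `subset_size`, and the returned distance counter `n_dist` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subset_kmeans(points, k, subset_size, iters=20, seed=0):
    """k-means-style clustering that checks only a subset of centres per point.

    ASSUMPTION: the candidate subset is the point's current centre plus
    `subset_size` centres drawn uniformly at random. The paper may select
    candidates differently; this is an illustrative stand-in.
    Requires subset_size <= k <= len(points).
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Initialise centres from k distinct data points and assign points at random.
    centres = [list(p) for p in rng.sample(points, k)]
    assign = [rng.randrange(k) for _ in points]
    n_dist = 0  # number of point-to-centre distance evaluations

    for _ in range(iters):
        # Assignment step: compare each point with its candidate centres only.
        for i, p in enumerate(points):
            candidates = {assign[i]}
            candidates.update(rng.sample(range(k), subset_size))
            n_dist += len(candidates)
            assign[i] = min(candidates, key=lambda c: sq_dist(p, centres[c]))

        # Update step: move each non-empty centre to the mean of its points.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, assign):
            counts[c] += 1
            for d, x in enumerate(p):
                sums[c][d] += x
        for c in range(k):
            if counts[c]:
                centres[c] = [s / counts[c] for s in sums[c]]

    return centres, assign, n_dist


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    centres, assign, n_dist = subset_kmeans(data, k=16, subset_size=3, iters=10)
    print("distance evaluations:", n_dist, "vs all-to-all:", 10 * len(data) * 16)
```

Because a point's current centre is always among its candidates, the assignment step never moves a point to a centre farther away than the one it already has. Each iteration costs at most `len(points) * (subset_size + 1)` distance evaluations, compared with `len(points) * k` for the all-to-all comparison in exact k-means.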


