Parallelization of Kmeans++ using CUDA

K-means++ is an algorithm designed to improve the selection of initial seeds for the K-means algorithm. In this algorithm, the initial seeds are chosen one after another, each with a probability proportional to the squared distance from a point to its nearest already-chosen center. The main drawback of the algorithm is that, when run serially, it slows down clustering. In this paper, we aim to parallelize the most time-consuming steps of the k-means++ algorithm. Our goal is to reduce the running time while maintaining the quality of the serial algorithm.
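
To make the seeding step concrete, here is a minimal serial sketch in plain Python. It is not the paper's implementation: the function names `kmeanspp_seeds` and `squared_distance` are hypothetical, and the sketch assumes the standard k-means++ weighting, in which each point is weighted by the squared distance to its nearest seed. The comment marks the per-point distance update, a loop whose iterations are independent and which is therefore a natural candidate for one-thread-per-point execution on a GPU.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial seeds from points using k-means++ style sampling."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first seed is drawn uniformly at random.
    seeds = [points[rng.randrange(len(points))]]

    # nearest[i] holds the squared distance from points[i] to its closest seed.
    nearest = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        if sum(nearest) == 0.0:
            # Every point coincides with a seed; fall back to a uniform draw.
            idx = rng.randrange(len(points))
        else:
            # Draw the next seed with probability proportional to nearest[i].
            idx = rng.choices(range(len(points)), weights=nearest, k=1)[0]
        seeds.append(points[idx])

        # Per-point update: each entry depends only on its own point and the
        # newest seed, so the iterations are independent of one another.
        nearest = [
            min(d, squared_distance(p, seeds[-1]))
            for p, d in zip(points, nearest)
        ]
    return seeds
```

For example, `kmeanspp_seeds([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2, random.Random(0))` returns two of the four points, and the second is more likely to come from the cluster far from the first. The weighted draw itself needs the sum of all weights, which in a parallel setting would typically be computed with a reduction or prefix sum; that part is left serial here.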


