Band Depth based initialization of k-Means for functional data clustering

06/02/2021
by Javier Albert-Smet, et al.

The k-Means algorithm is one of the most popular choices for clustering data, but it is well known to be sensitive to initialization. Many methods aim to find optimal initial seeds for k-Means, though none is universally valid. This paper extends one such method, the BRIk algorithm, to longitudinal data. BRIk relies on clustering a set of centroids derived from bootstrap replicates of the data and on the versatile Modified Band Depth. Our approach improves on BRIk by adding a step that fits appropriate B-splines to the observations, together with a resampling process that keeps the computation feasible and handles issues such as noise or missing data. Our results with simulated and real data sets indicate that our Functional Data Approach to the BRIk method (FABRIk) is more effective than previous proposals at providing seeds to initialize k-Means in terms of clustering recovery.
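
To make the role of the Modified Band Depth more concrete, the following is a minimal sketch of how a depth-based seed could be picked from a group of curves. It is not the authors' implementation: the abstract does not define the depth, so the sketch assumes the common formulation in which bands are formed by pairs of curves and the depth of a curve is the average proportion of evaluation points at which it lies inside each band. The function names `modified_band_depth` and `deepest_curve` are hypothetical, and the curves are assumed to be evaluated on a common grid (for example after a B-spline fit).

```python
from itertools import combinations

import numpy as np


def modified_band_depth(curves):
    """Modified Band Depth with bands built from pairs of curves.

    curves: array of shape (n_curves, n_points), all on the same grid.
    Returns one depth value per curve (assumed pairwise-band definition).
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    depth = np.zeros(n)
    for j, k in combinations(range(n), 2):
        lower = np.minimum(curves[j], curves[k])
        upper = np.maximum(curves[j], curves[k])
        # Proportion of grid points at which each curve lies inside the band.
        inside = (curves >= lower) & (curves <= upper)
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)


def deepest_curve(curves):
    """Index of the curve with the largest depth, a candidate k-Means seed."""
    return int(np.argmax(modified_band_depth(curves)))


# Three constant curves at levels 0, 1 and 2: the middle one is the deepest.
example = np.array([[0.0] * 5, [1.0] * 5, [2.0] * 5])
seed_index = deepest_curve(example)
```

In a BRIk-style pipeline, a function like `deepest_curve` would be applied within each group of bootstrap centroids, so that the seed for each cluster is an actual central curve rather than a pointwise mean that outliers could distort.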
