An efficient k-means-type algorithm for clustering datasets with incomplete records

02/23/2018
by Andrew Lithio, et al.

The k-means algorithm is the most popular nonparametric clustering method in use, but it cannot generally be applied to data sets with missing observations. The usual practice with such data sets is either to impute the missing values under an assumption of a missing-at-random mechanism or to ignore the incomplete records, and then to apply the desired clustering method. We develop an efficient version of the k-means algorithm that can cluster records in which not all features have been observed. Our extension is called k_m-means and reduces to the k-means algorithm when all records are complete. We also provide strategies to initialize our algorithm and to estimate the number of groups in the data set. Illustrations and simulations demonstrate the efficacy of our approach across a variety of settings and patterns of missing data. Our methods are also applied to the clustering of gamma-ray bursts and to the analysis of activation images obtained from a functional magnetic resonance imaging (fMRI) experiment.
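The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, assuming missing entries are coded as NaN: a plain Lloyd-style k-means in which each record's squared distance to a center is summed over its observed coordinates only, and each center coordinate is the mean of the observed values for that feature within the cluster. It is not the authors' efficient k_m-means implementation; the function name `km_means_sketch`, the NaN coding, and the random initialization from complete records are all hypothetical choices made for illustration (the paper provides its own initialization strategies). When no entries are missing, the mask is all true and the loop is ordinary k-means, mirroring the reduction property stated above.

```python
import numpy as np


def km_means_sketch(X, k, n_iter=100, seed=0):
    """Lloyd-style k-means for data with NaN-coded missing entries.

    Hypothetical illustration, NOT the authors' k_m-means implementation:
    distances and center updates use only the observed coordinates.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value was recorded
    X0 = np.where(observed, X, 0.0)      # zero-fill so masked sums are easy
    rng = np.random.default_rng(seed)

    # Assumed initialization: k distinct complete records (needs at least k).
    complete = np.flatnonzero(observed.all(axis=1))
    centers = X[rng.choice(complete, size=k, replace=False)].copy()

    labels = np.full(X.shape[0], -1)
    for _ in range(n_iter):
        # Partial squared distance: sum over each record's observed features.
        diff = X0[:, None, :] - centers[None, :, :]
        d2 = np.sum(observed[:, None, :] * diff**2, axis=2)
        new_labels = np.argmin(d2, axis=1)
        if np.array_equal(new_labels, labels):
            break                        # assignments stable: converged
        labels = new_labels

        for j in range(k):
            members = labels == j
            counts = observed[members].sum(axis=0)   # observed values per feature
            sums = X0[members].sum(axis=0)
            has = counts > 0             # skip features with no observed values
            centers[j, has] = sums[has] / counts[has]
    return labels, centers
```

For example, with `X = [[0, 0], [0, 1], [nan, 0.5], [10, 10], [10, 11], [10, nan]]` and `k = 2`, the three records near the origin and the three near (10, 10) end up in separate clusters, even though two of them are incomplete. Each step (assignment, then per-feature mean over observed values) cannot increase the partial-distance objective, which is why the loop settles.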
