Robust Trimmed k-means

08/16/2021
by Olga Dorabiala, et al.

Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given dataset. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data, many traditional clustering algorithms are compromised by a lack of clear separation between groups, noisy observations, or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single- or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers.
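To make the idea of identifying outliers while clustering concrete, here is a minimal Python sketch. The abstract does not state RTKM's update rules, so this is not RTKM: it is the classic single-membership trimmed k-means heuristic, in which the points farthest from their nearest center are set aside before each center update. The function name `trimmed_kmeans` and its parameters (`n_outliers`, `init`, `n_iter`, `seed`) are assumptions made for illustration, and multi-membership assignment is not covered.

```python
import numpy as np


def trimmed_kmeans(X, k, n_outliers, init=None, n_iter=50, seed=0):
    """Illustrative single-membership trimmed k-means (NOT the paper's RTKM).

    X          : (n, d) array of points
    k          : number of clusters
    n_outliers : number of points to flag as outliers (assumed known)
    init       : optional (k, d) array of starting centers (hypothetical option)
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    if init is None:
        # Assumption: start from k distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(n, size=k, replace=False)].copy()
    else:
        centers = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        nearest = d2.min(axis=1)

        # Flag the n_outliers points farthest from their nearest center.
        outliers = np.zeros(n, dtype=bool)
        if n_outliers > 0:
            outliers[np.argsort(nearest)[-n_outliers:]] = True

        # Recompute each center from its non-flagged members only.
        new_centers = centers.copy()
        for j in range(k):
            members = X[(labels == j) & ~outliers]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)

        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    return centers, labels, outliers
```

Calling `trimmed_kmeans(X, k=2, n_outliers=1)` returns the centers, a cluster label per point, and a boolean outlier mask. Like ordinary k-means, this heuristic is sensitive to initialization: if a starting center happens to land on an outlier, that point can end up as its own cluster instead of being trimmed.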


