K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

02/10/2020
by Camille Brunet-Saumard, et al.

We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that, if enough blocks are generated through bootstrap sampling, it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM makes it possible to take many blocks of a desired size, thus avoiding the possible disappearance of clusters in some blocks, a pitfall that can occur with the partition-based blocks of the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means-based methods. We also recommend that practitioners use such a robust approach to initialize their clustering algorithm.
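The abstract describes its two ingredients only in words, so the following Python sketch illustrates them under our own assumptions. `mom_mean` and `bmom_mean` contrast blocks that partition the data (MOM) with blocks drawn with replacement (bMOM). `lloyd_bmom` is a hypothetical name for a Lloyd-type loop that replaces the plain mean in the update step with a coordinate-wise bMOM estimate. This is not the authors' K-bMOM procedure: the abstract does not give their exact centroid update, block size, or initialization, and the values `n_blocks=51` and `block_size=5` are arbitrary illustrative choices.

```python
import random
import statistics


def mom_mean(values, n_blocks, rng):
    """Classical median-of-means: the blocks form a partition of the data."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]  # disjoint blocks
    return statistics.median(statistics.fmean(b) for b in blocks)


def bmom_mean(values, n_blocks, block_size, rng):
    """Bootstrap median-of-means: each block is drawn WITH replacement,
    so the number of blocks and their size can be chosen freely."""
    block_means = [statistics.fmean(rng.choices(values, k=block_size))
                   for _ in range(n_blocks)]
    return statistics.median(block_means)


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_bmom(points, centroids, n_iter=10, n_blocks=51, block_size=5, seed=0):
    """Lloyd-type iterations with a robust centroid update.

    ASSUMPTION: the coordinate-wise bMOM update below is an illustrative
    choice, not necessarily the update rule used in the paper.
    """
    rng = random.Random(seed)
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[k].append(p)
        # Update step: bMOM estimate of each coordinate instead of the plain mean.
        updated = []
        for c, members in zip(centroids, clusters):
            if not members:  # empty cluster: keep its previous centroid
                updated.append(c)
                continue
            updated.append(tuple(
                bmom_mean([m[d] for m in members], n_blocks, block_size, rng)
                for d in range(len(c))))
        centroids = updated
    return centroids


def make_demo_data(seed=1):
    """Synthetic data: two Gaussian clusters plus three far-away outliers."""
    rng = random.Random(seed)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    pts += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(50)]
    pts += [(1000.0, 1000.0)] * 3
    return pts


if __name__ == "__main__":
    data = make_demo_data()
    print(lloyd_bmom(data, [(0.0, 0.0), (10.0, 10.0)]))
```

On the synthetic data from `make_demo_data`, the three outliers at (1000, 1000) are assigned to the cluster near (10, 10). The plain mean of that cluster is dragged far from 10, whereas the bMOM-based centroid should stay close to (10, 10). Most bootstrap blocks contain no outlier, and taking the median of the block means discards the few contaminated ones.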
