Partial k-means to avoid outliers, mathematical programming formulations, complexity results

02/11/2023
by Nicolas Dupin, et al.

A well-known weakness of Min-Sum-of-Squares Clustering (MSSC, the celebrated k-means problem) is its sensitivity to outliers. In this paper, we propose a partial clustering variant, termed PMSSC, in which a fixed number of points are treated as outliers and removed. We solve PMSSC using Integer Programming formulations, and we study complexity results extending those known for MSSC. PMSSC is NP-hard in Euclidean space when the dimension or the number of clusters is greater than 2. Finally, one-dimensional cases are studied: unweighted PMSSC is polynomial in that case and is solved with a dynamic programming algorithm, extending the interval clustering optimality property of MSSC. This result also holds for unweighted k-medoids with outliers. A weaker optimality property holds for weighted PMSSC, but whether weighted PMSSC is NP-hard in dimension one remains an open question.
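To make the one-dimensional unweighted case concrete, here is a minimal sketch of a dynamic program of the kind the abstract describes. It is not the authors' algorithm. It assumes the interval property mentioned above: in sorted order, each cluster of an optimal solution is a run of consecutive points, and the removed points lie between clusters or at the ends. The function name `partial_kmeans_1d`, the table layout, and the rule that exactly `m` points are removed are assumptions made for illustration.

```python
from itertools import accumulate


def partial_kmeans_1d(points, k, m):
    """Return the minimum sum of squared deviations when the 1D points are
    split into k non-empty clusters after removing exactly m outliers.

    Hypothetical helper: assumes optimal clusters are intervals of
    consecutive points in sorted order (unweighted case only).
    """
    xs = sorted(points)
    n = len(xs)
    if k < 1 or m < 0 or n - m < k:
        raise ValueError("need k >= 1, m >= 0 and at least k points left")

    # Prefix sums of x and x^2 give the cost of any interval in O(1).
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def cost(j, i):
        """Sum of squared deviations from the mean for xs[j:i], with j < i."""
        total = s1[i] - s1[j]
        # Clamp at zero to guard against floating-point cancellation.
        return max(0.0, (s2[i] - s2[j]) - total * total / (i - j))

    inf = float("inf")
    # dp[c][i][o]: best cost for the first i sorted points using c clusters
    # and o removed points.
    dp = [[[inf] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    for i in range(min(n, m) + 1):
        dp[0][i][i] = 0.0  # no cluster yet: the first i points are all removed

    for c in range(1, k + 1):
        for i in range(1, n + 1):
            for o in range(m + 1):
                # Option 1: point i is removed as an outlier.
                best = dp[c][i - 1][o - 1] if o > 0 else inf
                # Option 2: point i closes a cluster made of xs[j:i].
                for j in range(i):
                    prev = dp[c - 1][j][o]
                    if prev < inf:
                        best = min(best, prev + cost(j, i))
                dp[c][i][o] = best
    return dp[k][n][m]
```

For example, `partial_kmeans_1d([0, 1, 10, 11, 100], k=2, m=1)` removes the point 100 and keeps the clusters {0, 1} and {10, 11}. The sketch runs in O(k · m · n²) time and makes no attempt at the speed-ups a dedicated algorithm would use. It does not cover the weighted case, for which the abstract states only a weaker optimality property.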

research
12/01/2021

On the Complexity of the Geometric Median Problem with Outliers

In the Geometric Median problem with outliers, we are given a finite set...
research
01/25/2017

Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D

The k-Means clustering problem on n points is NP-Hard for any dimension ...
research
11/24/2020

Min-Sum Clustering (with Outliers)

We give a constant factor polynomial time pseudo-approximation algorithm...
research
04/28/2018

Clustering Perturbation Resilient Instances

Euclidean k-means is a problem that is NP-hard in the worst-case but oft...
research
12/01/2019

On the optimality of kernels for high-dimensional clustering

This paper studies the optimality of kernel methods in high-dimensional ...
research
02/09/2023

Partial Optimality in Cubic Correlation Clustering

The higher-order correlation clustering problem is an expressive model, ...
research
06/25/2023

Evolution of K-means solution landscapes with the addition of dataset outliers and a robust clustering comparison measure for their analysis

The K-means algorithm remains one of the most widely-used clustering met...