Robust Clustering Using Tau-Scales

06/19/2019
by Juan D. Gonzalez, et al.

K means is a popular non-parametric clustering procedure introduced by Steinhaus (1956) and further developed by MacQueen (1967). It is known, however, that K means does not perform well in the presence of outliers. Cuesta-Albertos et al. (1997) introduced a robust alternative, trimmed K means, which can be tuned to be either robust or efficient, but cannot achieve both properties simultaneously in an adaptive way. To overcome this limitation, we propose a new robust clustering procedure called K Tau Centers, which is based on the concept of the Tau scale introduced by Yohai and Zamar (1988). We show that K Tau Centers performs well in extensive simulation studies and real data examples. We also show that the centers found by the proposed method are consistent estimators of the "true" centers, defined as the minimizers of the objective function at the population level.
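The outlier sensitivity of K means and the trimming idea mentioned above can be illustrated with a minimal Python sketch. It is not the K Tau Centers procedure and not a reference implementation of trimmed K means. The function name `trimmed_kmeans`, the trimming fraction `alpha`, the fixed starting centers, and the toy data are all assumptions made for illustration. Setting `alpha=0.0` reduces the sketch to plain Lloyd-style K means.

```python
from math import floor


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, init_centers, alpha=0.1, n_iter=50):
    """Illustrative Lloyd-style iteration with trimming (hypothetical helper).

    At every step, the floor(alpha * n) points farthest from their nearest
    center are left out of the center update.
    """
    centers = [tuple(float(v) for v in c) for c in init_centers]
    n_keep = len(points) - floor(alpha * len(points))
    for _ in range(n_iter):
        # For each point: squared distance to, and index of, its nearest center.
        assigned = []
        for i, p in enumerate(points):
            d2, j = min((sq_dist(p, c), j) for j, c in enumerate(centers))
            assigned.append((d2, i, j))
        # Keep only the n_keep points closest to their assigned center.
        kept = sorted(assigned)[:n_keep]
        new_centers = []
        for j, old in enumerate(centers):
            members = [points[i] for _, i, lab in kept if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # empty cluster: keep previous center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sorted(i for _, i, _ in kept)


# Toy data (assumed): two tight groups plus one gross outlier at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (100, 100)]
init = [(0, 0), (11, 11)]

plain_centers, _ = trimmed_kmeans(points, init, alpha=0.0)
trim_centers, kept_idx = trimmed_kmeans(points, init, alpha=0.15)
print("plain K means centers:", plain_centers)
print("trimmed centers:", trim_centers, "kept indices:", kept_idx)
```

On this toy data the untrimmed run lets the single outlier capture a center of its own and merges the two real groups, while the trimmed run discards the outlier and recovers both group means. The trimming fraction has to be chosen in advance, which is the tuning trade-off the abstract refers to.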

research
06/28/2019

Test for parameter change in the presence of outliers: the density power divergence based approach

This study considers the problem of testing for a parameter change in th...
research
11/14/2019

Distributional Clustering: A distribution-preserving clustering method

One key use of k-means clustering is to identify cluster prototypes whic...
research
02/27/2022

Strong Consistency for a Class of Adaptive Clustering Procedures

We introduce a class of clustering procedures which includes k-means and...
research
09/10/2020

Robust Clustering with Normal Mixture Models: A Pseudo β-Likelihood Approach

As in other estimation scenarios, likelihood based estimation in the nor...
research
10/02/2020

Regularized K-means through hard-thresholding

We study a framework of regularized K-means methods based on direct pena...
research
07/18/2022

Population estimation for child care centers through linear regressions

This article arises as an alternative solution to the problem of estimat...
research
06/23/2014

Further heuristics for k-means: The merge-and-split heuristic and the (k,l)-means

Finding the optimal k-means clustering is NP-hard in general and many he...
