t-k-means: A k-means Variant with Robustness and Stability

07/17/2019
by Yang Zhang, et al.

Lloyd's k-means algorithm is one of the most classical clustering methods and is widely used in data mining or as a data pre-processing procedure. However, due to the thin-tailed property of the Gaussian distribution, k-means performs relatively poorly on heavy-tailed data or data with outliers. In addition, k-means has relatively weak stability, i.e., its result has a large variance, which reduces the credibility of the model. In this paper, we propose a robust and stable k-means variant, t-k-means, as well as a fast version of it, for solving the flat clustering problem. Theoretically, we detail the derivation of t-k-means and analyze its robustness and stability from the aspects of the loss function, the influence function, and the expression of the clustering center. Extensive experiments empirically demonstrate that our method is sound while preserving running efficiency.
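
To illustrate the outlier sensitivity described above, the following minimal sketch compares the arithmetic-mean center update used by Lloyd's k-means with a center estimated by Student-t style reweighting on a single one-dimensional cluster. This is not the authors' t-k-means algorithm: the function names, the weight formula (the standard EM weight for a Student-t location model), and the values of `nu` and `scale` are assumptions chosen only for illustration.

```python
def mean_center(points):
    """Arithmetic mean: the center update used by Lloyd's k-means."""
    return sum(points) / len(points)


def t_weighted_center(points, nu=1.0, scale=1.0, iters=50):
    """Location estimate under a 1-D Student-t model via iterative reweighting.

    Each point gets weight (nu + 1) / (nu + ((x - mu) / scale) ** 2), so points
    far from the current center contribute less to the next estimate.
    nu and scale are illustrative assumptions, not values from the paper.
    """
    mu = sorted(points)[len(points) // 2]  # start from a middle order statistic
    for _ in range(iters):
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in points]
        mu = sum(w * x for w, x in zip(weights, points)) / sum(weights)
    return mu


# A tight cluster around 1.0, with and without a single distant outlier.
cluster = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
with_outlier = cluster + [50.0]

print("mean:      ", mean_center(cluster), mean_center(with_outlier))
print("t-weighted:", t_weighted_center(cluster), t_weighted_center(with_outlier))
```

In this toy example the single outlier drags the arithmetic mean from 1.0 to 8.0, while the reweighted center stays close to 1.0 because the outlier receives a very small weight.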


