t-k-means: A k-means Variant with Robustness and Stability

07/17/2019
by Yang Zhang, et al.

Lloyd's k-means algorithm is one of the most classical clustering methods and is widely used in data mining or as a data pre-processing step. However, because the Gaussian distribution is thin-tailed, k-means performs relatively poorly on heavy-tailed data or data containing outliers. In addition, k-means has relatively weak stability, i.e., its results have high variance, which reduces the credibility of the model. In this paper, we propose a robust and stable k-means variant, t-k-means, together with a fast version, for solving the flat clustering problem. Theoretically, we detail the derivation of t-k-means and analyze its robustness and stability in terms of the loss function, the influence function, and the expression for the cluster center. Extensive experiments demonstrate the empirical soundness of our method while preserving running efficiency.
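To make the robustness argument concrete, the sketch below contrasts the standard Lloyd's k-means center (the arithmetic mean, which corresponds to a Gaussian model) with a heavy-tailed alternative. The abstract does not give the t-k-means update rule, so the robust center here is an assumption: a generic iteratively reweighted location estimate for an isotropic Student's-t distribution, as used in t-mixture EM, with weights u_i = (nu + d) / (nu + ||x_i - c||^2 / s2). It is not the authors' t-k-means. The names `t_weighted_center`, `lloyd`, and the degrees-of-freedom parameter `nu` are hypothetical.

```python
from typing import Callable, List, Sequence

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_center(points: Sequence[Point]) -> Point:
    """Standard k-means center: the coordinate-wise arithmetic mean."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def t_weighted_center(points: Sequence[Point], nu: float = 1.0, iters: int = 200) -> Point:
    """Hypothetical robust center: location of an isotropic Student's-t fit.

    Assumption for illustration only (not the paper's update). Each EM-style
    step gives point i the weight u_i = (nu + d) / (nu + ||x_i - c||^2 / s2),
    which decays toward 0 for far-away points, so outliers barely move c.
    """
    n, d = len(points), len(points[0])
    c = mean_center(points)
    # Isotropic scale estimate; floored to avoid division by zero.
    s2 = max(sum(sq_dist(p, c) for p in points) / (n * d), 1e-12)
    for _ in range(iters):
        u = [(nu + d) / (nu + sq_dist(p, c) / s2) for p in points]
        total = sum(u)
        c = [sum(w * p[j] for w, p in zip(u, points)) / total for j in range(d)]
        s2 = max(sum(w * sq_dist(p, c) for w, p in zip(u, points)) / (n * d), 1e-12)
    return c


def lloyd(points: Sequence[Point], init: Sequence[Point],
          center_fn: Callable[[Sequence[Point]], Point], iters: int = 20) -> List[Point]:
    """Lloyd's alternation: assign points to the nearest center, then recompute centers."""
    centers = [list(c) for c in init]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[k].append(p)
        # Keep the previous center if a cluster becomes empty.
        centers = [center_fn(cl) if cl else c for cl, c in zip(clusters, centers)]
    return centers


if __name__ == "__main__":
    pts = [[0, 0], [1, 0], [0, 1], [1, 1],
           [10, 10], [11, 10], [10, 11], [11, 11],
           [10.5, 60]]  # one outlier far above the second cluster
    print(lloyd(pts, [[0, 0], [10, 10]], mean_center))
    print(lloyd(pts, [[0, 0], [10, 10]], t_weighted_center))
```

On this toy data the outlier is assigned to the second cluster in both runs. The mean-based center is dragged to (10.5, 20.4), while the t-weighted center stays close to (10.5, 10.5) because the outlier's weight becomes negligible. This illustrates, under the stated assumption, why a heavy-tailed model yields a center that is less sensitive to outliers.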
