Rectified Euler k-means and Beyond

08/06/2021
by Yunxia Lin, et al.

Euler k-means (EulerK) first maps data onto the surface of the unit hypersphere in an equi-dimensional space via a complex mapping that induces the robust Euler kernel, and then applies the popular k-means. Consequently, besides enjoying the virtues of k-means such as simplicity and scalability to large data sets, EulerK is also robust to noise and outliers. Even so, the centroids found by EulerK deviate from the unit hypersphere surface and thus, in a strict distributional sense, are actually outliers. This odd phenomenon also occurs in some generic kernel clustering methods. Intuitively, relying on such outlier-like centroids is not entirely reasonable, yet the issue has received little attention. To eliminate the deviation, we propose two Rectified Euler k-means methods, REK1 and REK2, which retain the merits of EulerK while acquiring real centroids that reside in the mapped space and better characterize the data structure. Specifically, REK1 rectifies EulerK by imposing a constraint on the centroids, whereas REK2 views each centroid as the mapped image of a pre-image in the original space and optimizes these pre-images in the Euler-kernel-induced space. Methodologically, the proposed REKs can be extended to other problems of this kind. Finally, experiments validate the effectiveness of REK1 and REK2.
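As a rough illustration of the centroid deviation, the sketch below runs plain k-means on Euler-mapped points and compares it with a variant that renormalizes each centroid back onto the unit sphere. It rests on several assumptions not taken from the paper: the mapping z_j = exp(i·α·π·x_j)/√d (the 1/√d factor is chosen here only so every mapped point has unit norm), inputs scaled to [0, 1], the illustrative value α = 1.9, and the hypothetical helper names `euler_map` and `kmeans_complex`. For fixed assignments, the renormalized mean is the exact minimizer of the within-cluster squared distance under the constraint ||c|| = 1, but it is only one plausible reading of REK1's constraint, not the authors' algorithm; REK2's pre-image optimization is not sketched.

```python
import numpy as np

def euler_map(X, alpha=1.9):
    """Hypothetical Euler mapping of X in [0, 1]^(n x d) to complex vectors.

    Each coordinate becomes exp(i*alpha*pi*x_j). Dividing by sqrt(d) is an
    assumed normalisation so that every mapped sample has unit norm.
    """
    d = X.shape[1]
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(d)

def kmeans_complex(Z, k, n_iter=50, rectify=False, seed=0):
    """Lloyd-style k-means on complex vectors (hypothetical helper).

    rectify=False: centroid = cluster mean (EulerK-like, may leave the sphere).
    rectify=True:  centroid = mean / ||mean||, the constrained minimiser
                   of the within-cluster squared distance with ||c|| = 1.
    """
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance in C^d: sum of |z_j - c_j|^2.
        D = (np.abs(Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = D.argmin(axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            m = members.mean(axis=0)
            if rectify:
                norm = np.linalg.norm(m)
                if norm == 0:
                    continue  # direction undefined; keep previous centroid
                m = m / norm  # project the mean back onto the unit sphere
            C[j] = m
    return labels, C

# Two synthetic blobs in [0, 1]^2 (illustrative data, not from the paper).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.25, 0.05, (100, 2)),
               rng.normal(0.75, 0.05, (100, 2))]).clip(0, 1)
Z = euler_map(X)
_, C_euler = kmeans_complex(Z, 2)
_, C_rect = kmeans_complex(Z, 2, rectify=True)
print(np.linalg.norm(C_euler, axis=1))  # below 1: centroids lie inside the sphere
print(np.linalg.norm(C_rect, axis=1))   # 1 up to rounding: centroids on the sphere
```

Because the mean of distinct unit vectors always has norm strictly below 1, the unconstrained centroids necessarily fall inside the sphere, which is the deviation the abstract describes.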
