K-groups: A Generalization of K-means Clustering

11/12/2017
by Songzi Li, et al.

We propose a new class of distribution-based clustering algorithms, called k-groups, based on the energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation and k-groups by second variation. The implementation of k-groups is partly based on Hartigan and Wong's algorithm for k-means. The algorithm is generalized from moving one point on each iteration (first variation) to moving m (m > 1) points. For univariate data, we prove that Hartigan and Wong's k-means algorithm is a special case of k-groups by first variation. Simulation results for univariate and multivariate cases show that our k-groups algorithms perform as well as Hartigan and Wong's k-means algorithm when clusters are well separated and normally distributed. Moreover, both k-groups algorithms perform better than k-means when the data do not have a finite first moment or have strong skewness and heavy tails. For non-spherical clusters, both k-groups algorithms perform better than k-means in high dimensions, and k-groups by first variation is consistent as the dimension increases. In a case study on dermatology data with 34 features, both k-groups algorithms perform better than k-means.
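
To make the notion of energy distance between samples concrete, here is a minimal sketch of the standard two-sample energy statistic, 2E|X-Y| - E|X-X'| - E|Y-Y'|, estimated by averaging Euclidean distances over all pairs. It is an illustration only: the function names are hypothetical, and the abstract does not spell out the exact multi-sample criterion or cluster-size weighting that k-groups uses, so this should not be read as the authors' implementation.

```python
import math
from itertools import product


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (x, y), x in a, y in b."""
    total = sum(math.dist(x, y) for x, y in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(a, b):
    """Two-sample energy distance, V-statistic form (illustrative sketch).

    Computes 2*mean|x - y| - mean|x - x'| - mean|y - y'|, which is
    non-negative and equals zero when the two samples coincide.
    """
    return (2.0 * mean_pairwise_distance(a, b)
            - mean_pairwise_distance(a, a)
            - mean_pairwise_distance(b, b))


# Toy univariate samples, written as 1-tuples so the same code
# also works for multivariate points.
sample_a = [(0.0,), (1.0,)]
sample_b = [(3.0,), (4.0,)]

print(energy_distance(sample_a, sample_b))  # 5.0
print(energy_distance(sample_a, sample_a))  # 0.0
```

Because the statistic compares whole samples through pairwise distances rather than through their means alone, a criterion built on it can distinguish clusters that differ in distribution, which is the sense in which the abstract describes k-groups as generalizing k-means.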

research
03/11/2013

Improved Performance of Unsupervised Method by Renovated K-Means

Clustering is a separation of data into groups of similar objects. Every...
research
01/25/2021

Violent Crime in London: An Investigation using Geographically Weighted Regression

Violent crime in London is an area of increasing interest following poli...
research
08/02/2023

Are Easy Data Easy (for K-Means)

This paper investigates the capability of correctly recovering well-sepa...
research
10/23/2020

Detection of groups of concomitant extremes using clustering

There is a growing empirical evidence that the spherical k-means cluster...
research
04/21/2019

TiK-means: K-means clustering for skewed groups

The K-means algorithm is extended to allow for partitioning of skewed gr...
research
05/24/2018

Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering

Standard clustering algorithms usually find regular-structured clusters ...
research
11/17/2020

Peer groups for organisational learning: clustering with practical constraints

Peer-grouping is used in many sectors for organisational learning, polic...
