Improved Clustering with Augmented k-means

05/22/2017
by J. Andrew Howe, et al.

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a key algorithm for this task. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the resulting cluster membership probabilities control the subsequent re-estimation of the cluster means. Observations that cannot be firmly assigned to a cluster are excluded from the re-estimation step. This can be valuable when the data exhibit characteristics common in real datasets, such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by classifying observations into known clusters more accurately, by converging in fewer iterations, or both. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be made available with this report.
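
The iteration described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract alone, not the authors' implementation. Several details are assumptions: the function names, the `threshold` rule for deciding that an observation is "firmly" assigned, the fallback to all cluster members when none pass the threshold, and the plain gradient-descent softmax regression used as a stand-in for the logistic regression step. Features are assumed to be on roughly comparable scales.

```python
import math
import random


def predict_proba(W, x):
    """Softmax class probabilities for one observation x (last weight is the bias)."""
    scores = [sum(w * v for w, v in zip(row[:-1], x)) + row[-1] for row in W]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def fit_softmax(X, labels, k, steps=200, lr=0.1):
    """Multinomial logistic regression by batch gradient descent (illustrative stand-in)."""
    n, d = len(X), len(X[0])
    W = [[0.0] * (d + 1) for _ in range(k)]
    for _ in range(steps):
        grad = [[0.0] * (d + 1) for _ in range(k)]
        for x, y in zip(X, labels):
            p = predict_proba(W, x)
            for c in range(k):
                err = p[c] - (1.0 if c == y else 0.0)
                for j in range(d):
                    grad[c][j] += err * x[j]
                grad[c][d] += err
        for c in range(k):
            for j in range(d + 1):
                W[c][j] -= lr * grad[c][j] / n
    return W


def nearest(x, means):
    """Index of the closest mean by squared Euclidean distance."""
    return min(range(len(means)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))


def augmented_kmeans(X, k, threshold=0.8, max_iter=50, init=None, seed=0):
    """Hypothetical sketch: k-means whose mean update uses only firm members."""
    rng = random.Random(seed)
    means = [list(m) for m in (init if init is not None else rng.sample(X, k))]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(x, means) for x in X]  # standard assignment step
        if new_labels == labels:                     # assignments stable: stop
            break
        labels = new_labels
        W = fit_softmax(X, labels, k)                # predict the current labels
        for c in range(k):
            members = [x for x, y in zip(X, labels) if y == c]
            # Assumption: "firm" means predicted probability >= threshold.
            firm = [x for x in members if predict_proba(W, x)[c] >= threshold]
            use = firm or members                    # assumed fallback
            if use:                                  # empty cluster keeps its mean
                means[c] = [sum(col) / len(use) for col in zip(*use)]
    return labels, means


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
            [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]]
    labels, means = augmented_kmeans(data, k=2, init=[[0.0, 0.0], [3.0, 3.0]])
    print(labels, means)
```

Setting `threshold` to 0 makes every member count as firm, so the update reduces to ordinary k-means. Higher values exclude more borderline observations from the mean update, which is the behavior the abstract attributes to the method.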


research
11/18/2022

Asymptotics for The k-means

The k-means is one of the most important unsupervised learning technique...
research
12/23/2017

Merging K-means with hierarchical clustering for identifying general-shaped groups

Clustering partitions a dataset such that observations placed together i...
research
07/01/2013

Semi-supervised clustering methods

Cluster analysis methods seek to partition a data set into homogeneous s...
research
10/24/2018

A Binary Optimization Approach for Constrained K-Means Clustering

K-Means clustering still plays an important role in many computer vision...
research
12/02/2020

IBM Employee Attrition Analysis

In this paper, we analyzed the dataset IBM Employee Attrition to find th...
research
10/27/2021

Learning-Augmented k-means Clustering

k-means clustering is a well-studied problem due to its wide applicabili...
research
06/10/2019

Multiway clustering via tensor block models

We consider the problem of identifying multiway block structure from a l...
