Non-parametric Power-law Data Clustering

06/13/2013
by Xuhui Fan, et al.

It has always been a great challenge for clustering algorithms to determine the number of clusters automatically from the distribution of a dataset. Several approaches have been proposed to address this issue, including recent promising work that incorporates Bayesian nonparametrics into the k-means clustering procedure. This approach is simple to implement and theoretically sound, and it provides a feasible way to perform inference on large-scale datasets. However, several problems remain unsolved in this pioneering work, among them its applicability to power-law data, the absence of a mechanism for merging centers to avoid over-fitting, and the clustering order problem. To address these issues, this paper proposes Pitman-Yor Process based k-means (pyp-means). Taking advantage of the Pitman-Yor Process, pyp-means treats clusters differently, dynamically and adaptively changing the threshold to guarantee power-law clustering results. A center agglomeration procedure is also integrated into the implementation to merge small but close clusters and thereby adaptively determine the number of clusters. After further discussion of the clustering order, a convergence proof, a complexity analysis, and an extension to spectral clustering, our approach is compared with traditional clustering algorithms and variational inference methods. Experiments on both synthetic and real-world datasets validate the advantages and properties of pyp-means.
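The abstract describes pyp-means only at a high level. As a rough illustration, the Python sketch below starts from a DP-means-style loop (assign each point to its nearest center; open a new cluster when the squared distance to every center exceeds a penalty) and adds the two ingredients the abstract names: a new-cluster threshold that adapts as clusters accumulate, and a step that merges close centers. The threshold schedule `lam / (1 + discount * k)`, the greedy merge rule, and every name and parameter here (`pyp_means_sketch`, `lam`, `discount`, `merge_radius`) are hypothetical choices made for illustration; they are not the paper's formulas. The schedule only loosely echoes the Pitman-Yor Process, whose weight for opening a new cluster, theta + discount * K, grows with the current number of clusters K.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def new_cluster_threshold(lam, discount, k):
    # HYPOTHETICAL schedule (not the paper's formula): the squared-distance
    # threshold shrinks as the number of clusters k grows, loosely mirroring
    # how the Pitman-Yor new-cluster weight (theta + discount * k) grows with k.
    return lam / (1.0 + discount * k)


def merge_close_centers(centers, merge_radius):
    # HYPOTHETICAL agglomeration rule: greedily fold each center into the
    # first kept center lying within merge_radius (unweighted average).
    kept = []
    for c in centers:
        for i, m in enumerate(kept):
            if sq_dist(c, m) < merge_radius ** 2:
                kept[i] = mean([c, m])
                break
        else:
            kept.append(c)
    return kept


def pyp_means_sketch(data, lam, discount=0.5, merge_radius=0.5, max_iter=50):
    """DP-means-style loop with an adaptive threshold and a merge step."""
    centers = [data[0]]
    labels = None
    for _ in range(max_iter):
        # Assignment: open a new cluster when x is too far from every center.
        assign = []
        for x in data:
            j = nearest(x, centers)
            if sq_dist(x, centers[j]) > new_cluster_threshold(lam, discount, len(centers)):
                centers.append(x)
                j = len(centers) - 1
            assign.append(j)
        # Update: recompute each non-empty cluster's mean (empty ones vanish).
        groups = {}
        for x, j in zip(data, assign):
            groups.setdefault(j, []).append(x)
        centers = [mean(pts) for _, pts in sorted(groups.items())]
        # Agglomeration: merge small but close clusters.
        centers = merge_close_centers(centers, merge_radius)
        # Relabel against the final centers; stop once labels are stable.
        new_labels = [nearest(x, centers) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return centers, labels
```

On three tight, well-separated groups of 2-D points, a call such as `pyp_means_sketch(data, lam=25.0)` should open one cluster per group without the number of clusters being fixed in advance. The merge step matters because a shrinking threshold by itself would make new clusters ever cheaper to open. Folding near-duplicate centers back together is one simple way to keep that growth in check, in the spirit of the abstract's "merge small but close clusters."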

research
03/16/2021

K-expectiles clustering

K-means clustering is one of the most widely-used partitioning algorithm...
research
11/22/2022

Global k-means++: an effective relaxation of the global k-means clustering algorithm

The k-means algorithm is a very prevalent clustering method because of i...
research
11/23/2019

A Domain Adaptive Density Clustering Algorithm for Data with Varying Density Distribution

As one type of efficient unsupervised learning methods, clustering algor...
research
02/27/2022

Strong Consistency for a Class of Adaptive Clustering Procedures

We introduce a class of clustering procedures which includes k-means and...
research
02/20/2022

Clustering by the Probability Distributions from Extreme Value Theory

Clustering is an essential task to unsupervised learning. It tries to au...
research
10/09/2021

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

This paper introduces k-splits, an improved hierarchical algorithm based...
research
07/04/2019

k is the Magic Number -- Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

Most convex and nonconvex clustering algorithms come with one crucial pa...
