Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

10/05/2016
by Erika S. Helgeson, et al.

Cluster analysis is an unsupervised learning strategy that can be employed to identify subgroups of observations in data sets of unknown structure. This strategy is particularly useful for analyzing high-dimensional data such as microarray gene expression data. Many clustering methods are available, but it is challenging to determine whether the identified clusters represent distinct subgroups. We propose a novel strategy to investigate the significance of identified clusters by comparing the within-cluster sum of squares from the original data to that produced by clustering an appropriate unimodal null distribution. The null distribution we present for this problem uses kernel density estimation and thus does not require that the data follow any particular distribution. We find that our method can accurately test for the presence of clustering even when the number of features is high.
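
The comparison described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation; it only shows the general shape of such a test under several assumptions of our own: clustering is done with 2-means, the statistic is the within-cluster sum of squares divided by the total sum of squares, and the null data are drawn from a Gaussian kernel density estimate with a fixed relative bandwidth `h`. The sketch assumes `h` is large enough to smooth the estimate into a single mode and does not verify this. All function and parameter names are hypothetical.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def two_means_wcss(points, rng, n_init=5, n_iter=50):
    """Smallest within-cluster sum of squares over several 2-means restarts."""
    best = float("inf")
    for _ in range(n_init):
        centers = rng.sample(points, 2)
        for _ in range(n_iter):
            groups = ([], [])
            for p in points:
                nearest = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                groups[nearest].append(p)
            if not groups[0] or not groups[1]:
                break  # degenerate restart; keep the current centers
            new_centers = [centroid(g) for g in groups]
            if new_centers == centers:
                break  # converged
            centers = new_centers
        wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, wcss)
    return best


def cluster_index(points, rng):
    """Within-cluster SS divided by total SS (assumed statistic; smaller = stronger clustering)."""
    mean = centroid(points)
    total = sum(sq_dist(p, mean) for p in points)
    return two_means_wcss(points, rng) / total


def sample_kde_null(points, h, rng):
    """Draw len(points) samples from a Gaussian KDE of the data (smoothed bootstrap).

    Assumption: the bandwidth for each feature is h times its sample standard
    deviation, and h is taken to be large enough to give a unimodal density.
    """
    sds = [statistics.stdev(col) for col in zip(*points)]
    null = []
    for _ in points:
        base = rng.choice(points)
        null.append(tuple(x + rng.gauss(0.0, h * s) for x, s in zip(base, sds)))
    return null


def cluster_p_value(points, n_sim=99, h=1.0, seed=0):
    """Monte Carlo p-value: share of null data sets that cluster at least as tightly."""
    rng = random.Random(seed)
    observed = cluster_index(points, rng)
    hits = sum(
        cluster_index(sample_kde_null(points, h, rng), rng) <= observed
        for _ in range(n_sim)
    )
    return (1 + hits) / (1 + n_sim)


if __name__ == "__main__":
    demo_rng = random.Random(42)
    group_a = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(30)]
    group_b = [(demo_rng.gauss(10, 1), demo_rng.gauss(10, 1)) for _ in range(30)]
    print("p-value for two separated groups:", cluster_p_value(group_a + group_b))
```

Because the statistic is a ratio of sums of squares, the extra variance introduced by kernel smoothing does not by itself change the comparison. A fuller treatment would choose the bandwidth so that the estimated density is provably unimodal rather than fixing `h` in advance.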


research
08/24/2023

Powerful Significance Testing for Unbalanced Clusters

Clustering methods are popular for revealing structure in data, particul...
research
12/14/2016

Border-Peeling Clustering

In this paper, we present a novel non-parametric clustering technique, w...
research
03/24/2011

A comparison of Gap statistic definitions with and without logarithm function

The Gap statistic is a standard method for determining the number of clu...
research
02/23/2023

Clustering Hierarchies via a Semi-Parametric Generalized Linear Mixed Model: a statistical significance-based approach

We introduce a novel statistical significance-based approach for cluster...
research
04/24/2014

Solution Path Clustering with Adaptive Concave Penalty

Fast accumulation of large amounts of complex data has created a need fo...
research
09/28/2021

An exact test for significance of clusters in binary data

Unsupervised clustering of feature matrix data is an indispensable techn...
research
05/30/2018

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of pattern...
