Clustering inference in multiple groups

06/16/2021
by Debora Zava Bello, et al.

Inference in clustering is paramount to uncovering inherent group structure in data. Clustering methods that assess statistical significance have recently drawn attention because they help identify patterns in high-dimensional data, with applications in many scientific fields. We present a U-statistics-based approach, tailored to high-dimensional data, that clusters the data into three groups while assessing the significance of the resulting partition. Because our approach builds on the U-statistics-based clustering framework of the methods in the R package uclust, it inherits that framework's characteristics: it is non-parametric, relies on very few assumptions about the data, and can therefore be applied to a wide range of datasets. Furthermore, our method aims to be a more powerful tool for finding the best partition of the data into three groups when that structure is present. To this end, we first propose an extension of the test U-statistic and develop its asymptotic theory. We then propose a ternary non-nested significance clustering method. Our approach is tested through multiple simulations and found to have more statistical power than competing alternatives in all scenarios considered. Applications to peripheral blood mononuclear cells and to image recognition show the versatility of our proposal, which performs better than the other approaches compared.
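To make the idea of testing a three-group partition concrete, here is a minimal sketch in Python. It is not the statistic or the procedure proposed in the paper, and it does not use the uclust package. It assumes one plausible form of a between-group separation statistic built from degree-2 U-statistics (mean pairwise squared distances), summed over the three pairs of groups, and it swaps the paper's asymptotic theory for a simple permutation test. All function names (`pairwise_sq_dists`, `separation_stat`, `permutation_pvalue`) and the pair weights are hypothetical choices made for illustration. The sketch also only tests a partition that is already given; it does not search for the best one.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distance between every pair of rows of X."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)  # clip tiny negatives from rounding

def within_mean(D, idx):
    """Mean distance over unordered pairs inside one group (needs >= 2 points)."""
    sub = D[np.ix_(idx, idx)]
    return sub[np.triu_indices(len(idx), k=1)].mean()

def between_mean(D, idx_a, idx_b):
    """Mean distance over all pairs with one point in each group."""
    return D[np.ix_(idx_a, idx_b)].mean()

def separation_stat(D, labels, n_groups=3):
    """Assumed statistic: weighted sum over group pairs of
    2 * between - within_a - within_b. Not the paper's definition."""
    n = len(labels)
    groups = [np.flatnonzero(labels == g) for g in range(n_groups)]
    total = 0.0
    for a in range(n_groups):
        for b in range(a + 1, n_groups):
            ia, ib = groups[a], groups[b]
            weight = len(ia) * len(ib) / (n * (n - 1))  # hypothetical weighting
            total += weight * (
                2.0 * between_mean(D, ia, ib)
                - within_mean(D, ia)
                - within_mean(D, ib)
            )
    return total

def permutation_pvalue(X, labels, n_perm=200, seed=0):
    """Permutation p-value for a given partition (stand-in for asymptotic theory)."""
    rng = np.random.default_rng(seed)
    D = pairwise_sq_dists(X)
    observed = separation_stat(D, labels)
    null = np.array(
        [separation_stat(D, rng.permutation(labels)) for _ in range(n_perm)]
    )
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Usage: three shifted Gaussian groups, 10 points each, in 50 dimensions.
rng = np.random.default_rng(1)
shifts = np.repeat([0.0, 3.0, -3.0], 10)[:, None]
X = rng.normal(size=(30, 50)) + shifts
labels = np.repeat([0, 1, 2], 10)
stat, p = permutation_pvalue(X, labels)
print(f"statistic = {stat:.2f}, permutation p-value = {p:.4f}")
```

Working from the precomputed distance matrix keeps each permutation cheap, since only the index sets change. A permutation test is used here because it needs no distributional assumptions; the paper instead derives the asymptotic distribution of its own test statistic.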


research
05/30/2018

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of pattern...
research
11/17/2020

Peer groups for organisational learning: clustering with practical constraints

Peer-grouping is used in many sectors for organisational learning, polic...
research
05/26/2021

An algorithm-based multiple detection influence measure for high dimensional regression using expectile

The identification of influential observations is an important part of d...
research
01/10/2020

A Complex Networks Approach to Find Latent Clusters of Terrorist Groups

Given the extreme heterogeneity of actors and groups participating in te...
research
04/30/2023

A new clustering framework

Detection of clusters is a crucial task across many disciplines such as ...
research
02/22/2016

An Effective and Efficient Approach for Clusterability Evaluation

Clustering is an essential data mining tool that aims to discover inhere...
research
02/20/2018

How to analyze data in a factorial design? An extensive simulation study

Factorial designs are frequently used in different fields of science, e....
