A flexible and robust non-parametric test of exchangeability

09/30/2021
by Alan J. Aw, et al.

Many statistical analyses assume that the data points within a sample are exchangeable and their features have some known dependency structure. Given a feature dependency structure, one can ask if the observations are exchangeable, in which case we say that they are homogeneous. Homogeneity may be the end goal of a clustering algorithm or a justification for not clustering. Apart from random matrix theory approaches, few general approaches provide statistical guarantees of exchangeability or homogeneity without labeled examples from distinct clusters. We propose a fast and flexible non-parametric hypothesis testing approach that takes as input a multivariate individual-by-feature dataset and user-specified feature dependency constraints, without labeled examples, and reports whether the individuals are exchangeable at a user-specified significance level. Our approach controls Type I error across realistic scenarios and handles data of arbitrary dimension. We perform an extensive simulation study to evaluate the efficacy of domain-agnostic tests of stratification, and find that our approach compares favorably in various scenarios of interest. Finally, we apply our approach to post-clustering single-cell chromatin accessibility data and World Values Survey data, and show how it helps to identify drivers of heterogeneity and generate clusters of exchangeable individuals.
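As a rough illustration of the kind of test the abstract describes, the sketch below implements a simple permutation test on an individual-by-feature matrix. It is not presented as the authors' procedure. It assumes the simplest feature dependency constraint (all features mutually independent), uses the variance of pairwise squared Euclidean distances as the test statistic, and builds the null distribution by permuting each feature column independently across individuals. The function names `pairwise_sq_dist_variance` and `exchangeability_pvalue` are hypothetical, as are the defaults `n_perm=500` and `seed=0`.

```python
import numpy as np


def pairwise_sq_dist_variance(X):
    """Variance of squared Euclidean distances over all pairs of rows of X."""
    G = X @ X.T                                # Gram matrix of the individuals
    sq = np.diag(G)                            # squared norm of each row
    D = sq[:, None] + sq[None, :] - 2.0 * G    # pairwise squared distances
    iu = np.triu_indices(X.shape[0], k=1)      # each unordered pair once
    return D[iu].var()


def exchangeability_pvalue(X, n_perm=500, seed=0):
    """Permutation p-value for the null that the rows of X are exchangeable.

    Assumption (not from the source): features are mutually independent,
    so each column can be permuted independently under the null.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = pairwise_sq_dist_variance(X)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle every column independently along the individuals axis.
        Xp = rng.permuted(X, axis=0)
        if pairwise_sq_dist_variance(Xp) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Under these assumptions, a sample made of two well-separated groups produces a bimodal set of pairwise distances and hence a large observed statistic, which column-wise permutation destroys, so the p-value is small. One would reject exchangeability when the returned p-value falls below the chosen significance level.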


research
07/20/2020

Bayesian Non-Parametric Detection Heterogeneity in Ecological Models

Detection heterogeneity is inherent to ecological data, arising from fac...
research
10/24/2022

Post-clustering difference testing: valid inference and practical considerations

Clustering is part of unsupervised analysis methods that consist in grou...
research
03/29/2022

Selective inference for k-means clustering

We consider the problem of testing for a difference in means between clu...
research
07/30/2021

Inference for Dependent Data with Learned Clusters

This paper presents and analyzes an approach to cluster-based inference ...
research
09/26/2017

Adaptive Nonparametric Clustering

This paper presents a new approach to non-parametric cluster analysis ca...
research
05/30/2018

U-statistical inference for hierarchical clustering

Clustering methods are a valuable tool for the identification of pattern...
research
01/19/2011

Transductive-Inductive Cluster Approximation Via Multivariate Chebyshev Inequality

Approximating adequate number of clusters in multidimensional data is an...
