Selective inference after convex clustering with ℓ_1 penalization

09/04/2023
by François Bachoc, et al.

Classical inference methods notoriously fail when applied to data-driven test hypotheses or inference targets. Instead, dedicated methodologies are required to obtain statistical guarantees for these selective inference problems. Selective inference is particularly relevant post-clustering, typically when testing a difference in mean between two clusters. In this paper, we address convex clustering with ℓ_1 penalization by leveraging related selective inference tools for regression, based on Gaussian vectors conditioned on polyhedral sets. In the one-dimensional case, we prove a polyhedral characterization of the event of obtaining given clusters, which enables us to propose a test procedure with statistical guarantees. This characterization also allows us to provide a computationally efficient regularization path algorithm. We then extend this test procedure and its guarantees to multi-dimensional clustering with ℓ_1 penalization, and to more general multi-dimensional clusterings that aggregate one-dimensional ones. Through various numerical experiments, we validate our statistical guarantees and demonstrate the power of our methods to detect differences in mean between clusters. Our methods are implemented in the R package poclin.
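
To make the idea of "Gaussian vectors conditioned on polyhedral sets" concrete, here is a minimal Python sketch of the generic truncated-Gaussian test used in selective inference for regression. It is not the paper's procedure and not the poclin API. It assumes independent observations with known standard deviation `sigma`, a selection event written as `A y <= b`, and a contrast vector `eta`; all function names are hypothetical. The toy event (the observations keep their sorted order) and the data are illustrative assumptions, not the polyhedral characterization proved in the paper.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def truncation_interval(A, b, y, eta):
    """Return (stat, lo, hi), where stat = eta'y and [lo, hi] is the set of
    values of eta'y compatible with A y <= b once the part of y that is
    independent of eta'y is held fixed. Assumes covariance sigma^2 * I."""
    stat = dot(eta, y)
    eta_norm2 = dot(eta, eta)
    c = [e / eta_norm2 for e in eta]              # direction along eta
    z = [yi - ci * stat for yi, ci in zip(y, c)]  # remainder, independent of stat
    lo, hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac, az = dot(row, c), dot(row, z)
        if ac > 1e-12:
            hi = min(hi, (bj - az) / ac)
        elif ac < -1e-12:
            lo = max(lo, (bj - az) / ac)
        # rows with ac == 0 place no constraint on eta'y
    return stat, lo, hi


def selective_p_value(A, b, y, eta, sigma=1.0):
    """Two-sided p-value for H0: eta'mu = 0, using the Gaussian law of eta'y
    truncated to [lo, hi]. CDF differences can lose accuracy far in the tails."""
    stat, lo, hi = truncation_interval(A, b, y, eta)
    sd = sigma * math.sqrt(dot(eta, eta))
    num = norm_cdf(stat / sd) - norm_cdf(lo / sd)
    den = norm_cdf(hi / sd) - norm_cdf(lo / sd)
    cdf_at_stat = num / den
    return 2.0 * min(cdf_at_stat, 1.0 - cdf_at_stat)


# Toy data (assumed): two apparent groups in one dimension.
y = [0.0, 0.1, 3.0, 3.2]
# Toy polyhedral event (assumed): y[0] <= y[1] <= y[2] <= y[3].
A = [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]]
b = [0.0, 0.0, 0.0]
# Contrast: mean of the last two points minus mean of the first two.
eta = [-0.5, -0.5, 0.5, 0.5]

p_selective = selective_p_value(A, b, y, eta)
p_naive = selective_p_value([], [], y, eta)  # no conditioning: classical z-test
print(p_selective, p_naive)
```

With no constraints the function reduces to the classical two-sided z-test. Conditioning on the toy event truncates the reference distribution and yields a larger p-value here, which illustrates why ignoring the selection step is anti-conservative.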

