Testing Equivalence of Clustering

by Chao Gao, et al.

In this paper, we test whether two datasets share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two mixtures of multivariate normal distributions. The mean parameters of these normal distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming the mean parameters are known, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When the nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as the ambient dimensions of the datasets grow at a sub-linear rate with the sample size.
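
To make the setup concrete, here is a minimal simulation sketch. It is not the test proposed in the paper. It assumes a symmetric two-component mixture with identity covariance, the same n subjects observed in both datasets, and known mean vectors; the names `simulate`, `sign_labels`, `disagreement`, `theta`, `eta` and `flip_frac` are hypothetical. It recovers labels by the sign of a projection onto the known mean and reports the fraction of subjects whose labels disagree across the two datasets, up to a global label swap.

```python
import random


def simulate(n, theta, eta, flip_frac, seed=0):
    """Draw two samples over the same n subjects (assumed setup).

    Subject i has label z[i] in dataset X and sigma[i] in dataset Y;
    sigma differs from z on a flip_frac share of the subjects.
    """
    rng = random.Random(seed)
    z = [rng.choice((-1, 1)) for _ in range(n)]
    flipped = set(rng.sample(range(n), int(round(flip_frac * n))))
    sigma = [-zi if i in flipped else zi for i, zi in enumerate(z)]
    # Each row is label * mean + standard normal noise (identity covariance).
    X = [[zi * t + rng.gauss(0, 1) for t in theta] for zi in z]
    Y = [[si * e + rng.gauss(0, 1) for e in eta] for si in sigma]
    return X, Y


def sign_labels(data, mean):
    """Estimate labels by the sign of the projection onto the known mean."""
    return [1 if sum(a * b for a, b in zip(row, mean)) >= 0 else -1
            for row in data]


def disagreement(X, Y, theta, eta):
    """Fraction of subjects with different estimated labels, up to a label swap."""
    zx = sign_labels(X, theta)
    zy = sign_labels(Y, eta)
    d = sum(a != b for a, b in zip(zx, zy))
    return min(d, len(zx) - d) / len(zx)


if __name__ == "__main__":
    theta, eta = [4.0] * 4, [4.0] * 6  # the two means may differ, as may the dimensions
    for frac in (0.0, 0.25):
        X, Y = simulate(200, theta, eta, flip_frac=frac)
        print(frac, disagreement(X, Y, theta, eta))
```

With a strong signal the estimated disagreement tracks `flip_frac` closely. The regimes studied in the abstract are the harder ones, where the signal-to-noise ratio is low or the means must be estimated, and this naive plug-in statistic is not claimed to be optimal there.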





Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...

Test for mean matrix in GMANOVA model under heteroscedasticity and non-normality for high-dimensional data

This paper is concerned with the testing bilateral linear hypothesis on ...

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Nonparametric two sample testing deals with the question of consistently...

Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model

In the setting of entangled single-sample distributions, the goal is to ...

Testing Halfspaces over Rotation-Invariant Distributions

We present an algorithm for testing halfspaces over arbitrary, unknown r...

Clustering a Mixture of Gaussians with Unknown Covariance

We investigate a clustering problem with data from a mixture of Gaussian...

A/B Testing Measurement Framework for Recommendation Models Based on Expected Revenue

We provide a method to determine whether a new recommendation system imp...