Testing Equivalence of Clustering

10/28/2019
by Chao Gao, et al.

In this paper, we test whether two datasets share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two mixtures of multivariate normal distributions. Mean parameters of these normal distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as ambient dimensions of the datasets grow at a sub-linear rate with the sample size.
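To make the known-means setting concrete, the following is a minimal simulation sketch and not the test proposed in the paper. It assumes several things the abstract does not state: the two datasets are observed on the same n units, each mixture has two symmetric components with identity covariance, and the statistic is a naive plug-in label-mismatch rate. The names `theta`, `eta`, `plug_in_labels`, and `mismatch` are hypothetical and chosen only for illustration.

```python
import random

def sample(labels, mean, rng):
    """Draw one point per label from N(label * mean, I)."""
    return [[z * m + rng.gauss(0.0, 1.0) for m in mean] for z in labels]

def plug_in_labels(points, mean):
    """Label each point +1 or -1 by the sign of its inner product with the known mean."""
    return [1 if sum(x * m for x, m in zip(pt, mean)) >= 0 else -1 for pt in points]

def mismatch(a, b):
    """Fraction of disagreeing labels, minimised over a global label swap."""
    d = sum(u != v for u, v in zip(a, b)) / len(a)
    return min(d, 1.0 - d)

rng = random.Random(0)
n = 2000
theta = [1.5] * 4  # assumed known mean of the first dataset (dimension 4)
eta = [1.2] * 6    # assumed known mean of the second dataset (dimension 6)

# Latent cluster labels for the first dataset.
z = [rng.choice((-1, 1)) for _ in range(n)]
# Same partition as z, with the two cluster names swapped.
s_same = [-v for v in z]
# A partition drawn independently of z.
s_indep = [rng.choice((-1, 1)) for _ in range(n)]

x_labels = plug_in_labels(sample(z, theta, rng), theta)
stat_same = mismatch(x_labels, plug_in_labels(sample(s_same, eta, rng), eta))
stat_indep = mismatch(x_labels, plug_in_labels(sample(s_indep, eta, rng), eta))

print("mismatch, shared clustering:     ", stat_same)
print("mismatch, independent clustering:", stat_indep)
```

In this toy setup the mismatch rate should be near zero when the two datasets share a clustering and near one half when the clusterings are unrelated. Taking the minimum over the label swap reflects that cluster names are arbitrary. The means here are large relative to the noise, so the sketch says nothing about behavior near the detection boundary that the paper characterizes.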


09/22/2021

Sparse Uniformity Testing

In this paper we consider the uniformity testing problem for high-dimens...
08/11/2020

Test for mean matrix in GMANOVA model under heteroscedasticity and non-normality for high-dimensional data

This paper is concerned with the testing bilateral linear hypothesis on ...
11/23/2014

On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Nonparametric two sample testing deals with the question of consistently...
07/10/2020

Learning Entangled Single-Sample Gaussians in the Subset-of-Signals Model

In the setting of entangled single-sample distributions, the goal is to ...
10/31/2018

Testing Halfspaces over Rotation-Invariant Distributions

We present an algorithm for testing halfspaces over arbitrary, unknown r...
10/04/2021

Clustering a Mixture of Gaussians with Unknown Covariance

We investigate a clustering problem with data from a mixture of Gaussian...
06/14/2019

A/B Testing Measurement Framework for Recommendation Models Based on Expected Revenue

We provide a method to determine whether a new recommendation system imp...