Two edge-count tests and relevance analysis in k high-dimensional samples

by Xiaoping Shi, et al.

For the task of relevance analysis, the conventional Tukey's test may be applied to the set of all pairwise comparisons. However, few studies discuss both nonparametric k-sample comparisons and relevance analysis in high dimensions. Our aim is to capture the degree of relevance between combined samples and to provide additional insights and advantages in high-dimensional k-sample comparisons. Our solution is to extend a graph-based two-sample comparison and investigate its applicability to large and unequal sample sizes. We propose two distribution-free test statistics based on between-sample edge counts and measure the degree of relevance by standardized counts. The asymptotic permutation null distributions of the proposed statistics are derived, and the power gain is proved when the sample sizes are smaller than the square root of the dimension. We also discuss different edge costs in the graph to compare the parameters of the distributions. Simulation comparisons and real data analysis of tumors and images further demonstrate the value of our proposed method. Software implementing the relevance analysis is available in the R package Relevance.
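To make the idea of between-sample edge counts concrete, the following is a minimal Python sketch and not the authors' statistics or the API of the R package Relevance. It assumes a Euclidean minimum spanning tree as the graph (a common choice in graph-based two-sample tests, but an assumption here), and it standardizes each pairwise count with a Monte Carlo permutation mean and variance rather than the asymptotic null distribution derived in the paper. The function names `mst_edges`, `between_counts`, and `standardized_counts` are hypothetical.

```python
import math
import random
from itertools import combinations


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n - 1 edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge cost into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def between_counts(edges, labels, k):
    """Count graph edges joining each unordered pair of distinct samples."""
    counts = {pair: 0 for pair in combinations(range(k), 2)}
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        if a != b:
            counts[(a, b)] += 1
    return counts


def standardized_counts(points, labels, k, n_perm=200, seed=0):
    """Observed pairwise counts and their permutation z-scores (assumed form)."""
    rng = random.Random(seed)
    edges = mst_edges(points)  # the graph is fixed; only labels are permuted
    observed = between_counts(edges, labels, k)
    perm = {pair: [] for pair in observed}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for pair, c in between_counts(edges, shuffled, k).items():
            perm[pair].append(c)
    z = {}
    for pair, vals in perm.items():
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / (len(vals) - 1)
        z[pair] = (observed[pair] - mean) / math.sqrt(var) if var > 0 else 0.0
    return observed, z
```

In this sketch, a strongly negative standardized count for a pair of samples means they share far fewer edges than a random labeling would give, which suggests the two samples differ. A count near or above its permutation mean suggests the two samples are well mixed.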




Two-Sample Test for Sparse High Dimensional Multinomial Distributions

In this paper we consider testing the equality of probability vectors of...

Modified Pillai's trace statistics for two high-dimensional sample covariance matrices

The goal of this study was to test the equality of two covariance matric...

Finite Sample t-Tests for High-Dimensional Means

Size distortion can occur if an asymptotic testing procedure requiring d...

A Robust Framework for Graph-based Two-Sample Tests Using Weights

Graph-based tests are a class of non-parametric two-sample tests useful ...

On Steel's Test with Ties

This note revisits Steel's multiple comparison test which uses Wilcoxon ...

Nonparametric High-dimensional K-sample Comparison

High-dimensional k-sample comparison is a common applied problem. We con...

Bayesian structural learning of microbiota systems from count metagenomic data

Metagenomics combined with high-resolution sequencing techniques have en...
