A Distribution-Free Independence Test for High Dimension Data

10/14/2021
by Zhanrui Cai, et al.

Testing independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data are high dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed by the modern machine learning community, making it applicable to high-dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that does not depend on the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a single-cell data set to test the independence between two types of single-cell sequencing measurements, whose high dimensionality and sparsity make existing methods hard to apply.
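
The general idea in the abstract (fit a classifier on one half of the data to separate joint pairs from product pairs, then test its significance on the other half) can be illustrated with a minimal sketch. This is not the paper's test statistic, which is not given in the abstract. Everything below is an assumption made for illustration: the function names, the interaction features, the gradient-descent logistic regression standing in for "any classifier", the cyclic shift used as the fixed permutation, and the paired z-statistic computed on disjoint blocks of two held-out observations. Under independence, and conditionally on the fitted classifier, the block differences are i.i.d. with mean zero, so the statistic is approximately standard normal whatever the data distribution.

```python
import math
import numpy as np


def pair_features(x, y):
    """Interaction features x_j * y_k per row (an illustrative choice)."""
    return np.einsum("ij,ik->ijk", x, y).reshape(x.shape[0], -1)


def fit_logistic(features, labels, steps=500, lr=0.1, ridge=1e-3):
    """Gradient-descent logistic regression; a stand-in for any classifier."""
    design = np.hstack([np.ones((features.shape[0], 1)), features])
    w = np.zeros(design.shape[1])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(design @ w)))
        w -= lr * (design.T @ (prob - labels) / len(labels) + ridge * w)
    return w


def score(w, x, y):
    """Fitted log-odds that the pair (x, y) comes from the joint distribution."""
    feats = pair_features(x, y)
    return np.hstack([np.ones((feats.shape[0], 1)), feats]) @ w


def classifier_independence_test(x, y):
    """Return (z, one-sided p-value) for H0: X and Y are independent."""
    n = x.shape[0]
    half = n // 2
    x_tr, y_tr = x[:half], y[:half]
    x_te, y_te = x[half:], y[half:]

    # Training half: joint pairs (X_i, Y_i) get label 1; product pairs
    # (X_i, Y_{i+1}) built with a fixed cyclic shift get label 0.
    y_shift = np.roll(y_tr, -1, axis=0)
    feats = np.vstack([pair_features(x_tr, y_tr), pair_features(x_tr, y_shift)])
    labels = np.concatenate([np.ones(half), np.zeros(half)])
    w = fit_logistic(feats, labels)

    # Held-out half: disjoint blocks of two observations. Each block compares
    # the score of a true pair with the score of a mismatched pair.
    m = (n - half) // 2
    first, second = slice(0, 2 * m, 2), slice(1, 2 * m, 2)
    diff = score(w, x_te[first], y_te[first]) - score(w, x_te[first], y_te[second])

    z = math.sqrt(m) * diff.mean() / diff.std(ddof=1)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal p-value
    return float(z), float(p_value)
```

Calling `classifier_independence_test(x, y)` with two arrays of shape `(n, p)` and `(n, q)` returns the statistic and a one-sided p-value; large positive values of `z` indicate that the classifier scores true pairs higher than mismatched ones. Using disjoint blocks wastes part of the held-out sample, but it keeps the differences independent, so no permutation resampling is needed to calibrate the test.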


research
01/07/2015

A Projection Based Conditional Dependence Measure with Applications to High-dimensional Undirected Graphical Models

Measuring conditional dependence is an important topic in statistics wit...
research
12/18/2022

A Permutation-Free Kernel Independence Test

In nonparametric independence testing, we observe i.i.d. data {(X_i,Y_i)...
research
05/09/2023

Robust Model Selection with Application in Single-Cell Multiomics Data

Model selection is critical in the modern statistics and machine learnin...
research
01/31/2018

A Distribution-Free Test of Independence and Its Application to Variable Selection

Motivated by the importance of measuring the association between the res...
research
02/28/2022

Asymptotic Normality of Gini Correlation in High Dimension with Applications to the K-sample Problem

The categorical Gini correlation proposed by Dang et al. is a dependence...
research
07/11/2022

Testing Independence of Bivariate Censored Data using Random Walk on Restricted Permutation Graph

In this paper, we propose a procedure to test the independence of bivari...
research
02/09/2015

Local and Global Inference for High Dimensional Nonparanormal Graphical Models

This paper proposes a unified framework to quantify local and global inf...
