A Tracy-Widom Empirical Estimator For Valid P-values With High-Dimensional Datasets

11/18/2018
by Maxime Turgeon, et al.

Recent technological advances in many domains, including genomics and brain imaging, have led to an abundance of high-dimensional, correlated data being routinely collected. Classical multivariate approaches such as Multivariate Analysis of Variance (MANOVA) and Canonical Correlation Analysis (CCA) can be used to study relationships between such multivariate datasets. Yet special care is required with high-dimensional data, as the test statistics may be ill-defined and classical inference procedures break down. In this work, we explain how valid p-values can be derived for these multivariate methods even in high-dimensional datasets. Our main contribution is an empirical estimator for the largest root distribution of a singular double Wishart problem; this general framework underlies many common multivariate analysis approaches. From a small number of permutations of the data, we estimate the location and scale parameters of a parametric Tracy-Widom family that provides a good approximation of this distribution. Through simulations, we show that this estimated distribution leads to valid p-values that can be used for high-dimensional inference. We then apply our approach to a pathway-based analysis of the association between DNA methylation and disease type in patients with systemic autoimmune rheumatic diseases.
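The permutation-then-fit idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the function names (`largest_root`, `fit_location_scale`, `tw_pvalue`, `permutation_tw_test`) are hypothetical, the location and scale are fitted by a simple method of moments against the known mean and variance of the order-1 Tracy-Widom law, the Tracy-Widom tail is replaced by a shifted-gamma stand-in whose constants match those moments, and the statistic is the largest squared canonical correlation in a non-singular setting (n > p + q) rather than the singular case the paper targets.

```python
import numpy as np
from scipy import stats

# Mean and standard deviation of the standard Tracy-Widom law of order 1.
TW1_MEAN = -1.2065335745820
TW1_SD = np.sqrt(1.607781034581)

# ASSUMPTION: TW1 is approximated by Gamma(k, theta) - alpha, with constants
# chosen so the mean and variance agree with the values above.
GAMMA_K, GAMMA_THETA, GAMMA_ALPHA = 46.446, 0.186054, 9.84801


def largest_root(X, Y):
    """Largest squared canonical correlation of X and Y (assumes n > p + q)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    # Singular values of Qx^T Qy are the canonical correlations.
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)[0] ** 2


def fit_location_scale(null_stats):
    """Method-of-moments fit of (mu, sigma) so that (stat - mu) / sigma ~ TW1."""
    sigma = np.std(null_stats, ddof=1) / TW1_SD
    mu = np.mean(null_stats) - sigma * TW1_MEAN
    return mu, sigma


def tw_pvalue(observed, mu, sigma):
    """Upper-tail probability under the fitted family (gamma stand-in)."""
    z = (observed - mu) / sigma
    return stats.gamma.sf(z + GAMMA_ALPHA, a=GAMMA_K, scale=GAMMA_THETA)


def permutation_tw_test(X, Y, n_perm=50, seed=0):
    """Fit the null from a small number of row permutations of X, then test."""
    rng = np.random.default_rng(seed)
    null_stats = np.array(
        [largest_root(X[rng.permutation(len(X))], Y) for _ in range(n_perm)]
    )
    mu, sigma = fit_location_scale(null_stats)
    return tw_pvalue(largest_root(X, Y), mu, sigma)
```

Because the fitted distribution is continuous, the resulting p-value is not limited to multiples of 1/(n_perm + 1), which is why a small number of permutations can suffice in a sketch like this one.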
