Identifying the Complete Correlation Structure in Large-Scale High-Dimensional Data Sets with Local False Discovery Rates

05/30/2023
by   Martin Gölz, et al.
0

The identification of the dependent components in multiple data sets is a fundamental problem in many practical applications. The challenge in these applications is that often the data sets are high-dimensional with few observations or available samples and contain latent components with unknown probability distributions. A novel mathematical formulation of this problem is proposed, which enables the inference of the underlying correlation structure with strict false positive control. In particular, the false discovery rate is controlled at a pre-defined threshold on two levels simultaneously. The deployed test statistics originate in the sample coherence matrix. The required probability models are learned from the data using the bootstrap. Local false discovery rates are used to solve the multiple hypothesis testing problem. Compared to the existing techniques in the literature, the developed technique does not assume an a priori correlation structure and work well when the number of data sets is large while the number of observations is small. In addition, it can handle the presence of distributional uncertainties, heavy-tailed noise, and outliers.

READ FULL TEXT

page 8

page 19

page 20

page 21

page 27

research
01/31/2019

Determining the Dimension and Structure of the Subspace Correlated Across Multiple Data Sets

Detecting the components common or correlated across multiple data sets ...
research
09/16/2016

Discovering Relationships and their Structures Across Disparate Data Modalities

Determining whether certain properties are related to other properties i...
research
08/27/2021

Multiple Hypothesis Testing Framework for Spatial Signals

The problem of identifying regions of spatially interesting, different o...
research
01/10/2013

Discovering Multiple Constraints that are Frequently Approximately Satisfied

Some high-dimensional data.sets can be modelled by assuming that there a...
research
09/14/2018

Feature-specific inference for penalized regression using local false discovery rates

Penalized regression methods, most notably the lasso, are a popular appr...
research
06/10/2020

A fundamental problem of hypothesis testing with finite inventory in e-commerce

In this paper, we draw attention to a problem that is often overlooked o...

Please sign up or login with your details

Forgot password? Click here to reset