Discovering Relationships and their Structures Across Disparate Data Modalities

09/16/2016
by Cencheng Shen, et al.

Determining whether certain properties are related to other properties is fundamental to scientific discovery. As data collection rates accelerate, it is becoming increasingly difficult yet ever more important to determine whether one property of data (e.g., cloud density) is related to another (e.g., grass wetness). Only if two properties are related are further investigations into the geometry of the relationship warranted. While existing approaches can test whether two properties are related, they may require infeasibly large sample sizes in real-data scenarios, and they do not provide insight into the geometry underlying the structure of the relationship. We juxtapose hypothesis testing, manifold learning, and harmonic analysis to obtain Multiscale Generalized Correlation (MGC). Our key insight is that one can adaptively restrict the analysis to the "jointly local" observations; that is, one can estimate the scale with the most informative neighbors for determining the existence and geometry of a relationship. We prove that to achieve a given true positive rate, MGC typically requires far fewer samples than existing methods for all investigated dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship. We used MGC to detect the presence and reveal the geometry of the relationships between mental and brain properties, to perform a proteomics screening, and to develop an imaging biomarker for disease, while avoiding the false positive inflation problems that have plagued conventional parametric approaches. Our open-source implementation of MGC is easy to use, parameter-free, and applicable to previously vexing statistical questions that are ubiquitous in science, government, finance, and other disciplines.
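
As a rough illustration of what such a test looks like in practice, the sketch below runs an MGC dependence test on simulated data. It assumes SciPy 1.4 or later, which ships `scipy.stats.multiscale_graphcorr`; this may differ from the authors' own open-source implementation. The helper name `run_mgc_demo`, the simulated noisy linear relationship, and the sample size and permutation count are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr


def run_mgc_demo(n=60, reps=200, seed=0):
    """Run an MGC test on a simulated noisy linear relationship.

    Hypothetical helper for illustration only; the data-generating
    process and the default arguments are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Two simulated "properties": y depends linearly on x, plus noise.
    x = rng.uniform(-1.0, 1.0, size=(n, 1))
    y = x + 0.1 * rng.standard_normal(size=(n, 1))

    # The result unpacks as (test statistic, permutation p-value, details).
    stat, pvalue, mgc_dict = multiscale_graphcorr(
        x, y, reps=reps, random_state=seed
    )
    # "opt_scale" holds the estimated optimal neighborhood sizes (k, l)
    # for x and y, i.e. the "jointly local" scale chosen by the test.
    return stat, pvalue, mgc_dict["opt_scale"]


if __name__ == "__main__":
    stat, pvalue, opt_scale = run_mgc_demo()
    print("MGC statistic:", stat)
    print("p-value:", pvalue)
    print("optimal scale (k, l):", opt_scale)
```

The p-value comes from a permutation test, so its resolution is limited by `reps`; the returned dictionary also contains the full map of local correlations (`mgc_map`), which can be inspected to get a sense of the geometry of the relationship.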


