Compressed spectral screening for large-scale differential correlation analysis with application in selecting Glioblastoma gene modules

11/05/2021
by Tianxi Li, et al.

Differential co-expression analysis is widely used to understand the biological mechanisms of diseases. However, the unknown differential patterns are often complicated, so models based on simplified parametric assumptions can fail to identify the differences. Moreover, the gene expression data involved are extremely high-dimensional by nature, to the point that the full correlation matrices may not even be computable. Such scale severely limits the application of most well-studied statistical methods. This paper introduces a simple yet powerful approach to the differential correlation analysis problem, called compressed spectral screening. By leveraging spectral structures and random sampling techniques, the approach achieves highly accurate screening of features with complicated differential patterns while remaining scalable enough to analyze correlation matrices of 10^4–10^5 variables within a few minutes on a standard personal computer. We apply this screening approach to compare a TCGA Glioblastoma data set with data from normal subjects. The analysis identifies multiple functional modules of genes that exhibit different co-expression patterns, and the findings offer new insights into Glioblastoma's evolving mechanism. The approach is also supported by a theoretical analysis showing that the compressed spectral analysis can achieve variable screening consistency.
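The abstract does not spell out the algorithm, so the following is only a minimal sketch of how random sampling and a spectral decomposition might be combined to screen variables. It is not the authors' implementation. The function name `compressed_spectral_scores`, the parameters `rho` (sampling probability) and `rank`, the scoring rule (row norms of a low-rank approximation), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def compressed_spectral_scores(X1, X2, rho=0.5, rank=1, seed=0):
    """Hypothetical screening score for each variable (illustrative only).

    X1, X2: (samples x variables) arrays for the two groups.
    rho:    assumed probability of keeping each off-diagonal entry.
    rank:   assumed number of leading eigenpairs to retain.
    """
    rng = np.random.default_rng(seed)
    p = X1.shape[1]
    # Toy shortcut: the full differential correlation matrix is formed here.
    # At real scale only the sampled entries would be computed.
    D = np.corrcoef(X1, rowvar=False) - np.corrcoef(X2, rowvar=False)
    # Symmetric Bernoulli(rho) mask over off-diagonal entries.
    upper = np.triu(rng.random((p, p)) < rho, k=1)
    mask = upper | upper.T
    # Rescale by 1/rho so the sampled matrix matches D in expectation.
    D_sampled = np.where(mask, D, 0.0) / rho
    # Keep the eigenpairs with the largest absolute eigenvalues.
    vals, vecs = np.linalg.eigh(D_sampled)
    top = np.argsort(np.abs(vals))[::-1][:rank]
    # Row norms of U * Lambda equal row norms of the low-rank approximation.
    return np.linalg.norm(vecs[:, top] * vals[top], axis=1)

# Simulated example (all values are assumptions): 200 variables, of which
# the first 40 are correlated in group 1 only.
rng = np.random.default_rng(1)
n, p, k = 300, 200, 40
factor = rng.standard_normal((n, 1))
X1 = rng.standard_normal((n, p))
X1[:, :k] = np.sqrt(0.8) * factor + np.sqrt(0.2) * X1[:, :k]
X2 = rng.standard_normal((n, p))

scores = compressed_spectral_scores(X1, X2, rho=0.5, rank=1)
selected = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
```

In this sketch only about half of the off-diagonal entries enter the eigendecomposition, which is the source of any computational saving. How the paper chooses the sampling rate, the rank, and the selection threshold is not stated in the abstract.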
