Factor analysis in high dimensional biological data with dependent observations

09/23/2020
by Chris McKennan, et al.

Factor analysis is a critical component of high dimensional biological data analysis. However, modern biological data contain two key features that irrevocably corrupt existing methods. First, these data, which include longitudinal, multi-treatment and multi-tissue data, contain samples that break critical independence requirements necessary for the utilization of prevailing methods. Second, biological data contain factors with large, moderate and small signal strengths, and therefore violate the ubiquitous "pervasive factor" assumption essential to the performance of many methods. In this work, I develop a novel statistical framework to perform factor analysis and interpret its results in data with dependent observations and factors whose signal strengths span several orders of magnitude. I then prove that my methodology can be used to solve many important and previously unsolved problems that routinely arise when analyzing dependent biological data, including high dimensional covariance estimation, subspace recovery, latent factor interpretation and data denoising. Additionally, I show that my estimator for the number of factors overcomes both the notorious "eigenvalue shadowing" problem, as well as the biases due to the pervasive factor assumption that plague existing estimators. Simulated and real data demonstrate the superior performance of my methodology in practice.

