Cross-study analyses of microbial abundance using generalized common factor methods

03/27/2023
by Molly G. Hayes, et al.

By creating networks of biochemical pathways, communities of microorganisms can modulate the properties of their environment and even the metabolic processes of their hosts. Next-generation high-throughput sequencing has opened a new frontier in microbial ecology, promising the ability to leverage the microbiome for crucial advances in the environmental and biomedical sciences. Realizing this promise is challenging, however, because genomic data are high-dimensional, sparse, and noisy. Much of this noise reflects the specific conditions under which sequencing took place, and it is large enough to limit consensus-based validation of study results. We propose an ensemble approach for cross-study exploratory analyses of microbial abundance data. We first estimate, from each dataset, the variance-covariance matrix of the underlying abundances on the log scale, assuming Poisson sampling; we then model these covariances jointly to find a shared low-dimensional subspace of the feature space. Projecting the latent true abundances onto this common structure pares the variation down to what is shared among all datasets, which is likely to reflect more generalizable biological signal than can be inferred from any individual dataset. We investigate several ways of achieving this and demonstrate that they perform well on simulated and real metagenomic data in terms of signal retention and interpretability.
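The two-stage pipeline in the abstract can be illustrated with a small NumPy sketch. The assumptions here are ours, not the paper's: counts follow a Poisson log-normal model, $Y_{ij} \mid Z_{ij} \sim \mathrm{Poisson}(e^{Z_{ij}})$ with $Z_i \sim N(\mu, \Sigma)$, with no library-size offsets and a nonzero mean count for every feature. Under that model $\mathrm{Cov}(Y_j, Y_k) = m_j m_k (e^{\Sigma_{jk}} - 1) + \delta_{jk} m_j$, where $m_j = E[Y_j]$, so each study's log-scale $\Sigma$ can be recovered by the method of moments. For the joint step, the sketch simply takes the top eigenvectors of the averaged covariance matrices. This is a plain pooled stand-in, not the generalized common factor models the paper investigates. All function names are hypothetical, and the score projection uses a crude `log(counts + 0.5)` plug-in rather than estimates of the latent abundances.

```python
import numpy as np


def nearest_psd(a):
    """Symmetrize and clip negative eigenvalues to zero."""
    a = (a + a.T) / 2.0
    w, v = np.linalg.eigh(a)
    return (v * np.clip(w, 0.0, None)) @ v.T


def poisson_lognormal_cov(counts):
    """Moment estimate of the latent log-scale covariance Sigma.

    Assumes counts[i, j] ~ Poisson(exp(z[i, j])), z[i] ~ N(mu, Sigma),
    no library-size offsets, and every column having a nonzero mean.
    """
    y = np.asarray(counts, dtype=float)
    m = y.mean(axis=0)                     # m_j = exp(mu_j + Sigma_jj / 2)
    s = np.cov(y, rowvar=False)            # sample covariance of the counts
    s_latent = s - np.diag(m)              # remove Poisson sampling variance
    ratio = s_latent / np.outer(m, m)      # equals exp(Sigma_jk) - 1
    sigma = np.log(np.clip(ratio + 1.0, 1e-6, None))
    return nearest_psd(sigma)


def shared_subspace(covs, k):
    """Top-k eigenvectors of the averaged per-study covariance
    (a simple pooled stand-in for a joint common-factor model)."""
    pooled = np.mean(covs, axis=0)
    w, v = np.linalg.eigh(pooled)
    order = np.argsort(w)[::-1][:k]
    return v[:, order]


def retained_fraction(sigma, basis):
    """Share of a study's latent variance lying in the shared subspace."""
    return np.trace(basis.T @ sigma @ basis) / np.trace(sigma)


def project_scores(counts, basis):
    """Crude plug-in: center log(counts + 0.5), project onto the basis."""
    logy = np.log(np.asarray(counts, dtype=float) + 0.5)
    return (logy - logy.mean(axis=0)) @ basis


# Simulated example: three studies sharing a rank-2 structure
# plus study-specific (diagonal) variance.
rng = np.random.default_rng(0)
p, k = 6, 2
loadings = rng.normal(scale=0.4, size=(p, k))
covs = []
for study in range(3):
    noise = np.diag(rng.uniform(0.05, 0.2, size=p))
    sigma_true = loadings @ loadings.T + noise
    z = rng.multivariate_normal(np.full(p, 2.0), sigma_true, size=20000)
    y = rng.poisson(np.exp(z))
    covs.append(poisson_lognormal_cov(y))

basis = shared_subspace(covs, k)
fractions = [retained_fraction(c, basis) for c in covs]
scores = project_scores(y, basis)          # scores for the last study
print([round(f, 2) for f in fractions], scores.shape)
```

Averaging the covariances treats every study equally and ignores differences in sample size. Weighting the studies or fitting an explicit shared-plus-specific factor model would be natural refinements.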
