High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

11/28/2018
by Pixu Shi, et al.

In microbiome and genomic studies, regression on compositional data is a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for variation in sequencing depth, the classic log-contrast model is often used, with read counts normalized into compositions. However, zero read counts and uncertainty in the covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for estimation in compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides corrections for sequencing data with possible overdispersion and, at the same time, avoids any subjective imputation of zero read counts. We provide theoretical justification with matching upper and lower bounds for the estimation error. We also consider a general log-error-in-variable regression model and a corresponding method to accommodate broader situations. The merit of the procedure is illustrated through real data analysis and simulation studies.
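As background for the abstract above, here is a minimal sketch of the classic log-contrast setup it refers to, not the method proposed in the paper. It assumes the standard formulation in which the linear predictor is a combination of log-compositions with coefficients that sum to zero. The function names, the toy counts, and the coefficient vector `beta` are all hypothetical. The sketch illustrates two points: the predictor does not change when sequencing depth is rescaled, and a single zero read count makes it non-finite, which is why zeros usually force some form of imputation.

```python
import numpy as np

def to_composition(counts):
    """Normalize each row of read counts so that it sums to one."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def log_contrast_predictor(counts, beta):
    """Linear predictor log(composition) @ beta with sum(beta) == 0."""
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    # log(0) yields -inf; silence the warning so the effect can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log(to_composition(counts)) @ beta

# Hypothetical toy data: 2 samples, 3 taxa.
counts = np.array([[10, 20, 70], [5, 5, 40]])
beta = np.array([1.0, -0.5, -0.5])  # illustrative coefficients, sum to zero

eta = log_contrast_predictor(counts, beta)
# Tripling the sequencing depth leaves the predictor unchanged.
eta_deeper = log_contrast_predictor(counts * 3, beta)

# A zero read count makes the log-composition, and the predictor, non-finite.
counts_with_zero = np.array([[0, 20, 80]])
eta_zero = log_contrast_predictor(counts_with_zero, beta)
```

Because the coefficients sum to zero, the normalizing total cancels out of the predictor, so the same values are obtained from the logarithms of the raw counts whenever all counts are positive.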

