Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applications

03/04/2019
by Patrick L. Combettes, et al.

Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions in their natural habitat and are often paired with covariate measurements that characterize physicochemical habitat properties or the physiology of the host. Inferring parsimonious statistical associations between microbial compositions and habitat- or host-specific covariate data is an important step in exploratory data analysis. A standard statistical model linking compositional covariates to continuous outcomes is the linear log-contrast model. This model describes the response as a linear combination of log-ratios of the original compositions and has been extended to the high-dimensional setting via regularization. In this contribution, we propose a general convex optimization model for linear log-contrast regression which includes many previous proposals as special cases. We introduce a proximal algorithm that solves the resulting constrained optimization problem exactly with rigorous convergence guarantees. We illustrate the versatility of our approach by investigating the performance of several model instances on soil and gut microbiome data analysis tasks.
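The following is a minimal sketch of one instance of the model family described above: the l1-penalized (lasso) linear log-contrast model, y ≈ β0 + Σj βj log xj subject to Σj βj = 0. Under the zero-sum constraint the linear predictor equals a combination of log-ratios, so it does not change when a composition is rescaled. Several things here are assumptions: the helper names (`closure`, `soft_threshold`, `log_contrast_lasso`, `predict`) are hypothetical, the intercept is handled by centering, and the solver is plain ADMM, chosen for brevity. It is not the authors' proximal algorithm. The splitting puts the zero-sum constraint in an equality-constrained least-squares step and the l1 penalty in a soft-thresholding (proximity) step.

```python
import numpy as np


def closure(counts):
    """Rescale each row to sum to one, mapping counts onto the simplex."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def log_contrast_lasso(X, y, lam, rho=1.0, n_iter=5000, tol=1e-9):
    """Hypothetical sketch of an l1-regularized linear log-contrast fit:

        min_b (1/2n) ||yc - Zc b||^2 + lam * ||b||_1   subject to  sum(b) = 0,

    with Z = log(X) and column-centred Zc, yc (the intercept is recovered
    afterwards). Solved by ADMM on the splitting b = g; this is a stand-in,
    not the paper's own proximal algorithm.
    """
    Z = np.log(X)
    n, p = Z.shape
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc, yc = Z - z_mean, y - y_mean

    # KKT system of the b-step:
    #   (Zc^T Zc / n + rho I) b + nu * 1 = rhs,   1^T b = 0
    K = np.zeros((p + 1, p + 1))
    K[:p, :p] = Zc.T @ Zc / n + rho * np.eye(p)
    K[:p, p] = 1.0
    K[p, :p] = 1.0
    Zty = Zc.T @ yc / n

    b, g, u = np.zeros(p), np.zeros(p), np.zeros(p)
    for _ in range(n_iter):
        rhs = np.append(Zty + rho * (g - u), 0.0)
        b = np.linalg.solve(K, rhs)[:p]       # zero-sum least-squares step
        g_old = g
        g = soft_threshold(b + u, lam / rho)  # l1 proximity step (sparsity)
        u = u + b - g                         # scaled dual update
        if np.linalg.norm(b - g) < tol and rho * np.linalg.norm(g - g_old) < tol:
            break
    intercept = y_mean - z_mean @ b
    return b, g, intercept


def predict(X, b, intercept):
    """Linear log-contrast prediction: intercept + sum_j b_j * log(x_j)."""
    return intercept + np.log(X) @ b
```

In this sketch the iterate `b` satisfies the zero-sum constraint exactly (up to floating point), while `g` carries exact zeros from soft thresholding. At convergence the two agree. Rescaling every composition by a constant leaves `predict` unchanged because the coefficients sum to zero.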


research
09/11/2019

Robust Regression with Compositional Covariates

Many high-throughput sequencing data sets in biology are compositional i...
research
03/31/2023

Regression and Classification of Compositional Data via a novel Supervised Log Ratio Method

Compositional data in which only the relative abundances of variables ar...
research
12/21/2018

Primal path algorithm for compositional data analysis

Compositional data have two unique characteristics compared to typical m...
research
11/28/2018

High-dimensional Log-Error-in-Variable Regression with Applications to Microbial Compositional Data Analysis

In microbiome and genomic study, the regression of compositional data ha...
research
01/26/2023

On the choice of weights in aggregate compositional data analysis

In this paper, we distinguish between two kinds of compositional data se...
research
04/30/2020

A Bayesian model of microbiome data for simultaneous identification of covariate associations and prediction of phenotypic outcomes

One of the major research questions regarding human microbiome studies i...
