Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data

11/01/2017
by Kyu Ha Lee, et al.

Microorganisms play critical roles in human health and disease. It is well known that microbes live in diverse communities in which they interact synergistically or antagonistically. Thus, multivariate statistical models are preferred for estimating microbial associations with clinical covariates. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than taxon-specific univariate analyses. In addition, the analysis of microbial count data requires special attention because the data commonly exhibit zero inflation. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa) while estimating associations with the mean levels of these outcomes. Although a great deal of effort has been devoted to zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared the performance of the proposed method to that of existing univariate approaches, for both the binary and count parts of the model. When outcomes were correlated, the proposed variable selection method maintained type I error while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, the univariate method had higher power than the multivariate approach in some scenarios. This higher power came at the cost of a highly inflated false discovery rate, which was not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five species (of 44) associated with HIV infection.
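To make the "binary part" and "count part" terminology concrete, here is a minimal single-taxon sketch of a zero-inflated count distribution. It is not the authors' model: the abstract does not state the count distribution, so the Poisson choice is an assumption, and the names `zip_pmf`, `zip_mean`, `pi`, and `lam` are hypothetical. The binary part decides whether an observation is a structural zero, and the count part generates the count otherwise.

```python
import math


def zip_pmf(y, pi, lam):
    """P(Y = y) for a zero-inflated Poisson (assumed form, for illustration).

    pi  : probability of a structural zero (the binary part)
    lam : Poisson mean when the observation is not a structural zero (the count part)
    """
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero arises either structurally or from the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean(pi, lam):
    """Marginal mean of the zero-inflated Poisson."""
    return (1.0 - pi) * lam
```

With `pi = 0` the function reduces to an ordinary Poisson, and any `pi > 0` puts more mass at zero than the Poisson alone would, which is the excess-zero behaviour the abstract refers to.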


