A Bayesian Zero-Inflated Dirichlet-Multinomial Regression Model for Multivariate Compositional Count Data

02/23/2023
by Matthew D. Koslovsky, et al.

The Dirichlet-multinomial (DM) distribution plays a fundamental role in modern statistical methodology development and application. Recently, the DM distribution and its variants have been used extensively to model multivariate count data generated by high-throughput sequencing technology in omics research, owing to their ability to accommodate both the compositional structure of the data and overdispersion. A major limitation of the DM distribution is that it cannot handle the excess zeros typically found in practice, which may bias inference. To fill this gap, we propose a novel Bayesian zero-inflated DM model for multivariate compositional count data with excess zeros. We then extend our approach to regression settings and embed sparsity-inducing priors to perform variable selection for high-dimensional covariate spaces. Throughout, modeling decisions are made to boost scalability without sacrificing interpretability or imposing limiting assumptions. Extensive simulations and an application to a human gut microbiome data set are presented to compare the performance of the proposed method to existing approaches. We provide an accompanying R package with a user-friendly vignette to apply our method to other data sets.
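To make the idea of zero inflation in a DM setting concrete, the following is a minimal simulation sketch in Python. It is not the authors' implementation or their R package. It assumes one common construction: each component is independently "switched off" with some probability, the remaining components receive gamma variates that are normalized into proportions (the gamma representation of the Dirichlet), and counts are then drawn from a multinomial. The function name `sample_zidm`, the parameter names `gamma` and `theta`, and the rule that forces at least one component to stay active are all illustrative assumptions.

```python
import numpy as np


def sample_zidm(n_samples, depth, gamma, theta, rng):
    """Simulate counts from an illustrative zero-inflated DM construction.

    gamma : concentration parameter per component (assumed name)
    theta : probability that a component is a structural zero (assumed name)
    """
    gamma = np.asarray(gamma, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n_parts = gamma.size
    counts = np.zeros((n_samples, n_parts), dtype=np.int64)

    for i in range(n_samples):
        # Each component is "at risk" (can be non-zero) with probability 1 - theta.
        at_risk = rng.random(n_parts) >= theta
        if not at_risk.any():
            # Assumption for this sketch: keep one component active so the
            # proportions are well defined.
            at_risk[rng.integers(n_parts)] = True

        # Gamma variates normalized to sum to one give Dirichlet proportions
        # over the at-risk components; structural zeros get weight zero.
        weights = np.where(at_risk, rng.gamma(shape=gamma, scale=1.0), 0.0)
        probs = weights / weights.sum()

        # Counts are multinomial given the (zero-inflated) proportions.
        counts[i] = rng.multinomial(depth, probs)

    return counts


rng = np.random.default_rng(0)
counts = sample_zidm(
    n_samples=200, depth=1000, gamma=[2.0] * 5, theta=[0.5] * 5, rng=rng
)
zero_fraction = (counts == 0).mean()
```

Setting every entry of `theta` to zero recovers an ordinary DM draw, so comparing `zero_fraction` between the two settings shows how the extra mixture component produces zeros that the DM alone would rarely generate.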

research
09/22/2022

A Bayesian Joint Model for Compositional Mediation Effect Selection in Microbiome Data

Analyzing multivariate count data generated by high-throughput sequencin...
research
11/08/2018

A New Count Regression Model including Gauss Hypergeometric Function with an application to model demand of health services

In this paper, an alternative count distribution suitable for modeling o...
research
05/17/2020

Bayesian biclustering for microbial metagenomic sequencing data via multinomial matrix factorization

High-throughput sequencing technology provides unprecedented opportuniti...
research
07/01/2021

Dealing with overdispersion in multivariate count data

The problem of overdispersion in multivariate count data is a challengin...
research
11/01/2017

Bayesian Variable Selection for Multivariate Zero-Inflated Models: Application to Microbiome Count Data

Microorganisms play critical roles in human health and disease. It is we...
research
09/11/2019

Robust Regression with Compositional Covariates

Many high-throughput sequencing data sets in biology are compositional i...
research
08/27/2022

Modelling structural zeros in compositional data via a zero-censored multivariate normal model

We present a new model for analyzing compositional data with structural ...
