Bayesian clustering of multiple zero-inflated outcomes

05/10/2022
by Beatrice Franzolini, et al.

Several applications involving counts present a large proportion of zeros (excess-of-zeros data). A popular model for such data is the Hurdle model, which explicitly models the probability of a zero count, while assuming a sampling distribution on the positive integers. We consider data from multiple count processes. In this context, it is of interest to study the patterns of counts and cluster the subjects accordingly. We introduce a novel Bayesian nonparametric approach to cluster multiple, possibly related, zero-inflated processes. We propose a joint model for zero-inflated counts, specifying a Hurdle model for each process with a shifted Negative Binomial sampling distribution. Conditionally on the model parameters, the different processes are assumed independent, leading to a substantial reduction in the number of parameters as compared to traditional multivariate approaches. The subject-specific probabilities of zero-inflation and the parameters of the sampling distribution are flexibly modelled via an enriched finite mixture with a random number of components. This induces a two-level clustering of the subjects: an outer clustering based on the zero/non-zero patterns and an inner clustering based on the sampling distribution. Posterior inference is performed through tailored MCMC schemes. We demonstrate the proposed approach on an application involving the use of the messaging service WhatsApp.
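
To make the likelihood described in the abstract concrete, here is a minimal sketch of a Hurdle model with a shifted Negative Binomial distribution on the positive counts, and of the joint log-likelihood for one subject when the processes are conditionally independent. The abstract does not give the parameterization, so the names `pi_zero`, `r` and `q`, and the choice of a Negative Binomial that counts failures before the `r`-th success with success probability `q`, are assumptions made for illustration only. The mixture prior and the MCMC scheme are not covered.

```python
import math


def shifted_nb_logpmf(y, r, q):
    """Log-pmf of Y = 1 + X, where X ~ NegBin(r, q) on {0, 1, 2, ...}.

    Assumed parameterization (not taken from the paper): X counts failures
    before the r-th success, with success probability q in (0, 1) and r > 0.
    The shift by one puts all the mass on the positive integers.
    """
    if y < 1:
        return float("-inf")
    x = y - 1
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(q)
        + x * math.log1p(-q)
    )


def hurdle_logpmf(y, pi_zero, r, q):
    """Log-pmf of a Hurdle model; pi_zero in (0, 1) is P(Y = 0)."""
    if y == 0:
        return math.log(pi_zero)
    # A positive count: clear the hurdle, then draw from the shifted NB.
    return math.log1p(-pi_zero) + shifted_nb_logpmf(y, r, q)


def joint_loglik(counts, pi_zero, r, q):
    """Log-likelihood of one subject's counts across several processes.

    The processes are treated as independent given the parameters, so the
    joint log-likelihood is a sum of per-process Hurdle terms. All four
    arguments are sequences with one entry per process.
    """
    return sum(
        hurdle_logpmf(y, p, rr, qq)
        for y, p, rr, qq in zip(counts, pi_zero, r, q)
    )


# Hypothetical subject observed on two processes: a zero and a count of 3.
example = joint_loglik([0, 3], [0.3, 0.5], [2.0, 1.5], [0.4, 0.6])
```

Because the joint likelihood factorizes over processes, each process contributes only its own zero probability and two Negative Binomial parameters. This is the reduction in the number of parameters that the abstract contrasts with traditional multivariate approaches.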

research
06/19/2021

Robust Hierarchical Modeling of Counts under Zero-inflation and Outliers

Count data with zero inflation and large outliers are ubiquitous in many...
research
04/28/2023

PAM: Plaid Atoms Model for Bayesian Nonparametric Analysis of Grouped Data

We consider dependent clustering of observations in groups. The proposed...
research
09/21/2020

Sample Size Calculation for Cluster Randomized Trials with Zero-inflated Count Outcomes

Cluster randomized trials (CRT) have been widely employed in medical and...
research
01/03/2019

Nonparametric graphical model for counts

Although multivariate count data are routinely collected in many applica...
research
08/01/2019

Bayesian Gamma-Negative Binomial Modeling of Single-Cell RNA Sequencing Data

Background: Single-cell RNA sequencing (scRNA-seq) is a powerful profili...
research
07/23/2020

Anti-clustering in the national SARS-CoV-2 daily infection counts

The noise in daily infection counts of an epidemic should be super-Poiss...
research
01/18/2018

Variance Components Genetic Association Test for Zero-inflated Count Outcomes

Commonly in biomedical research, studies collect data in which an outcom...