Flexible clustering via hidden hierarchical Dirichlet priors

01/18/2022
by Antonio Lijoi, et al.

The Bayesian approach to inference stands out for naturally allowing information to be borrowed across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which, however, has the drawback of grouping distributions into a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering method for both samples and observations, we investigate a nonparametric prior that arises as the composition of two different discrete random structures and derive a closed-form expression for the induced distribution of the random partition, the fundamental tool regulating the clustering behavior of the model. On the one hand, this allows us to gain deeper insight into the theoretical properties of the model and, on the other hand, it yields an MCMC algorithm for evaluating Bayesian inferences of interest. Moreover, we single out limitations of this algorithm when working with more than two populations and, consequently, devise an alternative, more efficient sampling scheme, which, as a by-product, allows testing homogeneity between different populations. Finally, we perform a comparison with the nested Dirichlet process and provide illustrative examples on both synthetic and real data.
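
As background for the phrase "induced distribution of the random partition", the sketch below draws a partition from a single, ordinary Dirichlet process through its Chinese restaurant representation. It is not the prior studied in the paper, whose composition of two discrete random structures is not specified in this abstract; it only illustrates how a discrete random measure induces clusters among observations. The function name `sample_crp_partition` and the concentration parameter `theta` are assumptions introduced for illustration.

```python
import random

def sample_crp_partition(n, theta, rng):
    """Draw cluster labels for n observations from a Chinese restaurant process.

    Illustrative only: this is the partition induced by a single Dirichlet
    process with concentration theta > 0, not the model from the paper.
    """
    labels = []  # labels[i] = cluster index of observation i
    sizes = []   # sizes[j] = number of observations currently in cluster j
    for _ in range(n):
        # An existing cluster j is chosen with weight sizes[j];
        # a brand-new cluster is opened with weight theta.
        weights = sizes + [theta]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[j] += 1     # join an existing cluster
        labels.append(j)
    return labels, sizes

rng = random.Random(0)
labels, sizes = sample_crp_partition(50, 1.0, rng)
print(len(sizes), "clusters with sizes", sizes)
```

Larger values of `theta` tend to produce more clusters, since a new cluster is opened with probability proportional to `theta` at each step.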

research
01/15/2018

Latent nested nonparametric priors

Discrete random structures are important tools in Bayesian nonparametric...
research
05/20/2020

The semi-hierarchical Dirichlet Process and its application to clustering homogeneous distributions

Assessing homogeneity of distributions is an old problem that has receiv...
research
02/27/2023

Detecting Jumps on a Tree: a Hierarchical Pitman-Yor Model for Evolution of Phenotypic Distributions

This work focuses on clustering populations with a hierarchical dependen...
research
08/17/2020

A Common Atom Model for the Bayesian Nonparametric Analysis of Nested Data

The use of high-dimensional data for targeted therapeutic interventions ...
research
08/03/2023

Similarity-based Random Partition Distribution for Clustering Functional Data

Random partitioned distribution is a powerful tool for model-based clust...
research
01/04/2010

Inference of global clusters from locally distributed data

We consider the problem of analyzing the heterogeneity of clustering dis...
research
03/02/2019

Kullback-Leibler Divergence for Bayesian Nonparametric Model Checking

Bayesian nonparametric statistics is an area of considerable research in...
