Hidden Markov Pólya trees for high-dimensional distributions

11/05/2020
by Naoki Awaya, et al.

The Pólya tree (PT) process is a general-purpose Bayesian nonparametric model that has found wide application in a range of inference problems. The PT has a simple analytic form, and the resulting posterior computation reduces to straightforward beta-binomial conjugate updates along a partition tree over the sample space. Recent developments in PT models show that their performance can be substantially improved by (i) incorporating latent state variables that characterize local features of the underlying distributions and (ii) allowing the partition tree to adapt to the structure of the underlying distribution. Despite these advances, important limitations of the PT remain, including (i) the sensitivity of posterior inference to the choice of partition points and (ii) the lack of computational scalability to multivariate problems beyond a small number (<10) of dimensions. We consider a modeling strategy for PT models that incorporates a very flexible prior on the partition tree along with latent states that can be first-order dependent (i.e., following a Markov process). We introduce a hybrid algorithm that combines sequential Monte Carlo (SMC) and recursive message passing for posterior inference; it readily accommodates PT models with or without latent states, as well as flexible partition points, in problems of up to 100 dimensions. Moreover, we investigate the large-sample properties of the tree structures and latent states under the posterior model. We carry out extensive numerical experiments in the context of density estimation and two-sample testing, which show that flexible partitioning can substantially improve the performance of PT models in both inference tasks. We demonstrate an application to a flow cytometry data set with 19 dimensions and over 200,000 observations.
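
To make the "beta-binomial conjugate updates along a partition tree" concrete, here is a minimal sketch of a classical one-dimensional Pólya tree on [0, 1) with fixed dyadic split points. It is not the hidden Markov model or the SMC algorithm of the paper: there are no latent states and the partition is not adaptive. The function name `polya_tree_density`, the truncation `depth`, and the prior choice Beta(c·level², c·level²) are assumptions made for illustration only.

```python
def polya_tree_density(x, data, depth=6, c=1.0):
    """Posterior mean density at x in [0, 1) under a dyadic Polya tree.

    Illustrative assumptions: fixed midpoint splits, a symmetric
    Beta(c * level**2, c * level**2) prior at each level, and a tree
    truncated at `depth` levels (uniform within the finest cells).
    """
    lo, hi = 0.0, 1.0
    points = [y for y in data if lo <= y < hi]
    density = 1.0
    for level in range(1, depth + 1):
        mid = (lo + hi) / 2.0
        a = c * level ** 2
        n_left = sum(1 for y in points if y < mid)
        n_right = len(points) - n_left
        # Conjugate update: Beta(a, a) prior plus binomial counts gives
        # Beta(a + n_left, a + n_right); we use its posterior mean.
        p_left = (a + n_left) / (2.0 * a + n_left + n_right)
        if x < mid:
            density *= 2.0 * p_left          # factor 2 = parent width / child width
            hi = mid
            points = [y for y in points if y < mid]
        else:
            density *= 2.0 * (1.0 - p_left)
            lo = mid
            points = [y for y in points if y >= mid]
    return density


# Usage: observations concentrated near 0.1 pull the estimate up there.
sample = [0.08, 0.10, 0.11, 0.12, 0.15, 0.60]
left, right = polya_tree_density(0.1, sample), polya_tree_density(0.9, sample)
```

Because the splits are fixed at midpoints, the estimate depends on where those midpoints fall relative to the data, which is the sensitivity to partition points that the abstract lists as a limitation.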


research
07/07/2022

A Bayesian Survival Tree Partition Model Using Latent Gaussian Processes

Survival models are used to analyze time-to-event data in a variety of d...
research
05/30/2018

Bayesian Nonparametric Higher Order Hidden Markov Models

We consider the problem of flexible modeling of higher order hidden Mark...
research
01/26/2021

Tree boosting for learning probability measures

Learning probability measures based on an i.i.d. sample is a fundamental...
research
11/02/2017

Partition mixture of 1D wavelets for multi-dimensional data

Traditional statistical wavelet analysis that carries out modeling and i...
research
06/13/2022

Density Estimation with Autoregressive Bayesian Predictives

Bayesian methods are a popular choice for statistical inference in small...
research
03/22/2019

Binary Space Partitioning Forests

The Binary Space Partitioning (BSP)-Tree process is proposed to produce ...
research
02/22/2020

Bayesian Multi-scale Modeling of Factor Matrix without using Partition Tree

The multi-scale factor models are particularly appealing for analyzing m...
