A Mixing Time Lower Bound for a Simplified Version of BART

10/17/2022
by Omer Ronen, et al.

Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression algorithm. Its posterior is a distribution over sums of decision trees, and predictions are made by averaging approximate samples from that posterior. Strong predictive performance combined with the ability to provide uncertainty measures has led to BART being widely used in the social sciences, biostatistics, and causal inference. BART uses Markov Chain Monte Carlo (MCMC) to obtain approximate posterior samples over a parameterized space of sums of trees, but the chains have often been observed to mix slowly. In this paper, we provide the first lower bound on the mixing time for a simplified version of BART, in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution. Our lower bound on the mixing time grows exponentially with the number of data points. Inspired by this new connection between the mixing time and the number of data points, we perform rigorous simulations on BART and show qualitatively that BART's mixing time increases with the number of data points. The slow mixing of the simplified BART suggests large variation between different runs of the simplified algorithm, and similarly large variation between runs has been reported for BART in the literature. This variation could result in a lack of stability in the models, predictions, and posterior intervals obtained from BART MCMC samples. Our lower bound and simulations suggest increasing the number of chains as the number of data points grows.
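The between-run variation described in the abstract can be made concrete with a standard multi-chain diagnostic. The sketch below is an illustration only and is not taken from the paper: it computes the classical Gelman-Rubin potential scale reduction factor (R-hat) for a scalar summary tracked across several chains. The function names `gelman_rubin_rhat` and `simulate_chains` are hypothetical, and the draws are synthetic Gaussian samples standing in for a per-chain summary (for example, a prediction at a fixed test point); no BART sampler is run here.

```python
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classical Gelman-Rubin R-hat for equal-length chains of scalar draws."""
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B / n: sample variance of the per-chain means.
    between_over_n = statistics.variance(chain_means)
    # Pooled variance estimate combining both sources of variation.
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def simulate_chains(means, n, seed):
    """Hypothetical stand-in for MCMC output: chain k draws from N(means[k], 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in means]


if __name__ == "__main__":
    # All chains explore the same region: R-hat should be close to 1.
    mixed = simulate_chains([0.0, 0.0, 0.0, 0.0], n=500, seed=0)
    # Chains stuck in different regions: R-hat should be well above 1.
    stuck = simulate_chains([0.0, 3.0, 0.0, 3.0], n=500, seed=0)
    print("R-hat, well-mixed chains:", round(gelman_rubin_rhat(mixed), 3))
    print("R-hat, stuck chains:     ", round(gelman_rubin_rhat(stuck), 3))
```

Values near 1 are consistent with chains that agree with one another, while values well above 1 indicate that different runs are summarizing different regions of the posterior, which is the kind of instability the abstract warns about. The diagnostic needs at least two chains, and running more of them gives more chances to detect such disagreement.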


research
04/19/2019

Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models

Decision trees are flexible models that are well suited for many statist...
research
05/31/2023

On Mixing Rates for Bayesian CART

The success of Bayesian inference with MCMC depends critically on Markov...
research
09/16/2021

How trustworthy is your tree? Bayesian phylogenetic effective sample size through the lens of Monte Carlo error

Bayesian inference is a popular and widely-used approach to infer phylog...
research
04/03/2023

Improved Bound for Mixing Time of Parallel Tempering

In the field of sampling algorithms, MCMC (Markov Chain Monte Carlo) met...
research
03/17/2020

R^*: A robust MCMC convergence diagnostic with uncertainty using gradient-boosted machines

Markov chain Monte Carlo (MCMC) has transformed Bayesian model inference...
research
06/14/2018

On the ranking of Test match batsmen

Ranking sportsmen whose careers took place in different eras is often a ...
research
06/04/2021

Statistical summaries of unlabelled evolutionary trees and ranked hierarchical clustering trees

Rooted and ranked binary trees are mathematical objects of great importa...
