ABC Variable Selection with Bayesian Forests

06/06/2018
by YI LIU, et al.

Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants thereof. In this work, we abandon the linear model framework, which can be quite detrimental when the covariates impact the outcome in a non-linear way, and turn to tree-based methods for variable selection. Such variable screening is traditionally done by pruning down large trees or by ranking variables based on some importance measure. Despite being heavily used in practice, these ad-hoc selection rules are not yet well understood from a theoretical point of view. We devise a Bayesian tree-based probabilistic method and show that it is consistent for variable selection when the regression surface is a smooth mix of p > n covariates. These results are the first model selection consistency results for Bayesian forest priors. Probabilistic assessment of variable importance is made feasible by a spike-and-slab wrapper around sum-of-trees priors. Sampling from posterior distributions over trees is inherently very difficult. As an alternative to MCMC, we propose ABC Bayesian Forests, a new ABC sampling method based on data-splitting that achieves a higher ABC acceptance rate while retaining probabilistic coherence. We show that the method is robust and successful at finding variables with high marginal inclusion probabilities. Our ABC algorithm provides a new avenue towards approximating the median probability model in non-parametric setups where the marginal likelihood is intractable.
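To make the idea of ABC with data-splitting more concrete, here is a minimal sketch in pure Python. It is not the authors' algorithm: it assumes independent Bernoulli inclusion draws as a stand-in for the spike-and-slab prior, replaces the posterior draw of a sum-of-trees model with a greedy sum of decision stumps fitted on one half of the data, and accepts the draws with the smallest held-out error (a quantile-based tolerance). All function names, parameters, and the toy data (a piecewise-constant signal in the first two covariates) are hypothetical and chosen only for illustration.

```python
import random

def fit_stump(X, r, features):
    """Best least-squares split of residuals r over the allowed features (None if no split exists)."""
    best = None
    n, total = len(r), sum(r)
    for j in features:
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied values
            # Maximizing this quantity is equivalent to minimizing the split's squared error.
            gain = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, (total - left_sum) / (n - k))
    return best

def fit_sum_of_stumps(X, y, features, rounds=4):
    """Greedy additive fit: each stump is fitted to the residuals of the previous ones."""
    base = sum(y) / len(y)
    resid = [yi - base for yi in y]
    stumps = []
    for _ in range(rounds):
        s = fit_stump(X, resid, features)
        if s is None:  # empty candidate set or no valid split
            break
        _, j, thr, ml, mr = s
        stumps.append((j, thr, ml, mr))
        resid = [ri - (ml if row[j] <= thr else mr) for row, ri in zip(X, resid)]
    return base, stumps

def predict(model, row):
    base, stumps = model
    return base + sum(ml if row[j] <= thr else mr for j, thr, ml, mr in stumps)

def abc_inclusion_probabilities(X, y, n_draws=400, prior_incl=0.3, accept_frac=0.05, seed=0):
    """ABC sketch: draw a subset, fit on one half, score on the other, keep the best draws."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    draws = []
    for _ in range(n_draws):
        # Assumed prior: each variable enters independently with probability prior_incl.
        subset = [j for j in range(p) if rng.random() < prior_incl]
        idx = list(range(n))
        rng.shuffle(idx)  # fresh random data split for every draw
        fit_idx, test_idx = idx[: n // 2], idx[n // 2:]
        model = fit_sum_of_stumps([X[i] for i in fit_idx], [y[i] for i in fit_idx], subset)
        err = sum((y[i] - predict(model, X[i])) ** 2 for i in test_idx) / len(test_idx)
        used = {j for j, _, _, _ in model[1]}  # variables actually split on
        draws.append((err, used))
    draws.sort(key=lambda d: d[0])
    accepted = draws[: max(1, int(accept_frac * len(draws)))]
    return [sum(j in used for _, used in accepted) / len(accepted) for j in range(p)]

def make_toy_data(n=120, p=8, seed=1):
    """Toy data: only covariates 0 and 1 affect the response."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [4.0 * (row[0] > 0.5) + 3.0 * (row[1] > 0.5) + rng.gauss(0.0, 0.3) for row in X]
    return X, y

X, y = make_toy_data()
probs = abc_inclusion_probabilities(X, y)
# Median probability model: variables whose estimated inclusion probability exceeds 1/2.
median_model = [j for j, pr in enumerate(probs) if pr > 0.5]
```

Fitting on one half and scoring on the other is what keeps the acceptance step from simply rewarding overfitting: a subset is accepted only if the model it produces also predicts held-out responses well. The fraction of accepted draws in which a variable is actually split on then serves as a rough estimate of its marginal inclusion probability.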

research
12/18/2018

Comparing Spike and Slab Priors for Bayesian Variable Selection

An important task in building regression models is to decide which regre...
research
11/05/2020

Nonparametric Variable Screening with Optimal Decision Stumps

Decision trees and their ensembles are endowed with a rich set of diagno...
research
06/19/2008

BART: Bayesian additive regression trees

We develop a Bayesian "sum-of-trees" model where each tree is constraine...
research
07/31/2019

Additive Bayesian variable selection under censoring and misspecification

We study the effect and interplay of two important issues on Bayesian mo...
research
07/10/2022

Energy Trees: Regression and Classification With Structured and Mixed-Type Covariates

The continuous growth of data complexity requires methods and models tha...
research
09/06/2021

Screening the Discrepancy Function of a Computer Model

Screening traditionally refers to the problem of detecting active inputs...
research
07/01/2020

Variable Selection via Thompson Sampling

Thompson sampling is a heuristic algorithm for the multi-armed bandit pr...
