Tree Ensembles with Rule Structured Horseshoe Regularization

02/16/2017
by Malte Nalenz, et al.

We propose a new Bayesian model for flexible nonlinear regression and classification using tree ensembles. The model is based on the RuleFit approach of Friedman and Popescu (2008), where rules from decision trees and linear terms are used in an L1-regularized regression. We modify RuleFit by replacing the L1-regularization with a horseshoe prior, which is well known to shrink noise predictors aggressively while leaving the important signals essentially untouched. This is especially important when a large number of rules are used as predictors, as many of them only contribute noise. Our horseshoe prior has an additional hierarchical layer that applies more shrinkage a priori to rules with a large number of splits, and to rules that are only satisfied by a few observations. The aggressive noise shrinkage of our prior also makes it possible to complement the rules from boosting in Friedman and Popescu (2008) with an additional set of trees from random forest, which brings a desirable diversity to the ensemble. We sample from the posterior distribution using a very efficient and easily implemented Gibbs sampler. The new model is shown to outperform state-of-the-art methods like RuleFit, BART and random forest on 16 datasets. The model and its interpretation are demonstrated on the well-known Boston housing data, and on gene expression data for cancer classification. The posterior sampling, prediction and graphical tools for interpreting the model results are implemented in a publicly available R package.
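
The idea of rule-dependent shrinkage can be illustrated with a small sketch. The abstract does not give the functional form of the prior, so the `prior_scale` function below, its parameters `mu` and `eta`, and the tuple-based rule representation are all assumptions made for illustration, not the authors' implementation (which is in R). The sketch only shows the qualitative behavior described above: a rule becomes a 0/1 predictor, and its prior scale is smaller when the rule has many splits or is satisfied by few observations.

```python
from typing import List, Sequence, Tuple

# A condition is (feature_index, operator, threshold).
# A rule is a conjunction of conditions, as read off a path in a decision tree.
Condition = Tuple[int, str, float]
Rule = List[Condition]


def rule_feature(rule: Rule, row: Sequence[float]) -> int:
    """Return 1 if the row satisfies every condition in the rule, else 0."""
    for idx, op, threshold in rule:
        value = row[idx]
        if op == "<=":
            satisfied = value <= threshold
        elif op == ">":
            satisfied = value > threshold
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not satisfied:
            return 0
    return 1


def rule_support(rule: Rule, X: Sequence[Sequence[float]]) -> float:
    """Fraction of observations in X that satisfy the rule."""
    return sum(rule_feature(rule, row) for row in X) / len(X)


def prior_scale(rule: Rule, X: Sequence[Sequence[float]],
                mu: float = 1.0, eta: float = 1.0) -> float:
    """Hypothetical prior scale for a rule coefficient (assumed form).

    The scale shrinks as the rule gets longer (more splits) and as its
    support moves toward 0 or 1, so such rules are shrunk harder a priori.
    """
    s = rule_support(rule, X)
    return (2.0 * min(s, 1.0 - s)) ** mu / len(rule) ** eta


# Toy data: four observations, two features.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
short_rule = [(0, "<=", 2.0)]               # one split, satisfied by half the rows
long_rule = [(0, ">", 1.0), (1, ">", 4.0)]  # two splits, satisfied by one row

print(prior_scale(short_rule, X))  # larger scale: less shrinkage a priori
print(prior_scale(long_rule, X))   # smaller scale: more shrinkage a priori
```

In a full model, each rule's 0/1 column would enter the regression alongside the linear terms, and a scale of this kind would feed into the local shrinkage parameter of the horseshoe prior on that rule's coefficient.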

research
07/13/2020

Rule Covering for Interpretation and Boosting

We propose two algorithms for interpretation and boosting of tree-based ...
research
07/22/2017

pre: An R Package for Fitting Prediction Rule Ensembles

Prediction rule ensembles (PREs) are sparse collections of rules, offeri...
research
03/29/2022

Explaining random forest prediction through diverse rulesets

Tree-ensemble algorithms, such as random forest, are effective machine l...
research
09/16/2017

Relevant Ensemble of Trees

Tree ensembles are flexible predictive models that can capture relevant ...
research
07/02/2013

Comparing various regression methods on ensemble strategies in differential evolution

Differential evolution possesses a multitude of various strategies for g...
research
11/08/2022

A new BART prior for flexible modeling with categorical predictors

Default implementations of Bayesian Additive Regression Trees (BART) rep...
research
01/14/2020

Interpretation and Simplification of Deep Forest

This paper proposes a new method for interpreting and simplifying a blac...
