Generalised Boosted Forests

02/24/2021
by Indrayudh Ghosal, et al.

This paper extends recent work on boosting random forests to model non-Gaussian responses. Given an exponential family with 𝔼[Y|X] = g^{-1}(f(X)), our goal is to obtain an estimate for f. We start with an MLE-type estimate in the link space and then define generalised residuals from it. We use these residuals and some corresponding weights to fit a base random forest, and then repeat the same step to obtain a boost random forest. We call the sum of these three estimators a generalised boosted forest. We show with simulated and real data that both random forest steps reduce test-set log-likelihood, which we treat as our primary metric. We also provide a variance estimator, which we can obtain at the same computational cost as the original estimate itself. Empirical experiments on real-world data and simulations demonstrate that the methods can effectively reduce bias, and that confidence interval coverage is conservative in the bulk of the covariate distribution.
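
The three-stage construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Poisson response with log link, it assumes the generalised residuals and weights are the usual Newton-style quantities (negative gradient over second derivative of the negative log-likelihood, and the second derivative itself), and it replaces each random forest with a hypothetical one-split weighted stump so that the example stays self-contained. All function names are illustrative.

```python
import math


def poisson_residuals_and_weights(y, eta):
    """Newton-style residuals and weights for a Poisson response with log link.

    Assumption: residual = (y - mu) / mu and weight = mu, where mu = exp(eta).
    """
    residuals, weights = [], []
    for y_i, eta_i in zip(y, eta):
        mu_i = math.exp(eta_i)                 # inverse link g^{-1}(eta)
        weights.append(mu_i)                   # second derivative of the negative log-likelihood
        residuals.append((y_i - mu_i) / mu_i)  # negative gradient divided by second derivative
    return residuals, weights


def fit_weighted_stump(x, r, w):
    """Hypothetical stand-in for a weighted random forest: one split at the midrange of x."""
    threshold = (min(x) + max(x)) / 2.0

    def weighted_mean(is_left):
        rows = [(r_i, w_i) for x_i, r_i, w_i in zip(x, r, w)
                if (x_i <= threshold) == is_left]
        total_w = sum(w_i for _, w_i in rows)
        if total_w == 0.0:
            return 0.0                         # empty side: contribute nothing
        return sum(r_i * w_i for r_i, w_i in rows) / total_w

    left, right = weighted_mean(True), weighted_mean(False)
    return lambda x_new: left if x_new <= threshold else right


def fit_generalised_boost(x, y, fit_learner=fit_weighted_stump):
    """Constant MLE in link space, then two residual fits (base stage and boost stage)."""
    eta0 = math.log(sum(y) / len(y))           # constant Poisson MLE; needs mean(y) > 0
    eta = [eta0] * len(y)
    stages = []
    for _ in range(2):                         # base learner, then boost learner
        r, w = poisson_residuals_and_weights(y, eta)
        learner = fit_learner(x, r, w)
        stages.append(learner)
        eta = [e + learner(x_i) for e, x_i in zip(eta, x)]
    # The final estimate of f is the sum of the three estimators.
    return lambda x_new: eta0 + sum(stage(x_new) for stage in stages)


# Toy data: two covariate values with different Poisson means.
x = [0.0, 0.0, 1.0, 1.0]
y = [1, 1, 3, 3]
predict_link = fit_generalised_boost(x, y)
fitted_means = [math.exp(predict_link(x_i)) for x_i in x]
```

On this toy data the fitted means move from the constant estimate towards the group means after the two residual fits. Passing a different `fit_learner` (for example a wrapper around a real weighted random forest) leaves the rest of the sketch unchanged.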


