To Bag is to Prune

08/17/2020
by Philippe Goulet Coulombe

It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation, as implemented by RF, automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners. Thus, Boosting and MARS are also eligible. I demonstrate the property empirically with simulations and real data: these untuned ensembles perform on par with their tuned counterparts.
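A minimal sketch of the phenomenon described in the abstract, not code from the paper: using scikit-learn on simulated Friedman #1 data, a fully grown forest with no tuned stopping point fits the training set almost perfectly yet performs out-of-sample about as well as a forest whose depth is tuned by cross-validation. The dataset, forest size, and hyperparameter grid below are illustrative assumptions.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Simulated regression data (illustrative choice, not from the paper).
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Untuned ensemble: trees grown out fully (max_depth=None), no stopping rule.
untuned = RandomForestRegressor(n_estimators=100, random_state=0)
untuned.fit(X_tr, y_tr)

# "Tuned counterpart": same forest, but with the stopping point (max_depth)
# chosen by 5-fold cross-validation.
tuned = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    {"max_depth": list(range(2, 16))},
    cv=5,
)
tuned.fit(X_tr, y_tr)

print("Untuned RF, in-sample R^2:     ", untuned.score(X_tr, y_tr))  # near-perfect fit
print("Untuned RF, out-of-sample R^2: ", untuned.score(X_te, y_te))
print("Tuned RF, out-of-sample R^2:   ", tuned.score(X_te, y_te))
```

On data of this kind, the untuned forest's in-sample fit is close to perfect while its out-of-sample score is typically very close to the depth-tuned forest's, which is the pattern the abstract attributes to bagging and model perturbation acting as an implicit pruning mechanism.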


WildWood: a new Random Forest algorithm (09/16/2021)
We introduce WildWood (WW), a new ensemble algorithm for supervised lear...

A Dynamic Boosted Ensemble Learning Based on Random Forest (04/19/2018)
We propose Dynamic Boosted Random Forest (DBRF), a novel ensemble algori...

A Dynamic Boosted Ensemble Learning Method Based on Random Forest (04/19/2018)
We propose a dynamic boosted ensemble learning method based on random fo...

(Decision and regression) tree ensemble based kernels for regression and classification (12/19/2020)
Tree based ensembles such as Breiman's random forest (RF) and Gradient B...

Random Forests for dependent data (07/30/2020)
Random forest (RF) is one of the most popular methods for estimating reg...

Slow-Growing Trees (03/02/2021)
Random Forest's performance can be matched by a single slow-growing tree...

Pruning Random Forests for Prediction on a Budget (06/16/2016)
We propose to prune a random forest (RF) for resource-constrained predic...