A Robust Hypothesis Test for Tree Ensemble Pruning

01/24/2023
by Daniel de Marchi, et al.

Gradient boosted decision trees are among the most popular algorithms in applied machine learning. They are a flexible and powerful tool that can robustly fit any tabular dataset in a scalable and computationally efficient way. Among the most critical parameters to tune when fitting these models are the penalty terms used to distinguish signal from noise in the current model. These penalties are effective in practice but lack robust theoretical justification. In this paper we develop and present a novel, theoretically justified hypothesis test of split quality for gradient boosted tree ensembles, and we demonstrate that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss. Additionally, this method provides a theoretically well-justified stopping condition for the tree-growing algorithm. We also present several innovative extensions to the method, opening the door to a wide variety of novel tree pruning algorithms.
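
For context on the "common penalty terms" mentioned above, the following is a minimal sketch of the penalized split gain widely used in second-order gradient boosting (the XGBoost-style formulation). It is not the hypothesis test proposed in the paper, whose details are not given in this abstract. The function names `split_gain` and `accept_split` and the parameter names `reg_lambda` and `gamma` are illustrative assumptions.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Penalized gain of a candidate split (illustrative, not the paper's test).

    g_* and h_* are the summed first and second derivatives of the loss
    over the samples falling into the left and right child.
    reg_lambda is an L2 penalty on leaf weights; gamma is a fixed
    per-split complexity penalty.
    """
    def score(g, h):
        # Structure score of a single leaf with summed gradient g and Hessian h.
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


def accept_split(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    # Penalty-based rule: keep the split only if the penalized gain is positive.
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0.0
```

In this rule, `gamma` and `reg_lambda` act as hand-tuned thresholds separating signal from noise. The abstract proposes replacing this kind of thresholding with a hypothesis test of split quality.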

research
01/05/2016

Optimally Pruning Decision Tree Ensembles With Feature Cost

We consider the problem of learning decision rules for prediction with f...
research
05/31/2022

ForestPrune: Compact Depth-Controlled Tree Ensembles

Tree ensembles are versatile supervised learning algorithms that achieve...
research
08/19/2021

Simple is better: Making Decision Trees faster using random sampling

In recent years, gradient boosted decision trees have become popular in ...
research
10/20/2020

An Eager Splitting Strategy for Online Decision Trees

We study the effectiveness of replacing the split strategy for the state...
research
11/17/2012

Cost-sensitive C4.5 with post-pruning and competition

Decision tree is an effective classification approach in data mining and...
research
02/19/2018

Finding Influential Training Samples for Gradient Boosted Decision Trees

We address the problem of finding influential training samples for a par...
