Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

04/28/2015
by Abraham J. Wyner, et al.

There is a large literature explaining why AdaBoost is a successful classifier. That literature focuses on classifier margins and on boosting's interpretation as the optimization of an exponential likelihood function. These explanations, however, have been shown to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests proposing that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm that creates what we denote as a "spikey-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that boosting algorithms for classification require regularization or early stopping and should be limited to low-complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.
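The "spikey-smooth" idea can be illustrated with a toy sketch. The following is a minimal pure-Python illustration, not the paper's construction: a bagged 1-nearest-neighbour ensemble stands in for a random forest (each member, like a fully grown tree, interpolates its own bootstrap sample), and the dataset, seed, and ensemble size are made-up choices. Averaging the interpolating members keeps a spike at a mislabeled training point while the decision rule stays smooth away from it.

```python
import random

# Toy 1-D training set: class 0 for x < 5, class 1 for x >= 5,
# except a single mislabeled ("noise") point at x = 2.
X = list(range(10))
y = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]  # y[2] is the noise point

def nn_predict(sample, q):
    """1-nearest-neighbour prediction from a list of (x, label) pairs."""
    return min(sample, key=lambda p: abs(p[0] - q))[1]

def bagged_nn(X, y, n_members=200, seed=0):
    """Self-averaging ensemble: each member is a 1-NN classifier fit on a
    bootstrap resample, so each member interpolates its own sample."""
    rng = random.Random(seed)
    data = list(zip(X, y))
    members = [rng.choices(data, k=len(data)) for _ in range(n_members)]
    def predict(q):
        votes = sum(nn_predict(m, q) for m in members)
        return int(votes * 2 > n_members)  # majority vote
    return predict

predict = bagged_nn(X, y)
spike = predict(2.0)   # at the noise point itself: ensemble still fits it
left = predict(0.2)    # away from the noise: clean class-0 region
right = predict(6.5)   # clean class-1 region
```

Here `predict(2.0)` returns 1, so the ensemble interpolates the mislabeled point, yet `predict(0.2)` and `predict(6.5)` return the clean labels 0 and 1: the noise point's influence is confined to a narrow spike rather than degrading the fit globally, which is the mechanism the abstract attributes to both random forests and AdaBoost.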


Related research

- 03/30/2021, "Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest": Due to their long-standing reputation as excellent off-the-shelf predict...
- 11/04/2020, "Residual Likelihood Forests": This paper presents a novel ensemble learning approach called Residual L...
- 09/13/2020, "Random boosting and random^2 forests – A random tree depth injection approach": The induction of additional randomness in parallel and sequential ensemb...
- 11/01/2019, "Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success": Random forests remain among the most popular off-the-shelf supervised ma...
- 02/10/2023, "Conceptual Views on Tree Ensemble Classifiers": Random Forests and related tree-based methods are popular for supervised...
- 05/27/2022, "Adaptive Random Forests for Energy-Efficient Inference on Microcontrollers": Random Forests (RFs) are widely used Machine Learning models in low-powe...
- 05/13/2020, "Phishing URL Detection Through Top-level Domain Analysis: A Descriptive Approach": Phishing is considered to be one of the most prevalent cyber-attacks bec...
