There is no Double-Descent in Random Forests

11/08/2021
by   Sebastian Buschjäger, et al.
23

Random Forests (RFs) are among the state-of-the-art in machine learning and offer excellent performance with nearly zero parameter tuning. Remarkably, RFs seem to be impervious to overfitting even though their basic building blocks are well-known to overfit. Recently, a broadly received study argued that a RF exhibits a so-called double-descent curve: First, the model overfits the data in a u-shaped curve and then, once a certain model complexity is reached, it suddenly improves its performance again. In this paper, we challenge the notion that model capacity is the correct tool to explain the success of RF and argue that the algorithm which trains the model plays a more important role than previously thought. We show that a RF does not exhibit a double-descent curve but rather has a single descent. Hence, it does not overfit in the classic sense. We further present a RF variation that also does not overfit although its decision boundary approximates that of an overfitted DT. Similar, we show that a DT which approximates the decision boundary of a RF will still overfit. Last, we study the diversity of an ensemble as a tool the estimate its performance. To do so, we introduce Negative Correlation Forest (NCForest) which allows for precise control over the diversity in the ensemble. We show, that the diversity and the bias indeed have a crucial impact on the performance of the RF. Having too low diversity collapses the performance of the RF into a a single tree, whereas having too much diversity means that most trees do not produce correct outputs anymore. However, in-between these two extremes we find a large range of different trade-offs with all roughly equal performance. Hence, the specific trade-off between bias and diversity does not matter as long as the algorithm reaches this good trade-off regime.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/06/2020

Benign Overfitting and Noisy Features

Modern machine learning often operates in the regime where the number of...
research
10/19/2021

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement

Random Forests (RF) are among the state-of-the-art in many machine learn...
research
10/15/2019

Breadth-first, Depth-next Training of Random Forests

In this paper we analyze, evaluate, and improve the performance of train...
research
04/06/2022

Double Descent in Random Feature Models: Precise Asymptotic Analysis for General Convex Regularization

We prove rigorous results on the double descent phenomenon in random fea...
research
06/10/2015

Randomer Forests

Random forests (RF) is a popular general purpose classifier that has bee...
research
03/02/2021

Slow-Growing Trees

Random Forest's performance can be matched by a single slow-growing tree...
research
06/29/2020

Random Partitioning Forest for Point-Wise and Collective Anomaly Detection – Application to Intrusion Detection

In this paper, we propose DiFF-RF, an ensemble approach composed of rand...

Please sign up or login with your details

Forgot password? Click here to reset