Is interpolation benign for random forests?

02/08/2022
by Ludovic Arnould, et al.

Statistical wisdom suggests that very complex models, which interpolate the training data, will predict poorly on unseen examples. Yet this aphorism has recently been challenged by the identification of benign overfitting regimes, studied especially in the case of parametric models: generalization may be preserved despite high model complexity. While it is widely known that fully-grown decision trees interpolate and, in turn, have poor predictive performance, the same behavior has yet to be analyzed for random forests. In this paper, we study the trade-off between interpolation and consistency for several types of random forest algorithms. Theoretically, we prove that interpolation and consistency cannot be achieved simultaneously by non-adaptive random forests. Since adaptivity appears to be the cornerstone for reconciling interpolation and consistency, we introduce and study interpolating Adaptive Centered Forests, which we prove consistent in a noiseless scenario. Numerical experiments show that Breiman's random forests are consistent while exactly interpolating when no bootstrap step is involved. We theoretically control the size of the interpolation area, which converges to zero fast enough that exact interpolation and consistency occur in conjunction.
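The exact-interpolation behavior mentioned in the abstract is easy to observe empirically. The sketch below (not the paper's experimental code; it assumes scikit-learn's `RandomForestRegressor` as a stand-in for Breiman's forest) fits a forest with the bootstrap step disabled, so every fully-grown tree sees all training samples, and checks that training points are reproduced exactly:

```python
# Sketch: a fully-grown random forest with bootstrap disabled interpolates
# the training data exactly (assuming distinct inputs x).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 5))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=200)  # noisy targets

# bootstrap=False: each tree is trained on the full sample and grown until
# every leaf contains a single point, so training predictions equal y.
forest = RandomForestRegressor(n_estimators=50, bootstrap=False, random_state=0)
forest.fit(X, y)

train_error = np.max(np.abs(forest.predict(X) - y))
print(train_error)  # essentially 0: exact interpolation of the noisy labels
```

Re-enabling the bootstrap (the default `bootstrap=True`) breaks exact interpolation, since each tree then fits only a resampled subset and the ensemble average no longer passes through every training point.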
