Complete Analysis of a Random Forest Model

05/07/2018
by Jason M. Klusowski, et al.

Random forests have become an important tool for improving accuracy in regression problems since their popularization by Breiman (2001) and others. In this paper, we revisit a random forest model originally proposed by Breiman (2004) and later studied by Biau (2012), in which a feature is selected uniformly at random and the split occurs at the midpoint of the cell along the chosen feature. If the regression function is sparse and depends only on a small, unknown subset of S out of d features, we show that, given n observations, this random forest model outputs a predictor with a mean-squared prediction error of order (n √(log^(S−1) n))^(−1/(S log 2 + 1)). When S ≤ ⌊0.72 d⌋, this rate is better than the minimax optimal rate n^(−2/(d+2)) for d-dimensional Lipschitz function classes. As a consequence of our analysis, we show that the variance of the forest decays with the depth of the tree at a rate that is independent of the ambient dimension, even when the trees are fully grown. In particular, if ℓ_avg (resp. ℓ_max) is the average (resp. maximum) number of observations per leaf node, we show that the variance of this forest is Θ(ℓ_avg^(−1) (√(log n))^(−(S−1))), which, for the case S = d, is similar in form to the lower bound Ω(ℓ_max^(−1) (log n)^(−(d−1))) of Lin and Jeon (2006) for any random forest model with a nonadaptive splitting scheme. We also show that the bias is tight for any linear model with a nonzero parameter vector. Thus, we completely characterize the fundamental limits of this random forest model. Our new analysis also implies that better theoretical performance can be achieved if the trees are grown less aggressively (i.e., to a shallower depth) than previous work would otherwise recommend.
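The splitting scheme described in the abstract is simple enough to sketch directly. Below is a minimal, hypothetical Python illustration (not the authors' code): each internal node of a "centered" tree picks a coordinate uniformly at random and splits its rectangular cell at the midpoint along that coordinate, and the forest averages the tree predictions. The helper `rate_exponents` illustrates the stated rate comparison: ignoring log factors, the sparse rate n^(−1/(S log 2 + 1)) beats the minimax rate n^(−2/(d+2)) exactly when S < d/(2 log 2) ≈ 0.72 d. All function and variable names here are illustrative assumptions.

```python
import math
import random

def build_centered_tree(X, y, bounds, depth, max_depth, fallback=0.0):
    """One tree of the midpoint-split ("centered") forest: choose a feature
    uniformly at random and split the current cell at its midpoint.
    `bounds[j]` is the (lo, hi) extent of the cell along feature j.
    Hypothetical sketch, not the authors' implementation."""
    # Leaf prediction: average response in the cell (fallback if cell is empty).
    value = sum(y) / len(y) if y else fallback
    if depth == max_depth or len(y) <= 1:
        return {"leaf": True, "value": value}
    j = random.randrange(len(bounds))        # feature selected at random
    lo, hi = bounds[j]
    mid = (lo + hi) / 2.0                    # split at the midpoint of the cell
    left = [i for i, x in enumerate(X) if x[j] <= mid]
    right = [i for i, x in enumerate(X) if x[j] > mid]
    lb = list(bounds); lb[j] = (lo, mid)
    rb = list(bounds); rb[j] = (mid, hi)
    return {
        "leaf": False, "feature": j, "threshold": mid,
        "left": build_centered_tree([X[i] for i in left], [y[i] for i in left],
                                    lb, depth + 1, max_depth, value),
        "right": build_centered_tree([X[i] for i in right], [y[i] for i in right],
                                     rb, depth + 1, max_depth, value),
    }

def predict(node, x):
    """Route x down one tree to its leaf and return the leaf average."""
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]

def forest_predict(trees, x):
    """Forest prediction: average over the individual trees."""
    return sum(predict(t, x) for t in trees) / len(trees)

def rate_exponents(S, d):
    """Exponents from the abstract, log factors ignored: the sparse rate
    n^(-1/(S log 2 + 1)) vs. the minimax rate n^(-2/(d+2)).  The first
    exponent is larger, i.e. the rate is faster, iff S < d / (2 log 2)."""
    return 1.0 / (S * math.log(2) + 1), 2.0 / (d + 2)
```

For example, with d = 10 the threshold is ⌊0.72 d⌋ = 7: `rate_exponents(7, 10)` gives a sparse exponent of about 0.171 versus the minimax 0.167, while S = 8 falls below it, consistent with the condition S ≤ ⌊0.72 d⌋ in the abstract.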

