To tune or not to tune the number of trees in random forest?

05/16/2017
by Philipp Probst, et al.

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum and then rises again as the number of trees increases. The goal of this paper is four-fold: (i) to provide theoretical results showing that the expected error rate may be a non-monotonic function of the number of trees, and to explain under which circumstances this happens; (ii) to provide theoretical results showing that such non-monotonic patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) to illustrate the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally, to argue in favor of setting T to a computationally feasible large number, chosen according to the convergence properties of the desired performance measure.
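Points (i) and (ii) can be illustrated with a small numerical sketch. It is not taken from the paper. It rests on a simplifying assumption made here: for a given observation, each tree votes for the correct class independently with a fixed probability p, so the number of correct votes among T trees is binomial. The mixture of observations (80% with p = 0.8, 20% with p = 0.45) and the function names are hypothetical and chosen only for illustration.

```python
from math import comb

# Hypothetical mixture of observations (an assumption for illustration):
# (share of observations, probability that a single tree is correct)
OBSERVATIONS = [(0.8, 0.80), (0.2, 0.45)]


def majority_vote_error(p_correct, n_trees):
    """Probability that a majority vote of n_trees independent trees is wrong.

    Ties (possible only for even n_trees) are broken at random.
    """
    error = 0.0
    for k in range(n_trees + 1):  # k = number of trees voting correctly
        prob = comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        if 2 * k < n_trees:
            error += prob
        elif 2 * k == n_trees:
            error += 0.5 * prob
    return error


def expected_brier(p_correct, n_trees):
    """Expected squared error of the vote share X/T for a true label of 1.

    With X ~ Binomial(T, p): E[(1 - X/T)^2] = (1 - p)^2 + p(1 - p)/T.
    (Some definitions of the Brier score sum over both classes, which
    only doubles this value.)
    """
    return (1 - p_correct) ** 2 + p_correct * (1 - p_correct) / n_trees


def mean_error(n_trees):
    return sum(w * majority_vote_error(p, n_trees) for w, p in OBSERVATIONS)


def mean_brier(n_trees):
    return sum(w * expected_brier(p, n_trees) for w, p in OBSERVATIONS)


for t in (1, 11, 101, 501):
    print(f"T={t:4d}  error rate={mean_error(t):.3f}  Brier={mean_brier(t):.3f}")
```

Under these assumptions, observations with p slightly below 0.5 are misclassified more and more often as T grows, while observations with p well above 0.5 are quickly classified correctly. The averaged error rate therefore first drops and then rises again, whereas the expected Brier score only shrinks with T because its T-dependent term is p(1 - p)/T.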


research
05/07/2018

Complete Analysis of a Random Forest Model

Random forests have become an important tool for improving accuracy in r...
research
04/29/2020

Asymptotic Properties of High-Dimensional Random Forests

As a flexible nonparametric learning tool, random forest has been widely...
research
05/25/2019

Asymptotic Distributions and Rates of Convergence for Random Forests and other Resampled Ensemble Learners

Random forests remain among the most popular off-the-shelf supervised le...
research
08/31/2016

hi-RF: Incremental Learning Random Forest for large-scale multi-class Data Classification

In recent years, dynamically growing data and incrementally growing numb...
research
04/17/2019

Predict Future Sales using Ensembled Random Forests

This is a method report for the Kaggle data competition 'Predict future ...
research
04/10/2018

Hyperparameters and Tuning Strategies for Random Forest

The random forest algorithm (RF) has several hyperparameters that have t...
research
11/18/2015

A Random Forest Guided Tour

The random forest algorithm, proposed by L. Breiman in 2001, has been ex...
