Estimating the Algorithmic Variance of Randomized Ensembles via the Bootstrap

07/20/2019
by Miles E. Lopes, et al.

Although bagging and random forests are among the most widely used prediction methods, relatively little is known about their algorithmic convergence. In particular, there are few theoretical guarantees for deciding when an ensemble is "large enough," in the sense that its accuracy is close to that of an ideal infinite ensemble. Because bagging and random forests are randomized algorithms, the choice of ensemble size is closely related to the notion of "algorithmic variance" (i.e., the variance of prediction error due only to the training algorithm). In the present work, we propose a bootstrap method to estimate this variance for bagging, random forests, and related methods in the context of classification. To be specific, suppose the training dataset is fixed, and let the random variable Err_t denote the prediction error of a randomized ensemble of size t. Working under a "first-order model" for randomized ensembles, we prove that the centered law of Err_t can be consistently approximated via the proposed method as t → ∞. Meanwhile, the computational cost of the method is quite modest, by virtue of an extrapolation technique. As a consequence, the method offers a practical guideline for deciding when the algorithmic fluctuations of Err_t are negligible.
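The core idea of bootstrapping the algorithmic variance can be illustrated with a minimal sketch (this is not the paper's actual algorithm, which additionally uses an extrapolation step to reduce cost): train one pool of randomized base learners, then form many bootstrap ensembles of size t by resampling learners from the pool, and use the spread of their error rates as an estimate of the algorithmic variance of Err_t. All names here (`train_stump`, the toy dataset, the pool size) are illustrative assumptions, not from the paper.

```python
import random
import statistics

random.seed(0)

# Toy data: 1-D points labeled by a noisy threshold rule (10% label flips).
X = [random.uniform(0, 1) for _ in range(200)]
y = [int(x > 0.5) if random.random() > 0.1 else int(x <= 0.5) for x in X]

def train_stump(X, y):
    """Train a decision stump on a bootstrap sample of the data
    (a stand-in for any randomized base learner, e.g. a tree)."""
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    best_thr, best_err = 0.5, float("inf")
    for thr in [i / 20 for i in range(1, 20)]:
        err = sum((X[i] > thr) != y[i] for i in idx)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr

def ensemble_error(stumps, X, y):
    """Misclassification rate of a majority-vote ensemble of stumps."""
    wrong = 0
    for xi, yi in zip(X, y):
        votes = sum(xi > thr for thr in stumps)
        wrong += (votes * 2 > len(stumps)) != yi
    return wrong / len(X)

# Step 1: train a single pool of T randomized base learners.
T = 50
pool = [train_stump(X, y) for _ in range(T)]

# Step 2: bootstrap B ensembles of size t by resampling learners from
# the pool; the spread of their errors estimates the algorithmic
# variance of Err_t (the training data is held fixed throughout).
t, B = 20, 200
errors = [ensemble_error(random.choices(pool, k=t), X, y) for _ in range(B)]
alg_var = statistics.pvariance(errors)
print(f"estimated algorithmic std of Err_t: {alg_var ** 0.5:.4f}")
```

In practice one would compare this estimated fluctuation against a tolerance: if the algorithmic standard deviation is negligible relative to the error itself, the ensemble can be considered "large enough."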


Related research

- Measuring the Algorithmic Convergence of Randomized Ensembles: The Regression Setting (08/04/2019). When randomized ensemble methods such as bagging and random forests are ...
- A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles (03/04/2013). When random forests are used for binary classification, an ensemble of t...
- Bootstrap Bias Corrections for Ensemble Methods (06/01/2015). This paper examines the use of a residual bootstrap for bias correction ...
- The Infinitesimal Jackknife and Combinations of Models (08/31/2022). The Infinitesimal Jackknife is a general method for estimating variances...
- Estimating the Operating Characteristics of Ensemble Methods (10/24/2017). In this paper we present a technique for using the bootstrap to estimate...
- Prediction intervals for Deep Neural Networks (10/08/2020). The aim of this paper is to propose a suitable method for constructing p...
- Conceptual Views on Tree Ensemble Classifiers (02/10/2023). Random Forests and related tree-based methods are popular for supervised...
