On Variance Estimation of Random Forests

02/18/2022
by Tianning Xu, et al.
University of Illinois at Urbana-Champaign

Ensemble methods, such as random forests, are popular in applications due to their high predictive accuracy. To quantify the uncertainty of a random forest prediction, the existing literature views it as an infinite-order incomplete U-statistic. However, these methods rely on a small subsample size for each tree, an assumption that is theoretically valid but limiting in practice. This paper develops an unbiased variance estimator based on incomplete U-statistics that allows the subsample size of each tree to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rates without additional computational cost. We also propose a local smoothing procedure to reduce the variability of our estimator, which improves numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of the proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
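The abstract treats a random forest prediction as an incomplete U-statistic: an average of a kernel (a tree) over B random subsamples of size k drawn from n observations. The sketch below illustrates that framing and checks the textbook variance identity for incomplete U-statistics whose subsamples are drawn independently, Var(U_B) = Var(U_n) + (zeta_k - Var(U_n)) / B, where U_n is the complete U-statistic and zeta_k is the variance of a single kernel evaluation. It is not the paper's estimator. The subsample-mean kernel (a stand-in for a fitted tree, chosen so the variance has a closed form) and all function names are hypothetical choices for illustration. The 1/B term suggests why a small number of trees inflates the variance.

```python
import numpy as np


def incomplete_u_stat(x, k, B, kernel, rng):
    """Incomplete U-statistic: average of `kernel` over B random size-k subsamples.

    Each subsample is drawn without replacement from x, and the B subsamples
    are drawn independently of one another (as when each tree gets its own
    subsample). Hypothetical helper for illustration only.
    """
    n = len(x)
    # argsort of iid uniforms gives a random permutation per row;
    # its first k entries form a uniformly random size-k subset.
    idx = rng.random((B, n)).argsort(axis=1)[:, :k]
    return kernel(x[idx]).mean()


def subsample_mean(s):
    # Stand-in "tree" (assumption): the mean of its subsample, row-wise over (B, k).
    return s.mean(axis=1)


def theoretical_variance(sigma2, n, k, B):
    # For the subsample-mean kernel the complete U-statistic is the sample mean:
    #   Var(U_n) = sigma2 / n   and   zeta_k = Var(h) = sigma2 / k.
    var_un = sigma2 / n
    zeta_k = sigma2 / k
    return var_un + (zeta_k - var_un) / B


def monte_carlo_variance(n, k, B, reps, seed=0):
    # Redraw the data and the subsamples `reps` times; return the empirical variance.
    rng = np.random.default_rng(seed)
    estimates = np.array([
        incomplete_u_stat(rng.standard_normal(n), k, B, subsample_mean, rng)
        for _ in range(reps)
    ])
    return estimates.var(ddof=1)


if __name__ == "__main__":
    n, k, B = 50, 25, 20  # subsample size k is half of n
    print("theory:     ", theoretical_variance(1.0, n, k, B))
    print("monte carlo:", monte_carlo_variance(n, k, B, reps=4000))
```

Here k is deliberately half of n, the regime the abstract emphasizes; with a real forest the kernel would be a tree fitted to each subsample, and zeta_k and Var(U_n) would no longer have closed forms.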
