On Variance Estimation of Random Forests

by Tianning Xu, et al.
University of Illinois at Urbana-Champaign

Ensemble methods, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods assume a small subsample size for each tree, which is theoretically valid but practically limiting. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators achieve lower bias and more accurate coverage rates at no additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which improves numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
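
To make the "incomplete U-statistic" view concrete, here is a minimal sketch, not the paper's estimator: it averages a kernel over a limited number of random size-k subsamples drawn without replacement, in the way a forest averages its trees. The function name `incomplete_u_statistic`, its parameters, and the use of the sample mean as a stand-in for a fitted tree are all assumptions made for illustration.

```python
import random
from statistics import fmean


def incomplete_u_statistic(data, k, num_subsamples, kernel, seed=0):
    """Average `kernel` over random size-k subsamples drawn without replacement.

    Hypothetical helper: in a random forest, `kernel` would be the prediction
    of one tree fitted on the subsample; here any function of a list works.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    rng = random.Random(seed)
    # Only `num_subsamples` of the C(n, k) possible subsamples are used,
    # which is what makes the U-statistic "incomplete".
    values = [kernel(rng.sample(data, k)) for _ in range(num_subsamples)]
    return fmean(values)


# Toy data: 200 standard normal draws (illustrative only).
data_rng = random.Random(42)
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

# Subsample size comparable to the sample size (k = n / 2), the regime
# the abstract describes as practically relevant.
estimate = incomplete_u_statistic(data, k=100, num_subsamples=500, kernel=fmean)

# With k = n every subsample is the full data set, so the kernel value
# is the same on each draw.
full = incomplete_u_statistic(data, k=len(data), num_subsamples=5, kernel=fmean)
```

The sample mean is used as the kernel only because its behavior is easy to check; replacing it with a function that fits a tree on the subsample and predicts at a fixed test point would give the forest prediction whose variance the paper sets out to estimate.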


Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine l...

Confidence Band Estimation for Survival Random Forests

Survival random forest is a popular machine learning tool for modeling c...

Approximating high-dimensional infinite-order U-statistics: statistical and computational guarantees

We study the problem of distributional approximations to high-dimensiona...

The nonparametric Behrens-Fisher problem in small samples

While there appears to be a general consensus in the literature on the d...

Asymptotic Normality and Variance Estimation For Supervised Ensembles

Ensemble methods based on bootstrapping have improved the predictive acc...

On the Subbagging Estimation for Massive Data

This article introduces subbagging (subsample aggregating) estimation ap...