On Variance Estimation of Random Forests

02/18/2022
by Tianning Xu, et al.

Ensemble methods such as random forests are popular in applications because of their high predictive accuracy. To quantify the uncertainty of a random forest prediction, existing literature views the prediction as an infinite-order incomplete U-statistic. However, these methods focus on a small subsample size for each tree, which is theoretically valid but practically limiting. This paper develops an unbiased variance estimator based on incomplete U-statistics that allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rates without additional computational cost. We also propose a local smoothing procedure to reduce the variation of our estimator, which improves numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
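To make the term "incomplete U-statistic" concrete, here is a minimal sketch in Python. It is not the paper's variance estimator. It uses a toy order-2 kernel (half the squared difference of two observations) in place of a tree predictor, and all function names are hypothetical. A complete U-statistic averages the kernel over every size-k subset of the data, while an incomplete one averages over a random selection of subsets, much as a forest averages trees built on random subsamples.

```python
import itertools
import random
import statistics


def kernel(pair):
    """Toy order-2 kernel; its complete U-statistic is the unbiased sample variance."""
    a, b = pair
    return 0.5 * (a - b) ** 2


def complete_u_statistic(data, k, h):
    """Average the kernel h over all C(n, k) size-k subsets of data."""
    values = [h(subset) for subset in itertools.combinations(data, k)]
    return sum(values) / len(values)


def incomplete_u_statistic(data, k, h, num_subsamples, rng):
    """Average h over randomly drawn size-k subsets.

    Each subset is drawn without replacement, analogous to the subsample
    used to grow a single tree (an assumption made for this illustration).
    """
    values = [h(rng.sample(data, k)) for _ in range(num_subsamples)]
    return sum(values) / len(values)


rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]

u_complete = complete_u_statistic(data, 2, kernel)
u_incomplete = incomplete_u_statistic(data, 2, kernel, 5000, rng)
sample_variance = statistics.variance(data)
```

With this kernel, `u_complete` coincides with `sample_variance` up to floating-point error, and `u_incomplete` approximates it using far fewer kernel evaluations than the number of subsets would require for larger k.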

research
04/25/2014

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine l...
research
04/26/2022

Confidence Band Estimation for Survival Random Forests

Survival random forest is a popular machine learning tool for modeling c...
research
01/04/2019

Approximating high-dimensional infinite-order U-statistics: statistical and computational guarantees

We study the problem of distributional approximations to high-dimensiona...
research
08/02/2022

The nonparametric Behrens-Fisher problem in small samples

While there appears to be a general consensus in the literature on the d...
research
07/25/2019

Phase Transition Unbiased Estimation in High Dimensional Settings

An important challenge in statistical analysis concerns the control of t...
research
12/02/2019

Asymptotic Normality and Variance Estimation For Supervised Ensembles

Ensemble methods based on bootstrapping have improved the predictive acc...
