Estimation and Inference with Trees and Forests in High Dimensions

07/07/2020
by Vasilis Syrgkanis, et al.

We analyze the finite-sample mean squared error (MSE) performance of regression trees and forests in the high-dimensional regime with binary features, under a sparsity constraint. We prove that if only r of the d features are relevant for the mean outcome function, then shallow trees built greedily via the CART empirical MSE criterion achieve MSE rates that depend only logarithmically on the ambient dimension d. We prove upper bounds whose exact dependence on the number of relevant variables r depends on the correlation among the features and on the degree of relevance. For strongly relevant features, we also show that fully grown honest forests achieve fast MSE rates and that their predictions are asymptotically normal, enabling asymptotically valid inference that adapts to the sparsity of the regression function.
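
To make the setting concrete, the following is a minimal sketch of a shallow tree grown greedily on binary features, choosing at each node the split that most reduces the empirical squared error. It is an illustration only, not the authors' exact procedure: the function names (`best_split`, `build_tree`, `predict`, `used_features`), the synthetic data, and the choice of r = 2 relevant features (indices 3 and 7) out of d = 50 are all assumptions made for this example.

```python
import numpy as np

def best_split(X, y):
    """Index of the binary feature whose split most reduces empirical squared error, or None."""
    n, d = X.shape
    base = np.sum((y - y.mean()) ** 2)
    best_j, best_gain = None, 1e-12
    for j in range(d):
        mask = X[:, j] == 1
        n1 = int(mask.sum())
        if n1 == 0 or n1 == n:
            continue  # feature is constant in this node, nothing to split
        sse = (np.sum((y[mask] - y[mask].mean()) ** 2)
               + np.sum((y[~mask] - y[~mask].mean()) ** 2))
        gain = base - sse
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j

def build_tree(X, y, depth):
    """Recursively grow a tree of at most `depth` levels using greedy splits."""
    if depth == 0 or len(y) < 2:
        return {"value": float(y.mean())}
    j = best_split(X, y)
    if j is None:
        return {"value": float(y.mean())}
    mask = X[:, j] == 1
    return {
        "feature": j,
        "left": build_tree(X[~mask], y[~mask], depth - 1),   # samples with x_j = 0
        "right": build_tree(X[mask], y[mask], depth - 1),    # samples with x_j = 1
    }

def predict(tree, x):
    """Route a single binary feature vector to its leaf and return the leaf mean."""
    while "value" not in tree:
        tree = tree["right"] if x[tree["feature"]] == 1 else tree["left"]
    return tree["value"]

def used_features(tree):
    """Set of feature indices the tree splits on."""
    if "value" in tree:
        return set()
    return {tree["feature"]} | used_features(tree["left"]) | used_features(tree["right"])

# Hypothetical sparse problem: d = 50 binary features, only features 3 and 7 matter.
rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.integers(0, 2, size=(n, d))
y = X[:, 3] + 2.0 * X[:, 7] + 0.1 * rng.standard_normal(n)

tree = build_tree(X, y, depth=2)
print(sorted(used_features(tree)))
```

In this toy problem the features are independent and both relevant features have a large marginal effect, which is the favorable case; the dependence of the rates on feature correlation and degree of relevance described above is exactly what this simple setup does not exercise.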

11/15/2007

Variable importance in binary regression trees and forests

We characterize and study variable importance (VIMP) and pairwise variab...
08/07/2019

A Characterization of Mean Squared Error for Estimator with Bagging

Bagging can significantly improve the generalization performance of unst...
12/07/2021

Bless and curse of smoothness and phase transitions in nonparametric regressions: a nonasymptotic perspective

When the regression function belongs to the standard smooth classes cons...
09/20/2019

Does SLOPE outperform bridge regression?

A recently proposed SLOPE estimator (arXiv:1407.3824) has been shown to ...
01/10/2019

Mean Estimation from One-Bit Measurements

We consider the problem of estimating the mean of a symmetric log-concav...
03/03/2021

Minimax MSE Bounds and Nonlinear VAR Prewhitening for Long-Run Variance Estimation Under Nonstationarity

We establish new mean-squared error (MSE) bounds for long-run variance (...
09/20/2021

`Basic' Generalization Error Bounds for Least Squares Regression with Well-specified Models

This note examines the behavior of generalization capabilities - as defi...