
Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

by Burim Ramosaj, et al.

Variable selection in sparse regression models is an important task, as applications ranging from biomedical research to econometrics have shown. It is especially challenging in higher-dimensional regression problems, where the link function between response and covariates cannot be directly detected. Under these circumstances, the Random Forest method is a helpful tool for predicting new outcomes while delivering measures for variable selection. One common approach is to use the permutation importance. Because this measure is intuitive and flexible, it is important to explore the circumstances under which the Random Forest permutation importance correctly indicates informative covariates. To this end, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions and prove its (asymptotic) unbiasedness. An extensive simulation study verifies our findings.
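
To make the idea of permutation importance concrete, the following is a minimal sketch rather than the measure analyzed in the paper. It assumes scikit-learn is available and uses its held-out-set variant (`sklearn.inspection.permutation_importance`), which may differ from an out-of-bag formulation. The simulated data, the sparse linear signal in the first two covariates, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Simulated sparse regression problem (illustrative assumption):
# only the first two of six covariates influence the response.
rng = np.random.default_rng(0)
n, p = 600, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a Random Forest on the training part.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Permute one covariate at a time on held-out data and record the
# average drop in the score (R^2 by default for regressors).
result = permutation_importance(
    forest, X_test, y_test, n_repeats=20, random_state=0
)
importances = result.importances_mean

# Print covariates from most to least important.
for j in np.argsort(importances)[::-1]:
    print(f"X{j}: {importances[j]:.3f} +/- {result.importances_std[j]:.3f}")
```

In this setup the informative covariates should receive clearly larger scores than the noise covariates, whose scores should scatter around zero. This is the behavior the unbiasedness question is about.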




The revisited knockoffs method for variable selection in L1-penalised regressions

We consider the problem of variable selection in regression models. In p...

Interpretable random forest models through forward variable selection

Random forest is a popular prediction approach for handling high dimensi...

Variable Importance Assessments and Backward Variable Selection for High-Dimensional Data

Variable selection in high-dimensional scenarios is of great interested ...

A Random Forest Approach for Modeling Bounded Outcomes

Random forests have become an established tool for classification and re...

A Feature Importance Analysis for Soft-Sensing-Based Predictions in a Chemical Sulphonation Process

In this paper we present the results of a feature importance analysis of...

A Random Forest Guided Tour

The random forest algorithm, proposed by L. Breiman in 2001, has been ex...

Sequential Permutation Testing of Random Forest Variable Importance Measures

Hypothesis testing of random forest (RF) variable importance measures (V...