Interpretable random forest models through forward variable selection

05/11/2020
by Jasper Velthoen, et al.

Random forest is a popular prediction approach for handling high-dimensional covariates. However, the resulting high-dimensional, non-parametric model is often infeasible to interpret. To obtain an interpretable predictive model, we develop a forward variable selection method that uses the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure yields a smallest set of variables that optimizes the CRPS risk; at each step, it performs a hypothesis test for a significant decrease in CRPS risk. We provide mathematical motivation for our method by proving that, in the population sense, the method attains the optimal set. Additionally, we show that the test is consistent provided that the random forest estimator of a quantile function is consistent. In a simulation study, we compare the performance of our method with an existing variable selection method for different sample sizes and different correlation strengths among the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
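The stepwise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `crps_ensemble` uses the standard sample-based CRPS identity E|X - y| - 0.5 E|X - X'|, while `forward_select`, its `risk` callable, and the fixed `min_gain` threshold are hypothetical names introduced here. In particular, `min_gain` is only a stand-in for the paper's hypothesis test, and in practice `risk` would be a CRPS risk estimated from a random forest fitted on the chosen covariates.

```python
from typing import Callable, FrozenSet, List, Sequence


def crps_ensemble(samples: Sequence[float], y: float) -> float:
    """Empirical CRPS of a sample-based forecast for the observation y.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the forecast distribution.
    """
    m = len(samples)
    term_obs = sum(abs(x - y) for x in samples) / m
    term_spread = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term_obs - 0.5 * term_spread


def forward_select(
    n_vars: int,
    risk: Callable[[FrozenSet[int]], float],
    min_gain: float,
) -> List[int]:
    """Greedy forward selection over covariate indices 0..n_vars-1.

    `risk` (hypothetical) maps a set of covariate indices to an estimated
    CRPS risk. Assumption: the fixed threshold `min_gain` replaces the
    significance test on the decrease in CRPS risk described in the text.
    """
    selected: List[int] = []
    current = risk(frozenset())
    while len(selected) < n_vars:
        candidates = [j for j in range(n_vars) if j not in selected]
        # Risk obtained by adding each remaining covariate to the current set.
        scores = {j: risk(frozenset(selected + [j])) for j in candidates}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_gain:
            break  # no candidate lowers the risk enough; stop
        selected.append(best)
        current = scores[best]
    return selected
```

With a toy risk in which only two covariates lower the score, the loop adds those two in order of their gain and then stops, which mirrors how the procedure is meant to end with a small selected set.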

research
12/05/2019

Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models

Variable selection in sparse regression models is an important task as a...
research
03/17/2022

GAM(L)A: An econometric model for interpretable Machine Learning

Despite their high predictive performance, random forest and gradient bo...
research
01/18/2019

A Random Forest Approach for Modeling Bounded Outcomes

Random forests have become an established tool for classification and re...
research
07/01/2023

A Transparent and Nonlinear Method for Variable Selection

Variable selection is a procedure to attain the truly important predicto...
research
10/20/2022

Vine copula based knockoff generation for high-dimensional controlled variable selection

Vine copulas are a flexible tool for high-dimensional dependence modelin...
research
09/29/2020

Selective Cascade of Residual ExtraTrees

We propose a novel tree-based ensemble method named Selective Cascade of...
