Sequential Permutation Testing of Random Forest Variable Importance Measures

06/02/2022
by Alexander Hapfelmeier, et al.

Hypothesis testing of random forest (RF) variable importance measures (VIMP) remains the subject of ongoing research. Among recent developments, heuristic approaches to parametric testing have been proposed whose distributional assumptions are based on empirical evidence. Other formal tests have been derived analytically under regularity conditions. However, these approaches can be computationally expensive or even practically infeasible. The same problem affects non-parametric permutation tests, which nevertheless have the advantage of being distribution-free and generically applicable to any type of RF and VIMP. Building on this advantage, we propose sequential permutation tests and sequential p-value estimation to reduce the high computational cost of conventional permutation tests. The popular and widely used permutation VIMP serves as a practical and relevant application example. Simulation studies confirm that the theoretical properties of the sequential tests hold: the type-I error probability is controlled at the nominal level, and high power is maintained with considerably fewer permutations than conventional permutation testing requires. The numerical stability of the methods is investigated in two additional application studies. In summary, theoretically sound sequential permutation testing of VIMP is possible at greatly reduced computational cost. Recommendations for application are given. An implementation is provided in the accompanying R package rfvimptest. The approach can also be applied easily to any kind of prediction model.
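To illustrate how sequential p-value estimation can save permutations, the following is a minimal Python sketch of one well-known stopping scheme, the sequential Monte Carlo p-value of Besag and Clifford (1991). It is an assumption that this scheme resembles what the paper uses; the abstract does not specify the procedures, and this sketch does not reproduce the rfvimptest package or its interface. All names here (`sequential_p_value`, `draw_null`, `abs_cov`, `demo`) are hypothetical, and an absolute covariance between a predictor and the response stands in for a VIMP so that the example runs without fitting a forest.

```python
import random


def sequential_p_value(observed, draw_null, h=10, max_draws=999):
    """Besag-Clifford style sequential Monte Carlo p-value (illustrative sketch).

    draw_null() returns one statistic computed under the null hypothesis,
    e.g. after permuting the predictor of interest. Sampling stops as soon
    as h null statistics are >= observed, or after max_draws draws.
    Returns (p_value_estimate, number_of_draws_used).
    """
    exceed = 0
    for draws in range(1, max_draws + 1):
        if draw_null() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: clearly non-significant, few permutations needed.
                return h / draws, draws
    # Budget exhausted: conventional Monte Carlo estimate with +1 correction.
    return (exceed + 1) / (max_draws + 1), max_draws


def abs_cov(x, y):
    """Absolute sample covariance; a stand-in for a variable importance measure."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y))) / len(x)


def demo(seed=0):
    """Compare an informative and an uninformative response on toy data."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(50)]
    y_signal = [2 * a + rng.gauss(0, 1) for a in x]  # depends on x
    y_noise = [rng.gauss(0, 1) for _ in x]           # independent of x

    for name, y in (("signal", y_signal), ("noise", y_noise)):
        def draw_null(y=y):
            shuffled = x[:]
            rng.shuffle(shuffled)  # permuting x breaks any association with y
            return abs_cov(shuffled, y)

        p, used = sequential_p_value(abs_cov(x, y), draw_null)
        print(f"{name}: p ~ {p:.3f} after {used} permutations")


if __name__ == "__main__":
    demo()
```

In a VIMP setting, `draw_null` would instead permute the predictor, refit the model, and recompute the importance, which is the expensive step. The early stop matters because unimportant predictors tend to reach `h` exceedances after only a few permutations, while the full budget is spent only where the evidence is strong.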


