Scalable and Efficient Hypothesis Testing with Random Forests

04/16/2019
by Tim Coleman, et al.

Over the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods. While their black-box nature has made their mathematical analysis difficult, recent work has established important statistical properties like consistency and asymptotic normality by considering subsampling in lieu of bootstrapping. Though such results open the door to traditional inference procedures, all formal methods suggested thus far place severe restrictions on the testing framework, and their computational overhead precludes their practical scientific use. Here we propose a permutation-style testing approach to formally assess feature significance. We establish asymptotic validity of the test via exchangeability arguments and show that the test maintains high power with orders of magnitude fewer computations. Just as importantly, the procedure scales easily to big data settings where large training and testing sets may be employed without the need to construct additional models. We provide simulations as well as applications to ecological data, a domain where random forests have recently shown promise.
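To make the idea of a permutation-style test that needs no additional models more concrete, here is a minimal sketch under stated assumptions; it is not necessarily the authors' exact procedure. We assume one ensemble of trees has been fit on the original data and a second ensemble has been fit with the feature of interest permuted, and that we have each tree's predictions on a held-out test set. The statistic is the increase in test-set MSE when the feature is permuted. Under the null of no feature effect, the trees from the two ensembles are assumed exchangeable, so a null distribution can be drawn by randomly reassigning trees between the two ensembles instead of refitting anything. The function names `ensemble_mse` and `tree_swap_pvalue` are hypothetical, and the tree predictions in the usage example are simulated rather than produced by real random forests.

```python
import random
from statistics import fmean


def ensemble_mse(tree_preds, y):
    """Test-set MSE of the ensemble that averages the given trees' predictions.

    tree_preds: list of per-tree prediction lists, each of length len(y).
    """
    avg = [fmean(p[i] for p in tree_preds) for i in range(len(y))]
    return fmean((a - t) ** 2 for a, t in zip(avg, y))


def tree_swap_pvalue(orig_trees, perm_trees, y, n_perm=1000, seed=0):
    """Hypothetical permutation-style test of feature significance.

    orig_trees: per-tree test predictions from trees fit on the original data.
    perm_trees: per-tree test predictions from trees fit with the feature of
        interest permuted (an assumed setup, see the lead-in).
    Statistic: MSE(permuted-feature ensemble) - MSE(original ensemble).
    Under the null the two sets of trees are assumed exchangeable, so the null
    distribution is approximated by shuffling trees between the two ensembles;
    no new trees are fit.
    """
    rng = random.Random(seed)
    k = len(orig_trees)
    observed = ensemble_mse(perm_trees, y) - ensemble_mse(orig_trees, y)
    pool = list(orig_trees) + list(perm_trees)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        # First k shuffled trees play the "original" role, the rest "permuted".
        stat = ensemble_mse(pool[k:], y) - ensemble_mse(pool[:k], y)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Usage with simulated tree predictions (illustrative toy data only):
# "original" trees track y closely; "permuted" trees only recover the mean.
sim = random.Random(1)
y = [float(i) for i in range(30)]
orig = [[t + sim.gauss(0, 0.5) for t in y] for _ in range(20)]
ybar = fmean(y)
perm = [[ybar + sim.gauss(0, 0.5) for _ in y] for _ in range(20)]

stat, p = tree_swap_pvalue(orig, perm, y, n_perm=200)
print(f"statistic={stat:.2f}, p-value={p:.4f}")
```

The cost of each null draw here is only re-averaging stored predictions, which is the kind of saving the abstract alludes to when it mentions avoiding the construction of additional models; the actual test in the paper may define the statistic and resampling scheme differently.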

research
07/04/2022

FACT: High-Dimensional Random Forests Inference

Random forests is one of the most widely used machine learning methods o...
research
05/25/2019

Asymptotic Distributions and Rates of Convergence for Random Forests and other Resampled Ensemble Learners

Random forests remain among the most popular off-the-shelf supervised le...
research
07/22/2015

Banzhaf Random Forests

Random forests are a type of ensemble method which makes predictions by ...
research
04/25/2014

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

This work develops formal statistical inference procedures for machine l...
research
02/02/2023

Hypothesis Testing and Machine Learning: Interpreting Variable Effects in Deep Artificial Neural Networks using Cohen's f2

Deep artificial neural networks show high predictive performance in many...
research
05/22/2021

Statistical Testing under Distributional Shifts

Statistical hypothesis testing is a central problem in empirical inferen...
research
11/01/2019

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

Random forests remain among the most popular off-the-shelf supervised ma...
