
Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests

by   Merle Behr, et al.

Random Forests (RF) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative Random Forests (iRF) use a tree ensemble from iteratively modified RF to obtain predictive and stable non-linear high-order Boolean interactions of features. They have shown great promise for high-order biological interaction discovery, which is central to advancing functional genomics and precision medicine. However, theoretical studies of how tree-based methods discover high-order feature interactions are missing. In this paper, to enable such theoretical studies, we first introduce a novel discontinuous nonlinear regression model, called the Locally Spiky Sparse (LSS) model, which is inspired by the thresholding behavior in many biological processes. Specifically, the LSS model assumes that the regression function is a linear combination of piece-wise constant Boolean interaction terms. We define a quantity called depth-weighted prevalence (DWP) for a set of signed features S and a given RF tree ensemble. We prove that, with high probability under the LSS model, the DWP of S attains a universal upper bound that does not involve any model coefficients if and only if S corresponds to a union of Boolean interactions in the LSS model. As a consequence, we show that RF yields consistent interaction discovery under the LSS model. Simulation results show that DWP can recover the interactions under the LSS model even when some assumptions, such as the uniformity assumption, are violated.
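To make the description of the LSS model concrete, the following is a minimal sketch of a regression function built as a linear combination of piece-wise constant Boolean interaction terms. It is an illustration only, not the paper's formal definition: the function names (`lss_mean`, `simulate_lss`), the choice of "feature at or below threshold" as the Boolean condition, the uniform feature distribution on [0, 1], and the Gaussian noise are all assumptions made for this example.

```python
import random


def lss_mean(x, interactions, thresholds, coefs, intercept=0.0):
    """Noise-free response for one sample x (a list of feature values).

    Each interaction is a tuple of feature indices. Its Boolean term is 1
    only if every listed feature is at or below its threshold (assumed
    direction), so the response is piece-wise constant in x.
    """
    total = intercept
    for features, beta in zip(interactions, coefs):
        if all(x[k] <= thresholds[k] for k in features):
            total += beta
    return total


def simulate_lss(n, d, interactions, thresholds, coefs, noise_sd=0.1, seed=0):
    """Draw n samples with d uniform features and additive Gaussian noise."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(d)] for _ in range(n)]
    y = [
        lss_mean(x, interactions, thresholds, coefs) + rng.gauss(0.0, noise_sd)
        for x in X
    ]
    return X, y


# Hypothetical setup: two order-2 interactions among four features.
interactions = [(0, 1), (2, 3)]
thresholds = {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5}
coefs = [1.0, 2.0]
X, y = simulate_lss(200, 4, interactions, thresholds, coefs)
```

In this sketch the response jumps only when a sample crosses all thresholds of an interaction at once, which is the kind of thresholding behavior the abstract attributes to the model. A tree ensemble fitted to such data could then be inspected for which feature sets co-occur along decision paths; the DWP quantity itself is not implemented here because the abstract does not give its definition.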



