Uncovering Feature Interdependencies in Complex Systems with Non-Greedy Random Forests

09/30/2020
by Delilah Donick, et al.

A "non-greedy" variation of the random forest algorithm is presented to better uncover feature interdependencies inherent in complex systems. Conventionally, random forests are built from "greedy" decision trees, each of which considers only one split at a time during construction. In contrast, the decision trees in this random forest algorithm each consider three split nodes simultaneously, in tiers of depth two. It is demonstrated on synthetic data and on bitcoin price time series that the non-greedy version significantly outperforms the greedy one when certain non-linear relationships between feature pairs are present. In particular, both a greedy and a non-greedy random forest are trained to predict the signs of daily bitcoin returns and are used to backtest a long-short trading strategy. The better performance of the non-greedy algorithm is explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators; when no such relationships exist, the two perform similarly. Given its enhanced ability to capture the feature interdependencies present in complex systems, this non-greedy extension should become a standard method in the data scientist's toolkit.
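Why a greedy tree struggles with XOR-like feature pairs, and why jointly scoring a depth-two tier of three split nodes helps, can be sketched with a minimal numpy illustration. This is not the paper's code; the data, function names, and the use of entropy-based information gain are illustrative assumptions.

```python
import numpy as np

# Synthetic XOR-like relationship between two binary features:
# the label is 1 exactly when the two features disagree.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 1000)
x2 = rng.integers(0, 2, 1000)
y = x1 ^ x2

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gain_single_split(feature, labels):
    # Information gain of one greedy split on a single binary feature.
    h = entropy(labels)
    for v in (0, 1):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

def gain_depth2_tier(f1, f2, labels):
    # Joint gain of a depth-two tier (three split nodes scored together):
    # root split on f1, then both children split on f2.
    h = entropy(labels)
    for v1 in (0, 1):
        for v2 in (0, 1):
            mask = (f1 == v1) & (f2 == v2)
            h -= mask.mean() * entropy(labels[mask])
    return h

print(gain_single_split(x1, y))   # ~0 bits: a greedy one-split view sees no signal
print(gain_depth2_tier(x1, x2, y))  # ~1 bit: the depth-two tier resolves the XOR
```

Each feature in isolation leaves the labels split 50/50 on both sides, so a greedy criterion assigns it (near-)zero gain and may prefer a spuriously informative noise feature instead; evaluating the three splits of a depth-two tier jointly, as the non-greedy trees do, exposes the full interaction.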

