Uncovering Feature Interdependencies in Complex Systems with Non-Greedy Random Forests

09/30/2020
by Delilah Donick, et al.

A "non-greedy" variation of the random forest algorithm is presented to better uncover feature interdependencies inherent in complex systems. Conventionally, random forests are built from "greedy" decision trees, each of which considers only one split at a time during construction. In contrast, the decision trees in this random forest algorithm each consider three split nodes simultaneously, in tiers of depth two. It is demonstrated on synthetic data and on bitcoin price time series that the non-greedy version significantly outperforms the greedy one when certain non-linear relationships between feature pairs are present. In particular, both a greedy and a non-greedy random forest are trained to predict the signs of daily bitcoin returns and are used to backtest a long-short trading strategy. The better performance of the non-greedy algorithm is explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators; when no such relationships exist, the two perform similarly. Given its enhanced ability to capture the feature interdependencies present in complex systems, this non-greedy extension should become a standard method in the data scientist's toolkit.
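Why a greedy tree struggles with XOR-like feature pairs, and why jointly scoring a depth-two tier of three split nodes helps, can be sketched with a minimal numpy illustration. This is not the paper's code; the data, function names, and the use of entropy-based information gain are illustrative assumptions.

```python
import numpy as np

# Synthetic XOR-like relationship between two binary features:
# the label is 1 exactly when the two features disagree.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 1000)
x2 = rng.integers(0, 2, 1000)
y = x1 ^ x2

def entropy(labels):
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def gain_single_split(feature, labels):
    # Information gain of one greedy split on a single binary feature.
    h = entropy(labels)
    for v in (0, 1):
        mask = feature == v
        h -= mask.mean() * entropy(labels[mask])
    return h

def gain_depth2_tier(f1, f2, labels):
    # Joint gain of a depth-two tier (three split nodes scored together):
    # root split on f1, then both children split on f2.
    h = entropy(labels)
    for v1 in (0, 1):
        for v2 in (0, 1):
            mask = (f1 == v1) & (f2 == v2)
            h -= mask.mean() * entropy(labels[mask])
    return h

print(gain_single_split(x1, y))   # ~0 bits: a greedy one-split view sees no signal
print(gain_depth2_tier(x1, x2, y))  # ~1 bit: the depth-two tier resolves the XOR
```

Each feature in isolation leaves the labels split 50/50 on both sides, so a greedy criterion assigns it (near-)zero gain and may prefer a spuriously informative noise feature instead; evaluating the three splits of a depth-two tier jointly, as the non-greedy trees do, exposes the full interaction.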

