Opening the random forest black box by the analysis of the mutual impact of features

04/05/2023
by Lucas F. Voges, et al.

Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between features are usually not considered during selection and are thus also neglected in the characterization of the analysed samples. Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of features with respect to the outcome and hence goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to various simulated data sets and comparisons with other methods for feature selection and relation analysis show that MFI and MIR are very promising for shedding light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g., the preference for features with many possible splits or high minor allele frequencies.
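To make concrete why a relation parameter that takes the outcome into account can go beyond correlation coefficients, the sketch below builds a small, assumed toy data set (two binary features and an exclusive-or outcome, chosen purely for illustration). Every pairwise Pearson correlation is exactly zero, yet the two features jointly determine the outcome. The code does not implement MFI or MIR; the names `pearson`, `rows` and `outcome_by_pair` are hypothetical helpers introduced here only to show the kind of relationship that correlation-based relation analysis misses.

```python
from itertools import product
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


# Hypothetical balanced toy design: every combination of two binary
# features, each repeated 25 times (100 samples in total).
rows = [combo for combo in product([0, 1], repeat=2) for _ in range(25)]
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]

# The outcome depends on BOTH features jointly (exclusive or).
y = [a ^ b for a, b in rows]

# All pairwise correlations vanish: feature-feature and feature-outcome.
print(pearson(x1, x2), pearson(x1, y), pearson(x2, y))  # 0.0 0.0 0.0

# Yet each (x1, x2) pair maps to exactly one outcome value.
outcome_by_pair = {}
for a, b, out in zip(x1, x2, y):
    outcome_by_pair.setdefault((a, b), set()).add(out)
print(outcome_by_pair)  # {(0, 0): {0}, (0, 1): {1}, (1, 0): {1}, (1, 1): {0}}
```

A correlation matrix of this data would suggest that the features are unrelated to each other and to the outcome, whereas a tree-based model that splits on one feature and then the other can represent the outcome exactly.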
