Variable importance in binary regression trees and forests

by Hemant Ishwaran, et al.

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends from single trees to ensembles of trees and applies to methods like random forests. This is useful because, while importance values from random forests are widely used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.
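To make the screening use case concrete, here is a minimal sketch of permutation-style VIMP: importance of a variable is measured as the drop in forest accuracy after that variable's column is randomly permuted. The toy forest of bootstrapped decision stumps below is an illustrative stand-in, not the paper's maximal-subtree construction, and all names (`fit_stump`, `fit_forest`, `permutation_vimp`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, feats):
    """Best single binary split (feature, threshold) by squared error."""
    best = (np.inf, feats[0], 0.0, y.mean(), y.mean())
    for j in feats:
        for t in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            ml, mr = y[left].mean(), y[~left].mean()
            err = ((y[left] - ml) ** 2).sum() + ((y[~left] - mr) ** 2).sum()
            if err < best[0]:
                best = (err, j, t, ml, mr)
    return best[1:]  # (feature, threshold, left mean, right mean)

def fit_forest(X, y, n_trees=50):
    """Ensemble of stumps, each on a bootstrap sample and a random feature subset."""
    n, p = X.shape
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, n)                       # bootstrap rows
        feats = rng.choice(p, size=max(1, p // 2), replace=False)
        trees.append(fit_stump(X[idx], y[idx], feats))
    return trees

def predict(trees, X):
    """Average the stump outputs, then threshold for a binary label."""
    preds = np.mean([np.where(X[:, j] <= t, ml, mr)
                     for j, t, ml, mr in trees], axis=0)
    return (preds > 0.5).astype(int)

def permutation_vimp(trees, X, y):
    """VIMP of variable j = accuracy drop after permuting column j."""
    base = (predict(trees, X) == y).mean()
    vimp = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        vimp[j] = base - (predict(trees, Xp) == y).mean()
    return vimp

# Binary outcome depends only on the first of four variables.
X = rng.normal(size=(500, 4))
y = (X[:, 0] > 0).astype(int)
forest = fit_forest(X, y)
vimp = permutation_vimp(forest, X, y)
```

A screening rule in the spirit of the abstract would then keep only variables whose VIMP clears some threshold; here the signal variable (column 0) should dominate the three noise variables.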





Fréchet random forests

Random forests are a statistical learning method widely used in many are...

Estimation and Inference with Trees and Forests in High Dimensions

We analyze the finite sample mean squared error (MSE) performance of reg...

Understanding Random Forests: From Theory to Practice

Data analysis and machine learning have become an integrative part of th...

Best Split Nodes for Regression Trees

Decision trees with binary splits are popularly constructed using Classi...

Dealing with Uncertain Inputs in Regression Trees

Tree-based ensemble methods, as Random Forests and Gradient Boosted Tree...

Training Big Random Forests with Little Resources

Without access to large compute clusters, building random forests on lar...

From unbiased MDI Feature Importance to Explainable AI for Trees

We attempt to give a unifying view of the various recent attempts to (i)...