Variable importance in binary regression trees and forests

11/15/2007
by   Hemant Ishwaran, et al.
We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends from single trees to ensembles of trees and applies to methods such as random forests. This is useful because, although importance values from random forests are widely used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.
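As a rough illustration of the kind of VIMP values the abstract refers to, the sketch below computes a standard permutation-style importance for a regression forest: each variable's VIMP is the increase in mean squared error when that variable is randomly permuted. This is a common screening recipe, not the paper's maximal-subtree estimator; the helper `permutation_vimp` and the synthetic data are assumptions for illustration only.

```python
# Sketch: permutation-style variable importance (VIMP) for a regression
# forest. VIMP(j) = MSE after permuting variable j minus the baseline MSE,
# so informative variables get large positive values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_vimp(model, X, y, seed=None):
    """Increase in MSE when each column of X is permuted in turn."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - model.predict(X)) ** 2)
    vimp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])  # break the x_j / y link
        vimp[j] = np.mean((y - model.predict(Xp)) ** 2) - base_mse
    return vimp

# Synthetic data where only the first variable matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
vimp = permutation_vimp(rf, X, y, seed=0)
```

Run on this data, the first component of `vimp` dominates the noise variables, which is exactly the screening behavior the abstract describes for high-throughput settings.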

