Variable importance scores

02/13/2021
by Wei-Yin Loh, et al.

Scoring variables by their importance in predicting a response is an ill-defined concept. Several methods have been proposed, but little is known about their performance. This paper fills the gap with a comparative evaluation of eleven methods and an updated one based on the GUIDE algorithm. For data without missing values, eight of the methods are shown to be biased: they give higher or lower scores to different types of variables even when all of the variables are independent of the response. Of the remaining four methods, only two are applicable to data with missing values, and of these two, only GUIDE is unbiased. GUIDE achieves unbiasedness through a self-calibrating step that can also be used to de-bias the scores of other methods. GUIDE also yields a threshold for distinguishing important from unimportant variables at the 95 and 99 percent confidence levels; this technique, too, applies to other methods. Finally, the paper studies the relationship between the scores and predictive power in three data sets, and finds that the scores of many methods are more consistent with marginal predictive power than with conditional predictive power.
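The abstract does not spell out how GUIDE's self-calibration or its 95/99 percent threshold are computed, so the following is only a rough illustration of the general idea under stated assumptions, not the paper's procedure. It assumes a hypothetical raw score (the largest SSE reduction from a single binary split, `best_split_gain`), calibrates it by dividing by the variable's mean score over randomly permuted responses (`calibrated_scores`), and takes the threshold as an upper quantile of the pooled normalized null scores. The raw score also shows the kind of bias the abstract describes: a continuous noise variable offers many more candidate splits than a binary one, so it tends to get a larger raw score even though neither is related to the response.

```python
import random
import statistics


def best_split_gain(x, y):
    """Largest drop in the sum of squared errors (SSE) of y from one split x <= c.

    Hypothetical raw importance score used only for illustration. A variable
    with more distinct values offers more candidate splits, so even when it
    is unrelated to y its best gain tends to be larger.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    sse_all = total_sq - total * total / n
    best = 0.0
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if xi == pairs[i + 1][0]:
            continue  # no split point between tied x values
        n_left, n_right = i + 1, n - i - 1
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        best = max(best, sse_all - sse)
    return best


def calibrated_scores(X, y, n_perm=50, level=0.95, seed=0):
    """Permutation-calibrated scores and a null threshold (assumed scheme).

    X maps each variable name to a list of numeric values. Each raw score is
    divided by that variable's mean score over responses shuffled at random,
    so a variable unrelated to y should score about 1 whatever its type.
    """
    rng = random.Random(seed)
    raw = {name: best_split_gain(col, y) for name, col in X.items()}
    null = {name: [] for name in X}
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any association between X and y
        for name, col in X.items():
            null[name].append(best_split_gain(col, y_perm))
    # A constant column has zero gain everywhere; dividing by 1 keeps it at 0.
    means = {name: statistics.mean(v) or 1.0 for name, v in null.items()}
    scores = {name: raw[name] / means[name] for name in X}
    # Threshold: upper `level` quantile of the pooled normalized null scores.
    pooled = sorted(s / means[name] for name, v in null.items() for s in v)
    threshold = pooled[min(len(pooled) - 1, int(level * len(pooled)))]
    return scores, threshold


def demo(seed=1, n=200):
    """Two noise variables of different types plus one informative variable."""
    rng = random.Random(seed)
    X = {
        "binary_noise": [rng.randint(0, 1) for _ in range(n)],
        "continuous_noise": [rng.random() for _ in range(n)],
        "signal": [rng.random() for _ in range(n)],
    }
    y = [2.0 * (s > 0.5) + rng.gauss(0, 1) for s in X["signal"]]
    return calibrated_scores(X, y)


if __name__ == "__main__":
    scores, threshold = demo()
    print(f"threshold: {threshold:.2f}")
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:17s} {s:6.2f}", "important" if s > threshold else "")
```

Running the script prints the threshold and each calibrated score; the informative variable should land well above the threshold, while the two noise variables should score near 1. Passing `level=0.99` gives the stricter cutoff. Dividing by the null mean is just one plausible calibration; subtracting it, or comparing against a per-variable null quantile, would be equally reasonable assumptions.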
