Understanding complex predictive models with Ghost Variables

12/13/2019
by Pedro Delicado, et al.

We propose a procedure for assigning a relevance measure to each explanatory variable in a complex predictive model. We assume that we have a training set to fit the model and a test set to check its out-of-sample performance. First, the individual relevance of each variable is computed by comparing the test-set predictions of the model that includes all the variables with those of another model in which the variable of interest is replaced by its ghost variable, defined as the prediction of this variable from the rest of the explanatory variables. Second, we check the joint effects among the variables by using the eigenvalues of a relevance matrix, which is the covariance matrix of the vectors of individual effects. It is shown that in simple models, such as linear or additive models, the proposed measures are related to standard measures of significance of the variables, and that in neural network models (and in other algorithmic prediction models) the procedure provides information about the joint and individual effects of the variables that is not usually available from other methods. The procedure is illustrated with simulated examples and the analysis of a large real data set.
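The two steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the ghost variable is estimated with ordinary least squares on the test set, the individual relevance is taken as the unnormalized mean squared difference between the two sets of predictions, and the helper names `fit_linear` and `ghost_relevance` are hypothetical. The abstract does not fix any of these choices.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def ghost_relevance(predict, X_test):
    """Individual relevances and relevance matrix for a fitted model.

    predict: callable mapping an (n, p) array to n predictions.
    Assumption: ghost variables are linear predictions from the other columns.
    """
    n, p = X_test.shape
    base = predict(X_test)
    effects = np.empty((n, p))
    for j in range(p):
        others = np.delete(X_test, j, axis=1)
        # Ghost variable: prediction of column j from the remaining columns.
        ghost = fit_linear(others, X_test[:, j])(others)
        X_ghost = X_test.copy()
        X_ghost[:, j] = ghost
        # Vector of individual effects: change in the model's predictions.
        effects[:, j] = base - predict(X_ghost)
    # Assumed relevance measure: mean squared change in prediction.
    relevance = (effects ** 2).mean(axis=0)
    # Relevance matrix: covariance matrix of the individual-effect vectors.
    V = np.cov(effects, rowvar=False, bias=True)
    eigvals = np.linalg.eigvalsh(V)[::-1]  # sorted largest first
    return relevance, V, eigvals

# Simulated example: only x1 drives y; x2 is correlated with x1; x3 is noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(size=n)

X_train, y_train, X_test = X[:300], y[:300], X[300:]
predict = fit_linear(X_train, y_train)  # stand-in for any complex model
relevance, V, eigvals = ghost_relevance(predict, X_test)
print("relevance:", relevance)
print("eigenvalues of the relevance matrix:", eigvals)
```

Any fitted model that exposes a prediction function can be passed as `predict`; the linear fit here only keeps the sketch self-contained. In this simulated setting the first variable should receive by far the largest relevance, since replacing it with its ghost changes the predictions the most.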


research
10/17/2019

Ranking variables and interactions using predictive uncertainty measures

For complex nonlinear supervised learning models, assessing the relevanc...
research
12/06/2022

The Importance of Variable Importance

Variable importance is defined as a measure of each regressor's contribu...
research
01/10/2013

Cross-covariance modelling via DAGs with hidden variables

DAG models with hidden variables present many difficulties that are not ...
research
11/03/2021

From global to local MDI variable importances for random forests and when they are Shapley values

Random forests have been widely used for their ability to provide so-cal...
research
07/11/2023

Conformalization of Sparse Generalized Linear Models

Given a sequence of observable variables {(x_1, y_1), …, (x_n, y_n)}, th...
research
08/18/2020

Clustering of variables for enhanced interpretability of predictive models

A new strategy is proposed for building easy to interpret predictive mod...
research
01/18/2017

A Machine Learning Alternative to P-values

This paper presents an alternative approach to p-values in regression se...
