Please Stop Permuting Features: An Explanation and Alternatives

by Giles Hooker, et al.

This paper advocates against permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they provide model-agnostic measures that depend only on the output of the pre-trained model. However, numerous studies have found that these tools can produce highly misleading diagnostics, particularly when there is strong dependence among features. Rather than simply add to this growing literature by further demonstrating such issues, here we seek to explain the observed behavior. In particular, we argue that breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects in various settings where the ground truth is known and find support for previous claims in the literature that PaP metrics tend to over-emphasize correlated features in both variable importance and partial dependence plots, even though applying permutation methods to the ground-truth models does not. As an alternative, we recommend more direct approaches that have proven successful in other settings: explicitly removing features, conditional permutations, or model distillation methods.
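The contrast drawn above between permute-and-predict importance and two of the recommended alternatives (feature removal and conditional permutation) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's experiments: it uses an ordinary least-squares fit as a stand-in for the black-box model, a hypothetical simulated dataset in which two features are correlated at rho = 0.9, and a residual-permutation scheme for conditional permutation that is only valid if the permuted feature depends linearly on the others. All function names are hypothetical. Importance is measured as the increase in hold-out mean squared error.

```python
import numpy as np


def fit_ols(X, y):
    """Stand-in 'black box' (assumption): ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def mse(model, X, y):
    return float(np.mean((y - model(X)) ** 2))


def pap_importance(model, X, y, j, rng):
    """Permute-and-predict: shuffle column j of hold-out data, keep the fitted model."""
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return mse(model, Xp, y) - mse(model, X, y)


def conditional_importance(model, X, y, j, rng):
    """Conditional permutation, ASSUMING x_j is linear in the other features:
    shuffle only the residual of x_j given the rest, so permuted points stay
    close to the observed joint distribution."""
    A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    fitted = A @ coef
    Xc = X.copy()
    Xc[:, j] = fitted + rng.permutation(X[:, j] - fitted)
    return mse(model, Xc, y) - mse(model, X, y)


def drop_importance(fit, X_tr, y_tr, X_te, y_te, j):
    """Feature removal: refit without column j and compare hold-out error."""
    full = fit(X_tr, y_tr)
    reduced = fit(np.delete(X_tr, j, axis=1), y_tr)
    return mse(reduced, np.delete(X_te, j, axis=1), y_te) - mse(full, X_te, y_te)


def simulate(n, rng, rho=0.9):
    """Hypothetical data: x0 and x1 correlated (rho), x2 independent; y = x0 + x1 + x2 + noise."""
    cov = np.array([[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]])
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    y = X.sum(axis=1) + rng.normal(scale=0.5, size=n)
    return X, y


rng = np.random.default_rng(0)
X_tr, y_tr = simulate(5000, rng)
X_te, y_te = simulate(5000, rng)
model = fit_ols(X_tr, y_tr)

# results[j] = (PaP, conditional permutation, drop-and-refit) importance of feature j
results = {}
for j in range(3):
    results[j] = (
        pap_importance(model, X_te, y_te, j, rng),
        conditional_importance(model, X_te, y_te, j, rng),
        drop_importance(fit_ols, X_tr, y_tr, X_te, y_te, j),
    )
    print(f"x{j}: PaP={results[j][0]:.2f}  conditional={results[j][1]:.2f}  drop={results[j][2]:.2f}")
```

Under these assumptions, PaP assigns the correlated features x0 and x1 roughly the same importance as the independent x2, while conditional permutation and drop-and-refit both score them well below x2, because each correlated feature carries most of the other's information. A linear stand-in does not reproduce the extrapolation mechanism itself, which arises when a flexible learner such as a random forest is queried in empty regions of the feature space; it only shows how the three measures differ mechanically.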

