1 Introduction
The proliferation of black-box models has made machine learning (ML) explainability an increasingly important subject, and researchers have now proposed a wide variety of model explanation approaches (Breiman, 2001; Chen et al., 2018b; Covert et al., 2020; Lundberg and Lee, 2017; Owen, 2014; Petsiuk et al., 2018; Ribeiro et al., 2016; Sundararajan et al., 2017; Štrumbelj et al., 2009; Zeiler and Fergus, 2014). Despite progress in the field, the relationships and tradeoffs among these methods have not been rigorously investigated, and researchers have not always formalized their fundamental ideas about how to interpret models (Lipton, 2018). This makes the literature difficult to navigate and raises questions about whether existing methods relate to human processes for explaining complex decisions (Miller et al., 2017; Miller, 2019).

Here, we present a comprehensive framework that unifies a substantial portion of the model explanation literature. Our framework is based on the observation that many methods can be understood as simulating feature removal to quantify each feature's influence on a model. The intuition behind these methods is similar (depicted in Figure 1), but each one takes a slightly different approach to the removal operation: some replace features with neutral values (Petsiuk et al., 2018; Zeiler and Fergus, 2014), others marginalize over a distribution of values (Lundberg and Lee, 2017; Strobl et al., 2008), and still others train separate models for each subset of features (Lipovetsky and Conklin, 2001; Štrumbelj et al., 2009). These methods also vary in other respects, as we describe below.
We refer to this class of approaches as removal-based explanations and identify 25 existing methods that rely on the feature removal principle (this count excludes minor variations on the approaches we identified), including several of the most widely used methods (SHAP, LIME, Meaningful Perturbations, permutation tests). We then develop a framework that shows how each method arises from various combinations of three choices: 1) how the method removes features from the model, 2) what model behavior the method analyzes, and 3) how the method summarizes each feature's influence on the model. By characterizing each method in terms of three precise mathematical choices, we are able to systematize their shared elements and reveal that they rely on the same fundamental approach: feature removal.
The model explanation field has grown significantly in the past decade, and we take a broader view of the literature than existing unification theories. Our framework's flexibility enables us to establish links between disparate classes of methods (e.g., computer vision-focused methods, global methods, game-theoretic methods, feature selection methods) and show that the literature is more interconnected than previously recognized. Exposing these underlying connections potentially raises questions about the degree of novelty in recent work, but we also believe that each method has the potential to offer unique advantages, either computationally or conceptually.
Through this work, we hope to empower users to reason more carefully about which tools to use, and we aim to provide researchers with new theoretical tools to build on in ongoing research. Our contributions include:


We present a framework that unifies 25 existing explanation methods. Our framework for removal-based explanations integrates classes of methods that were previously considered disjoint, including local and global approaches as well as feature attribution and feature selection methods.

We develop new mathematical tools to represent different approaches to removing features from ML models. Subset functions and model extensions provide a common representation for various feature removal techniques, revealing that this choice is interchangeable among methods.

We generalize numerous explanation methods to express them within our framework, exposing connections that were not apparent in the original works. In particular, for several approaches we disentangle the substance of the methods from the approximations that make them practical.
We begin with background on the model explanation problem and a review of prior work (Section 2), and we then introduce our framework (Section 3). The next several sections examine our framework in detail by showing how it encompasses existing methods. Section 4 discusses how methods remove features, Section 5 formalizes the model behaviors analyzed by each method, and Section 6 describes each method’s approach to summarizing feature influence. Finally, Section 7 concludes and discusses future research directions.
2 Background
Here, we introduce the model explanation problem and briefly review existing approaches and related unification theories.
2.1 Preliminaries
Consider a supervised ML model $f$ that is used to predict a response variable $Y$ using the input $X = (X_1, X_2, \ldots, X_d)$, where each $X_i$ represents an individual feature, such as a patient's age. We use uppercase symbols (e.g., $X$) to denote random variables and lowercase ones (e.g., $x$) to denote their values. We also use $\mathcal{X}$ to denote the domain of the full feature vector $X$ and $\mathcal{X}_i$ to denote the domain of each feature $X_i$. Finally, $x_S \equiv \{x_i : i \in S\}$ denotes a subset of features for $S \subseteq D \equiv \{1, 2, \ldots, d\}$, and $\bar{S} \equiv D \setminus S$ represents a set's complement, so that we may write $x = (x_S, x_{\bar{S}})$.

ML interpretability broadly aims to provide insight into how models make predictions. This is particularly important when $f$ is a complex model, such as a neural network or a decision forest. The most active area of research in the field is local interpretability, which explains individual predictions, such as an individual patient diagnosis (Lundberg and Lee, 2017; Ribeiro et al., 2016; Sundararajan et al., 2017); in contrast, global interpretability explains the model's behavior across the entire dataset (Breiman, 2001; Covert et al., 2020; Owen, 2014). Both problems are usually addressed using feature attribution, where a score is assigned to quantify each feature's influence. However, recent work has also proposed the strategy of local feature selection (Chen et al., 2018b), and other papers have introduced methods to isolate sets of relevant features (Dabkowski and Gal, 2017; Fong and Vedaldi, 2017; Zhou et al., 2014).

Whether the aim is local or global interpretability, explaining the inner workings of complex models is fundamentally difficult, so it is no surprise that researchers keep devising new approaches. Commonly cited categories of approaches include perturbation-based methods (Lundberg and Lee, 2017; Zeiler and Fergus, 2014), gradient-based methods (Simonyan et al., 2013; Sundararajan et al., 2017), and inherently interpretable models (Rudin, 2019; Zhou et al., 2016). However, these categories refer to loose collections of approaches that seldom share a precise mechanism.
Besides the inherently interpretable models, virtually all of these approaches generate explanations by considering some class of perturbation to the input and using the outcomes to explain each feature’s influence. Certain methods consider infinitesimal perturbations by calculating gradients (Simonyan et al., 2013; Smilkov et al., 2017; Sundararajan et al., 2017; Xu et al., 2020), but there are many possible perturbations (Fong and Vedaldi, 2017; Lundberg and Lee, 2017; Ribeiro et al., 2016; Zeiler and Fergus, 2014). Our work is based on the observation that numerous perturbation strategies can be understood as simulating feature removal.
2.2 Related work
Prior work has made solid progress in exposing connections among disparate explanation methods. Lundberg and Lee proposed the unifying framework of additive feature attribution methods and showed that LIME, DeepLIFT, LRP and QII are all related to SHAP (Bach et al., 2015; Datta et al., 2016; Lundberg and Lee, 2017; Ribeiro et al., 2016; Shrikumar et al., 2016)
. Similarly, Ancona et al. showed that Grad * Input, DeepLIFT, LRP and Integrated Gradients are all understandable as modified gradient backpropagations
Ancona et al. (2017); Shrikumar et al. (2016); Sundararajan et al. (2017). Most recently, Covert et al. showed that several global explanation methods can be viewed as additive importance measures, including permutation tests, Shapley Net Effects, and SAGE (Breiman, 2001; Covert et al., 2020; Lipovetsky and Conklin, 2001).

Relative to prior work, the unification we propose is considerably broader but nonetheless precise. As we describe below, our framework characterizes methods along three dimensions. The choice of how to remove features has been considered by many works (Aas et al., 2019; Frye et al., 2020; Hooker and Mentch, 2019; Janzing et al., 2019; Lundberg and Lee, 2017; Merrick and Taly, 2019; Sundararajan and Najmi, 2019; Chang et al., 2018; Agarwal and Nguyen, 2019). The choice of what model behavior to analyze has been considered explicitly by only a few works (Covert et al., 2020; Lundberg et al., 2020), as has the choice of how to summarize each feature's influence based on a set function (Covert et al., 2020; Datta et al., 2016; Frye et al., 2019; Lundberg and Lee, 2017; Štrumbelj et al., 2009). To our knowledge, ours is the first work to consider all three dimensions simultaneously and unite them under a single framework.
3 Removal-Based Explanations
We now introduce our framework and briefly describe the methods it unifies.
Table 1: Choices made by existing removal-based explanation methods.

Method | Removal | Behavior | Summary
IME (2009) | Separate models | Prediction | Shapley value
IME (2010) | Marginalize (uniform) | Prediction | Shapley value
QII | Marginalize (marginals product) | Prediction | Shapley value
SHAP | Marginalize (conditional/marginal) | Prediction | Shapley value
KernelSHAP | Marginalize (marginal) | Prediction | Shapley value
TreeSHAP | Tree distribution | Prediction | Shapley value
LossSHAP | Marginalize (conditional) | Prediction loss | Shapley value
SAGE | Marginalize (conditional) | Dataset loss (label) | Shapley value
Shapley Net Effects | Separate models | Dataset loss (label) | Shapley value
Shapley Effects | Marginalize (conditional) | Dataset loss (output) | Shapley value
Permutation Test | Marginalize (marginal) | Dataset loss (label) | Remove individual
Conditional Perm. Test | Marginalize (conditional) | Dataset loss (label) | Remove individual
Feature Ablation (LOCO) | Separate models | Dataset loss (label) | Remove individual
Univariate Predictors | Separate models | Dataset loss (label) | Include individual
L2X | Missingness during training | Prediction mean loss | High-value subset
INVASE | Missingness during training | Prediction mean loss | High-value subset
LIME (Images) | Default values | Prediction | Linear model
LIME (Tabular) | Marginalize (replacement dist.) | Prediction | Linear model
PredDiff | Marginalize (conditional) | Prediction | Remove individual
Occlusion | Zeros | Prediction | Remove individual
CXPlain | Zeros | Prediction loss | Remove individual
RISE | Zeros | Prediction | Mean when included
MM | Default values | Prediction | Partitioned subsets
MIR | Extend pixel values | Prediction | High-value subset
MP | Blurring | Prediction | Low-value subset
EP | Blurring | Prediction | High-value subset
FIDO-CA | Generative model | Prediction | High-value subset
3.1 A unified framework
We develop a unified model explanation framework by connecting methods that define a feature’s influence through the impact of removing it from a model. This perspective encompasses a substantial portion of the explainability literature: we find that 25 existing methods rely on this mechanism, including many of the most widely used approaches (Breiman, 2001; Fong and Vedaldi, 2017; Lundberg and Lee, 2017; Ribeiro et al., 2016).
These methods all remove groups of features from the model, but, beyond that, they take a diverse set of approaches. For example, LIME fits a linear model to an interpretable representation of the input (Ribeiro et al., 2016), L2X selects the most informative features for a single example (Chen et al., 2018b), and Shapley Effects examines how much of the model's variance is explained by each feature (Owen, 2014). Perhaps surprisingly, their differences are easy to systematize because each method removes discrete sets of features.

As our main contribution, we introduce a framework that shows how these methods can be specified using only three choices.
Definition 1.
Removal-based explanations are model explanations that quantify the impact of removing sets of features from the model. These methods are determined by three choices:

(Feature removal) How the method removes features from the model (e.g., by setting them to default values or by marginalizing over a distribution of values)

(Model behavior) What model behavior the method analyzes (e.g., the probability of the true class or the model loss)

(Summary technique) How the method summarizes each feature’s impact on the model (e.g., by removing a feature individually or by calculating the Shapley values)
This precise yet flexible framework represents each choice as a specific type of mathematical function, as we show later. The framework unifies disparate explanation methods, and, by unraveling each method’s choices, offers a step towards a better understanding of the literature by allowing explicit reasoning about the tradeoffs among different approaches.
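To make the three choices concrete, the following Python sketch (our own illustration, not code from any paper or released library; the toy model, names, and zeros-based removal are hypothetical) represents a crude Occlusion-style method as a triple of functions:

```python
from dataclasses import dataclass
from typing import Callable

# Our own illustration: a removal-based explanation method specified by
# its three choices.
@dataclass
class RemovalBasedMethod:
    removal: Callable   # (x, S) -> prediction with features outside S removed
    behavior: Callable  # model output -> quantity being explained
    summary: Callable   # (set function, d) -> attributions

def model(x):                        # hypothetical model: f(x) = x1 + 2*x2
    return x[0] + 2 * x[1]

def remove_with_zeros(x, S):         # removal choice: zeros for held-out features
    return model([xi if i in S else 0.0 for i, xi in enumerate(x)])

occlusion_style = RemovalBasedMethod(
    removal=remove_with_zeros,
    behavior=lambda output: output,  # behavior choice: the raw prediction
    summary=lambda v, d: [v(set(range(d))) - v(set(range(d)) - {i})
                          for i in range(d)],  # summary choice: remove individual
)

x = [1.0, 1.0]
v = lambda S: occlusion_style.behavior(occlusion_style.removal(x, S))
attrs = occlusion_style.summary(v, 2)
print(attrs)  # [1.0, 2.0]
```

Swapping any one component, e.g., replacing `remove_with_zeros` with a marginalizing removal, or the remove-individual summary with Shapley values, yields a different method in the same design space.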
3.2 Overview of existing approaches
We now outline some of our findings, which we present in more detail in the next several sections. In particular, we preview how existing methods fit into our framework and highlight groups of methods that appear similar in light of our feature removal perspective.
Table 1 lists the methods unified by our framework (with acronyms introduced in the next section). These methods represent diverse parts of the interpretability literature, including global methods (Breiman, 2001; Owen, 2014), computer vision-focused methods (Petsiuk et al., 2018; Zeiler and Fergus, 2014; Zhou et al., 2014; Fong and Vedaldi, 2017), game-theoretic methods (Covert et al., 2020; Lundberg and Lee, 2017; Štrumbelj and Kononenko, 2010) and feature selection methods (Chen et al., 2018b; Fong et al., 2019; Yoon et al., 2018). They are all unified by their reliance on feature removal.
Disentangling the details of each method shows that many approaches share one or more of the same choices. For example, most methods choose to explain individual predictions (the model behavior), and the most popular summary technique is the Shapley value (Shapley, 1953). These common choices raise important questions about how different these methods truly are and how their choices are justified.
To highlight similarities among the methods, we visually depict the space of removal-based explanations in Figure 2. Visualizing our framework reveals several regions in the space of methods that are crowded (e.g., methods that marginalize out removed features with their conditional distribution and that calculate Shapley values), while certain methods are relatively unique and spatially isolated (e.g., RISE; LIME for tabular data; L2X and INVASE). Empty positions in the grid reveal opportunities to develop new methods; in fact, every empty position represents a viable new explanation method.
Table 2: Groups of methods that differ in at most one choice (✓ marks a shared choice).

Removal | Behavior | Summary | Methods
&nbsp; | ✓ | ✓ | IME, QII, SHAP, KernelSHAP, TreeSHAP
✓ | &nbsp; | ✓ | SHAP, LossSHAP, SAGE, Shapley Effects
✓ | ✓ | &nbsp; | Occlusion, LIME (images), MM, RISE
&nbsp; | ✓ | ✓ | Feature ablation (LOCO), permutation tests, conditional permutation tests
✓ | ✓ | &nbsp; | Univariate predictors, feature ablation (LOCO), Shapley Net Effects
&nbsp; | ✓ | ✓ | SAGE, Shapley Net Effects
✓ | ✓ | &nbsp; | SAGE, conditional permutation tests
✓ | &nbsp; | ✓ | Shapley Net Effects, IME (2009)
✓ | &nbsp; | ✓ | Occlusion, CXPlain
&nbsp; | ✓ | ✓ | Occlusion, PredDiff
✓ | &nbsp; | ✓ | Conditional permutation tests, PredDiff
✓ | ✓ | &nbsp; | SHAP, PredDiff
✓ | ✓ | &nbsp; | MP, EP
&nbsp; | ✓ | ✓ | EP, FIDO-CA
✓ | ✓ | ✓ | L2X, INVASE
Finally, Table 2 shows groups of methods that differ in only one dimension of the framework. These methods are neighbors in the space of explanation methods (Figure 2), and it is remarkable how many instances of neighboring methods exist in the literature. Certain methods even have neighbors along every dimension of the framework (e.g., SHAP, SAGE, Occlusion, PredDiff, conditional permutation tests), reflecting how intimately connected the literature has become. The explainability literature is evolving and maturing, and our perspective provides a new approach for reasoning about the subtle relationships and tradeoffs among existing approaches.
4 Feature Removal
Here, we define the mathematical tools necessary to remove features from ML models and then examine how existing explanation methods remove features.
4.1 Functions on subsets of features
Most ML models make predictions given a specific set of features $x = (x_1, x_2, \ldots, x_d)$. Mathematically, these models are functions of the form $f: \mathcal{X} \mapsto \mathcal{Y}$, and we use $\mathcal{F}$ to denote the set of all such possible mappings. The principle behind removal-based explanations is to remove certain features to understand their impact on a model, but since most models require all the features to make predictions, removing a feature is more complicated than simply not giving the model access to it.

To remove features from a model, or to make predictions given a subset of features, we require a different mathematical object than $f$. Instead of functions with domain $\mathcal{X}$, we consider functions with domain $\mathcal{X} \times \mathcal{P}(D)$, where $\mathcal{P}(D)$ denotes the power set of $D$. To ensure invariance to the held out features, these functions must depend only on a set of features specified by a subset $S \subseteq D$, so we formalize subset functions as follows.
Definition 2.
A subset function is a mapping of the form

$F: \mathcal{X} \times \mathcal{P}(D) \mapsto \mathcal{Y}$

that is invariant to the dimensions that are not in the specified subset. That is, we have $F(x, S) = F(x', S)$ for all $x, x' \in \mathcal{X}$ such that $x_S = x'_S$. We define $F(x_S) \equiv F(x, S)$ for convenience because the held out values $x_{\bar{S}}$ are not used by $F$.
A subset function's invariance property is crucial to ensure that only the specified feature values determine the function's output, while guaranteeing that the other feature values do not matter. Another way of viewing subset functions is that they simulate the presence of missing data. While we use $\mathcal{F}$ to represent the set of standard prediction functions, we use $\mathfrak{F}$ to denote the set of all possible subset functions.
We introduce subset functions here because they help conceptualize how different methods remove features from ML models. Removal-based explanations typically begin with an existing model $f \in \mathcal{F}$, and in order to quantify each feature's influence, they must establish a convention for removing it from the model. A natural approach is to define a subset function $F \in \mathfrak{F}$ based on the original model $f$. To formalize this idea, we define a model extension as follows.
Definition 3.
An extension of a model $f \in \mathcal{F}$ is a subset function $F \in \mathfrak{F}$ that agrees with $f$ in the presence of all features. That is, the model and its extension must satisfy

$F(x, D) = f(x) \quad \text{for all } x \in \mathcal{X}$
As we show next, extending an existing model is the first step towards specifying a removalbased explanation method.
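As a minimal sketch of Definitions 2 and 3 (our own toy construction, with a hypothetical two-feature model; not code from the paper), an extension built by filling held-out features with default values can be checked directly:

```python
# Our own toy construction: a subset function built from a model f by
# filling held-out features with default values r.
def make_extension(f, r):
    def F(x, S):
        # Only the features in S are read from x; the rest come from r,
        # which guarantees the invariance property of Definition 2.
        return f([x[i] if i in S else r[i] for i in range(len(x))])
    return F

f = lambda x: 3 * x[0] - x[1]          # toy model on d = 2 features
F = make_extension(f, r=[0.0, 0.0])

x, d = [2.0, 5.0], 2

# Definition 3: the extension agrees with f when all features are present.
assert F(x, {0, 1}) == f(x)

# Definition 2 (invariance): held-out values must not affect the output.
for S in [set(), {0}, {1}]:
    x_prime = [x[i] if i in S else 99.0 for i in range(d)]  # perturb held-out dims
    assert F(x, S) == F(x_prime, S)

print("F satisfies Definitions 2 and 3 on this example")
```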
4.2 Removing features from machine learning models
Existing methods have devised numerous ways to evaluate models while withholding groups of features. Although certain methods use different terminology to describe their approaches (e.g., deleting information, ignoring features, using neutral values, etc.), the goal of these methods is to measure a feature’s influence through the impact of removing it from the model. Most proposed techniques can be understood as extensions of an existing model (Definition 3).
We now examine each method’s approach (see Appendix A for more details):


(Zeros) Occlusion (Zeiler and Fergus, 2014), RISE (Petsiuk et al., 2018) and causal explanations (CXPlain) (Schwab and Karlen, 2019) remove features simply by setting them to zero:

$F(x_S) = f(x_S, 0)$  (1)
(Default values) LIME for image data (Ribeiro et al., 2016) and the Masking Model method (MM) (Dabkowski and Gal, 2017) remove features by setting them to user-defined default values (e.g., gray pixels for images). Given default values $r \in \mathcal{X}$, these methods calculate

$F(x_S) = f(x_S, r_{\bar{S}})$  (2)

This is a generalization of the previous approach, and in some cases features may be given different default values (e.g., their mean).

(Missingness during training) Learning to Explain (L2X) (Chen et al., 2018b) and Instance-wise Variable Selection (INVASE) (Yoon et al., 2018) use a model that has missingness introduced at training time. Removed features are replaced with zeros, so that a model trained with cross entropy loss makes the following approximation:

$f(x_S, 0) \approx p(y \mid x_S)$  (3)

This approach differs from Occlusion and RISE because the model is trained to recognize zeros as missing values rather than zero-valued features. A model trained with a loss function other than cross entropy would approximate a different quantity (e.g., the conditional expectation $\mathbb{E}[Y \mid X_S = x_S]$ for MSE loss).
(Extend pixel values) Minimal image representation (MIR) (Zhou et al., 2014) removes features in images by extending the values of neighboring pixels. This effect is achieved through a gradientspace manipulation.

(Blurring) Meaningful Perturbations (MP) (Fong and Vedaldi, 2017) and Extremal Perturbations (EP) (Fong et al., 2019) remove features from images by blurring them with a Gaussian kernel. This approach is not an extension of $f$ because the blurred image retains dependence on the removed features. Blurring fails to remove large, low-frequency objects (e.g., mountains), but it provides an approximate way to remove information from images.

(Generative model) FIDO-CA (Chang et al., 2018) removes features by replacing them with a sample from a conditional generative model (e.g., Yu et al., 2018). The held out features are drawn from a generative model $q(X_{\bar{S}} \mid x_S)$ that approximates the conditional distribution $p(X_{\bar{S}} \mid x_S)$, and predictions are made as follows:

$F(x_S) = \mathbb{E}_{q(X_{\bar{S}} \mid x_S)}\big[f(x_S, X_{\bar{S}})\big]$  (4)
(Marginalize with conditional) SHAP (Lundberg and Lee, 2017), LossSHAP (Lundberg et al., 2020) and SAGE (Covert et al., 2020) present a strategy for removing features by marginalizing them out using their conditional distribution $p(X_{\bar{S}} \mid X_S = x_S)$:

$F(x_S) = \mathbb{E}\big[f(x_S, X_{\bar{S}}) \mid X_S = x_S\big]$  (5)

This approach is computationally challenging in practice, but recent work tries to achieve close approximations (Aas et al., 2019; Frye et al., 2020). Shapley Effects (Owen, 2014) implicitly uses this convention to analyze function sensitivity, while conditional permutation tests (Strobl et al., 2008) and Prediction Difference Analysis (PredDiff) (Zintgraf et al., 2017) do so to remove individual features.

(Marginalize with marginal) KernelSHAP (a practical implementation of SHAP) (Lundberg and Lee, 2017) removes features by marginalizing them out using their joint marginal distribution $p(X_{\bar{S}})$:

$F(x_S) = \mathbb{E}_{p(X_{\bar{S}})}\big[f(x_S, X_{\bar{S}})\big]$  (6)

This is the default behavior in SHAP's implementation (https://github.com/slundberg/shap), and recent work discusses the benefits of this approach (Janzing et al., 2019). Permutation tests (Breiman, 2001) use this approach to remove individual features from a model.

(Marginalize with product of marginals) Quantitative Input Influence (QII) (Datta et al., 2016) removes held out features by marginalizing them out using the product of the marginal distributions $\prod_{i \in \bar{S}} p(X_i)$:

$F(x_S) = \mathbb{E}_{\prod_{i \in \bar{S}} p(X_i)}\big[f(x_S, X_{\bar{S}})\big]$  (7)
(Marginalize with uniform) The updated version of the Interactions Method for Explanation (IME) (Štrumbelj and Kononenko, 2010) removes features by marginalizing them out with a uniform distribution over the feature space. If we let $u(X_{\bar{S}})$ denote a uniform distribution over $\mathcal{X}_{\bar{S}}$ (with extremal values defining the boundaries for continuous features), then features are removed as follows:

$F(x_S) = \mathbb{E}_{u(X_{\bar{S}})}\big[f(x_S, X_{\bar{S}})\big]$  (8)
(Marginalize with replacement distributions) LIME for tabular data replaces features with independent draws from replacement distributions (our term), each of which depends on the original feature value. When a feature with value $x_i$ is removed, discrete features are drawn from their empirical distribution $p(X_i)$; when quantization is used for continuous features (LIME's default behavior; see https://github.com/marcotcr/lime), continuous features are simulated by first generating a different quantile and then sampling from a truncated normal distribution within that bin. If we denote each feature's replacement distribution given the original value $x_i$ as $q_{x_i}(X_i)$, then LIME for tabular data removes features as follows:

$F(x_S) = \mathbb{E}_{\prod_{i \in \bar{S}} q_{x_i}}\big[f(x_S, X_{\bar{S}})\big]$  (9)

Although this function agrees with $f$ given all features, it is not an extension of $f$ because it does not satisfy the invariance property for subset functions.

(Tree distribution) Dependent TreeSHAP (Lundberg et al., 2020) removes features using the distribution induced by the model, which roughly approximates the conditional distribution. When splits for removed features are encountered in the model’s trees, TreeSHAP averages predictions from the multiple paths in proportion to how often the dataset follows each path.

(Separate models) Shapley Net Effects (Lipovetsky and Conklin, 2001) and the original version of IME (Štrumbelj et al., 2009) are not based on a single model but rather on separate models trained for each subset, which we denote as $\{f_S : S \subseteq D\}$. The prediction for a subset of features is given by that subset's model:

$F(x_S) = f_S(x_S)$  (10)

Although this approach technically yields an extension of the model $f_D$ trained with all features, its predictions given subsets of features are not based on $f_D$. Similarly, feature ablation, also known as leave-one-covariate-out (LOCO) (Lei et al., 2018), trains models to remove individual features, and the univariate predictors approach (used mainly for feature selection) uses models trained with individual features (Guyon and Elisseeff, 2003).
Most of these approaches are extensions of an existing model $f$, so our formalisms provide useful tools for understanding how removal-based explanations remove features from models. However, there are two exceptions: the blurring technique (MP and EP) and LIME's approach for tabular data. Both provide functions of the form $F: \mathcal{X} \times \mathcal{P}(D) \mapsto \mathcal{Y}$ that agree with $f$ given all features but still exhibit dependence on removed features. Based on our mathematical characterization of subset functions and their invariance to held out features, we argue that these two approaches do not fully remove features from the model. We conclude that the first dimension of our framework amounts to choosing an extension $F$ of the model $f$.
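As a hedged sketch of one of these removal choices (our own illustration, not reference code for any library; the toy model and sample count are arbitrary), the marginal-expectation extension of Eq. (6) can be approximated with background samples assumed to be drawn from $p(X)$:

```python
import random

# Our own illustration: the "marginalize with marginal" removal of Eq. (6),
# approximated by averaging the model over background samples for the
# held-out features, in the spirit of KernelSHAP-style implementations.
def marginal_extension(f, background):
    def F(x, S):
        total = 0.0
        for b in background:
            # splice observed features x_S with background values for S-bar
            z = [x[i] if i in S else b[i] for i in range(len(x))]
            total += f(z)
        return total / len(background)
    return F

random.seed(0)
f = lambda x: x[0] + 2 * x[1]                      # toy linear model
background = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2000)]

F = marginal_extension(f, background)
x = [1.0, 1.0]
print(F(x, {0, 1}))   # all features present: recovers f(x) = 3.0
print(F(x, {0}))      # x2 marginalized out: close to 1 + 2*E[X2], i.e. roughly 1.0
```

Because the background values ignore the observed features, this removal reflects the marginal rather than the conditional distribution, which is exactly the distinction between Eqs. (5) and (6).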
5 Explaining Different Model Behaviors
Removal-based explanations all aim to demonstrate how a model works, but they can do so by analyzing a variety of model behaviors. We now consider the various choices of target quantities to observe as different features are withheld from the model.
The feature removal principle is flexible enough to explain virtually any function. For example, methods can explain a model's prediction, a model's loss function, a hidden layer in a neural network, or any node in a computation graph. In fact, removal-based explanations need not be restricted to the ML context: any function that accommodates missing inputs can be explained via feature removal by examining either its output or some function of its output as groups of inputs are removed. This perspective shows the broad potential applications for removal-based explanations.
However, since our focus is the ML context, we proceed by examining how existing methods work. Each method's target quantity can be understood as a function of the model output, which is represented by a subset function $F \in \mathfrak{F}$. Many methods explain the model output or a simple function of the output, such as the log-odds ratio. Other methods take into account a measure of the model's loss, for either an individual input or the entire dataset. Ultimately, as we show below, each method generates explanations based on a set function of the form

$v: \mathcal{P}(D) \mapsto \mathbb{R},$

which represents a value associated with each subset of features $S \subseteq D$. This set function represents the model behavior that a method is designed to explain.
We now examine the specific choices made by existing methods (see Appendix A for further details on each method). The various model behaviors that methods analyze, and their corresponding set functions, include:


(Prediction) Occlusion, RISE, PredDiff, MP, EP, MM, FIDO-CA, MIR, LIME, SHAP (including KernelSHAP and TreeSHAP), IME and QII all analyze a model's prediction for an individual input $x$:

$v_x(S) = F(x_S)$  (11)

These methods quantify how holding out different features makes an individual prediction either higher or lower. For multiclass classification models, methods often use a single output that corresponds to the class of interest, and they can also apply a simple function to the model's output (for example, using the log-odds ratio rather than the classification probability).

(Prediction loss) LossSHAP and CXPlain take into account the true label $y$ for an input $x$ and calculate the prediction loss using a loss function $\ell$:

$v_{xy}(S) = -\ell\big(F(x_S), y\big)$  (12)

By incorporating label information, these methods quantify whether certain features make the prediction more or less correct. The minus sign is necessary to give the set function a higher value when more informative features are included.

(Prediction mean loss) L2X and INVASE consider the expected loss for a given input $x$ according to the label's conditional distribution $p(Y \mid X = x)$:

$w_x(S) = -\mathbb{E}_{p(Y \mid X = x)}\big[\ell\big(F(x_S), Y\big)\big]$  (13)

By averaging the loss across the label's distribution, these methods highlight features that correctly predict what could have occurred, on average.

(Dataset loss w.r.t. label) Shapley Net Effects, SAGE, feature ablation (LOCO), permutation tests and univariate predictors consider the expected loss across the entire dataset:

$v(S) = -\mathbb{E}_{XY}\big[\ell\big(F(X_S), Y\big)\big]$  (14)

These methods quantify how much the model's performance degrades when different features are removed. This set function can also be viewed as the predictive power derived from sets of features (Covert et al., 2020). Recent work has proposed a SHAP value aggregation scheme that can be considered a special case of this approach (Frye et al., 2020).

(Dataset loss w.r.t. output) Shapley Effects considers the expected loss with respect to the full model output:

$v'(S) = -\mathbb{E}_{X}\big[\ell\big(F(X_S), F(X)\big)\big]$  (15)

Though related to the previous approach (Covert et al., 2020), Shapley Effects focuses on each feature's influence on the model output rather than on the model's performance.
Each set function serves a distinct purpose in exposing a model’s dependence on different features. The first three approaches listed above analyze the model’s behavior for individual predictions (local explanations); the last two take into account the model’s behavior across the entire dataset (global explanations). Although their aims differ, these set functions are all in fact related. Each builds upon the previous ones by accounting for either the loss or data distribution, and their relationships can be summarized as follows:
$v_{xy}(S) = -\ell\big(v_x(S), y\big)$  (16)

$w_x(S) = \mathbb{E}_{p(Y \mid X = x)}\big[v_{xY}(S)\big]$  (17)

$v(S) = \mathbb{E}_{XY}\big[v_{XY}(S)\big] = \mathbb{E}_{X}\big[w_X(S)\big]$  (18)

$v'(S) = -\mathbb{E}_{X}\big[\ell\big(v_X(S), v_X(D)\big)\big]$  (19)
These relationships show that explanations based on one set function are in some cases related to explanations based on another. For example, Covert et al. showed that SAGE explanations are the expectation of explanations provided by LossSHAP (Covert et al., 2020), a relationship reflected in Eq. 18.
Understanding these connections is possible only because our framework disentangles each method’s choices rather than viewing each method as a monolithic algorithm. We conclude by reiterating that removalbased explanations can explain virtually any function, and that choosing what to explain amounts to selecting a set function to represent the model’s dependence on different sets of features.
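To illustrate these set functions concretely, the sketch below (our own construction on a toy linear model with zeros-based removal, not code from the paper) computes the dataset-loss set function of Eq. (14) as an empirical average of the per-example prediction-loss set functions of Eq. (12), mirroring the relationship in Eq. (18):

```python
import random

# Our own construction: dataset-loss set function v(S) as an average of
# per-example prediction-loss set functions v_xy(S).
random.seed(1)
data = [((x1, x2), x1 + 2 * x2)
        for x1, x2 in ((random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500))]

f = lambda x: x[0] + 2 * x[1]        # the model matches the labeling process exactly
F = lambda x, S: f([xi if i in S else 0.0 for i, xi in enumerate(x)])  # zeros removal
loss = lambda yhat, y: (yhat - y) ** 2

def v_xy(S, x, y):                   # per-example negative prediction loss, Eq. (12)
    return -loss(F(x, S), y)

def v(S):                            # dataset loss w.r.t. label, Eq. (14)
    return sum(v_xy(S, x, y) for x, y in data) / len(data)

print(v({0, 1}))                     # 0.0: with all features the model is exact
print(v({0}) < v({0, 1}))            # True: removing x2 degrades performance
```

Higher values of $v$ here indicate sets of features with more predictive power, matching the sign convention above.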
6 Summarizing Feature Influence
The third choice for removal-based explanations is how to summarize each feature's influence on the model. We examine the various summarization techniques and then discuss their computational complexity and approximation approaches.
6.1 Explaining set functions
The set functions we used to represent a model's dependence on different features (Section 5) are complicated mathematical objects that are difficult to communicate fully due to the exponential number of feature subsets and the underlying feature interactions. Removal-based explanations confront this challenge by providing users with a concise summary of each feature's influence.
We distinguish between two main types of summarization approaches: feature attributions and feature selections. Many methods provide explanations in the form of feature attributions, which are numerical scores $a_1, \ldots, a_d$ given to each feature. If we use $\mathcal{V}$ to denote the set of all set functions $v: \mathcal{P}(D) \mapsto \mathbb{R}$, then we can represent feature attributions as mappings of the form $E: \mathcal{V} \mapsto \mathbb{R}^d$, which we refer to as explanation mappings. Other methods take the alternative approach of summarizing set functions with a set of the most influential features. We represent these feature selection summaries as explanation mappings of the form $E: \mathcal{V} \mapsto \mathcal{P}(D)$. Both approaches provide users with simple summaries of each feature's contribution to the set function.
We now consider the specific choices made by each method (see Appendix A for further details). For simplicity, we let $u \in \mathcal{U}$ denote the set function each method analyzes. Surveying the various removal-based explanation methods, the techniques for summarizing each feature's influence include:


(Remove individual) Occlusion, PredDiff, CXPlain, permutation tests and feature ablation (LOCO) calculate the impact of removing a single feature from the set of all features, resulting in the following attribution values:
$a_i = u(D) - u(D \setminus \{i\})$ (20)

Occlusion, PredDiff and CXPlain can also be applied with groups of features in image contexts.
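As a concrete sketch, Eq. 20 can be computed by brute force for any small set function. The `weights`-based game below is our own illustrative toy example, not one taken from any of the cited papers:

```python
def remove_individual(u, d):
    """Eq. 20: a_i = u(D) - u(D \\ {i}) for each feature i."""
    D = frozenset(range(d))
    return [u(D) - u(D - {i}) for i in range(d)]

# Toy set function: additive feature weights, plus a 0.5 interaction
# bonus when features 0 and 1 are both present.
weights = [1.0, 2.0, 0.5]
def u(S):
    return sum(weights[i] for i in S) + (0.5 if {0, 1} <= S else 0.0)

print(remove_individual(u, 3))  # [1.5, 2.5, 0.5]
```

Note how the interaction bonus is attributed to both features 0 and 1, since removing either one destroys it.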

(Include individual) The univariate predictors approach calculates the impact of including individual features, resulting in the following attribution values:
$a_i = u(\{i\}) - u(\varnothing)$ (21)

This is essentially the reverse of the previous approach: while that approach removes individual features from the complete set, this one adds individual features to the empty set.
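Using the same kind of toy game (our own example, with an assumed interaction term), Eq. 21 looks like this; note that unlike Eq. 20, the interaction between features 0 and 1 is invisible because it never appears in singleton subsets:

```python
# Toy set function: additive weights plus a 0-1 interaction bonus.
weights = [1.0, 2.0, 0.5]
def u(S):
    return sum(weights[i] for i in S) + (0.5 if {0, 1} <= S else 0.0)

def include_individual(u, d):
    """Eq. 21: a_i = u({i}) - u(empty set)."""
    return [u(frozenset({i})) - u(frozenset()) for i in range(d)]

print(include_individual(u, 3))  # [1.0, 2.0, 0.5]
```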

(Linear model) LIME fits a regularized weighted linear model to a dataset of perturbed examples. In the limit of an infinitely large dataset, this process approximates the following attribution values:
$a_1, \ldots, a_d = \arg\min_{b_0, \ldots, b_d} \; \sum_{S \subseteq D} \pi(S) \Big( b_0 + \sum_{i \in S} b_i - u(S) \Big)^2 + \Omega(b_1, \ldots, b_d)$ (22)

In this problem, $\pi$ represents a weighting kernel and $\Omega$ is a regularization function that is often set to the $\ell_1$ penalty to encourage sparse attributions (Tibshirani, 1996). Since this summary is based on an additive model, the learned coefficients represent values associated with including each feature.
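For small $d$, the weighted least-squares problem in Eq. 22 can be solved exactly over all subsets. The sketch below is our own minimal version, assuming no regularizer ($\Omega = 0$) and a user-supplied kernel `pi`:

```python
import numpy as np
from itertools import combinations

def lime_attributions(u, d, pi):
    """Fit b_0 + sum_{i in S} b_i to u(S) over all subsets, with subset
    weights pi(S) (Eq. 22 with Omega = 0), via weighted least squares."""
    subsets = [frozenset(S) for k in range(d + 1)
               for S in combinations(range(d), k)]
    Z = np.array([[1.0] + [float(i in S) for i in range(d)] for S in subsets])
    y = np.array([u(S) for S in subsets])
    w = np.sqrt([pi(S) for S in subsets])
    coef, *_ = np.linalg.lstsq(Z * w[:, None], y * w, rcond=None)
    return coef[1:]  # per-feature attributions (intercept dropped)

weights = [1.0, 2.0]
u = lambda S: sum(weights[i] for i in S)
print(lime_attributions(u, 2, pi=lambda S: 1.0))  # ≈ [1., 2.]
```

For an additive set function the fit is exact, so the coefficients recover each feature's weight regardless of the kernel.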

(Mean when included) RISE determines feature attributions by sampling many subsets and then calculating the mean value when a feature is included. Denoting the distribution of subsets as $p(S)$ and the conditional distribution given feature $i$'s inclusion as $p(S \mid i \in S)$, the attribution values are defined as

$a_i = \mathbb{E}_{p(S \mid i \in S)}\big[u(S)\big]$ (23)

In practice, RISE samples the subsets by removing each feature independently with probability $p$, using $p = 0.5$ in the original experiments (Petsiuk et al., 2018).
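The mean-when-included summary in Eq. 23 is straightforward to estimate by Monte Carlo. This is our own sketch with an assumed toy set function, not RISE's image pipeline:

```python
import random

def rise_attributions(u, d, p=0.5, n_samples=5000, seed=1):
    """Estimate a_i = E[u(S) | i in S] (Eq. 23) by sampling masks that
    keep each feature independently with probability p."""
    rng = random.Random(seed)
    totals, counts = [0.0] * d, [0] * d
    for _ in range(n_samples):
        S = frozenset(i for i in range(d) if rng.random() < p)
        val = u(S)
        for i in S:  # accumulate u(S) for every included feature
            totals[i] += val
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

weights = [1.0, 2.0]
u = lambda S: sum(weights[i] for i in S)
# Each estimate is roughly the feature's own weight plus p times the other's.
print(rise_attributions(u, 2))  # close to [2.0, 2.5]
```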

(Shapley value) Shapley Net Effects, IME, Shapley Effects, QII, SHAP (including KernelSHAP, TreeSHAP and LossSHAP) and SAGE all calculate feature attribution values using the Shapley value, which we denote as $a_i = \phi_i(u)$. Shapley values are the only attributions that satisfy a number of desirable properties (Shapley, 1953).
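For small games, Shapley values can be computed exactly from their definition by enumerating all subsets. The toy game below is our own example; note how the interaction bonus is split equally between the two interacting features:

```python
from itertools import combinations
from math import comb

def shapley_values(u, d):
    """Exact Shapley values: phi_i averages u(S | {i}) - u(S) over all
    subsets S not containing i, weighted by 1 / (d * C(d-1, |S|))."""
    phis = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        phi = 0.0
        for k in range(d):
            for T in combinations(others, k):
                S = frozenset(T)
                phi += (u(S | {i}) - u(S)) / (d * comb(d - 1, k))
        phis.append(phi)
    return phis

# Toy game: additive weights plus a 0.5 bonus when 0 and 1 are both present.
weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S) + (0.5 if {0, 1} <= S else 0.0)
print(shapley_values(u, 3))  # ≈ [1.25, 2.25, 0.5]
```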

(Low-value subset) MP selects a small set of features that can be removed to give the set function a low value. It does so by solving the following optimization problem:

$\min_{S \subseteq D} \; u(D \setminus S) + \lambda |S|$ (24)

In practice, MP uses additional regularizers and solves a relaxed version of this problem (see Section 6.2).
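For a small number of features, the penalized subset-removal objective can be solved by exhaustive search. This is our own brute-force sketch over an assumed additive toy game, not MP's relaxed mask optimization:

```python
from itertools import combinations

def low_value_subset(u, d, lam):
    """Brute-force solution of Eq. 24: argmin_S u(D \\ S) + lam * |S|."""
    D = frozenset(range(d))
    subsets = (frozenset(S) for k in range(d + 1)
               for S in combinations(range(d), k))
    return min(subsets, key=lambda S: u(D - S) + lam * len(S))

weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S)
# Only feature 1's removal lowers u enough to justify the per-feature cost.
print(low_value_subset(u, 3, lam=1.2))  # frozenset({1})
```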

(High-value subset) MIR solves an optimization problem to select a small set of features that alone can give the set function a high value. For a user-defined minimum value $t$, the problem is given by:

$\min_{S \subseteq D} \; |S| \quad \text{s.t.} \quad u(S) \geq t$ (25)

L2X and EP solve a similar problem but switch the terms in the constraint and the optimization objective. For a user-defined subset size $k$, the optimization problem is given by:

$\max_{S \subseteq D} \; u(S) \quad \text{s.t.} \quad |S| = k$ (26)

Finally, INVASE and FIDO-CA solve a regularized version of the problem, with a parameter $\lambda$ controlling the trade-off between the subset value and the subset size:

$\max_{S \subseteq D} \; u(S) - \lambda |S|$ (27)
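The constrained (Eq. 26) and regularized (Eq. 27) variants can both be solved exactly by enumeration when $d$ is tiny. The sketch below uses our own assumed toy game rather than any of the methods' learned selectors:

```python
from itertools import combinations

def high_value_subset_k(u, d, k):
    """Eq. 26 (L2X/EP style): argmax_S u(S) subject to |S| = k."""
    return max((frozenset(S) for S in combinations(range(d), k)), key=u)

def high_value_subset_reg(u, d, lam):
    """Eq. 27 (INVASE/FIDO-CA style): argmax_S u(S) - lam * |S|."""
    subsets = (frozenset(S) for k in range(d + 1)
               for S in combinations(range(d), k))
    return max(subsets, key=lambda S: u(S) - lam * len(S))

weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S)
print(high_value_subset_k(u, 3, k=1))         # frozenset({1})
print(high_value_subset_reg(u, 3, lam=0.75))  # frozenset({0, 1})
```

With the penalty form, a feature is kept exactly when its marginal value exceeds $\lambda$, which is why feature 2 (weight 0.5) is dropped.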
(Partitioned subsets) MM solves an optimization problem to partition the features into $S$ and $D \setminus S$ while maximizing the difference in the set function's values. This approach is based on the idea that removing features to find a low-value subset (as in MP) and retaining features to get a high-value subset (as in MIR, L2X, EP, INVASE and FIDO-CA) are both reasonable approaches for identifying influential features. The problem is given by:

$\max_{S \subseteq D} \; u(S) - u(D \setminus S) - \lambda |S|$ (28)

In practice, MM incorporates additional regularizers and monotonic link functions to enable a more flexible trade-off between $u(S)$ and $u(D \setminus S)$ (see Appendix A).
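The partition objective can likewise be checked by brute force on a toy game (our own example; the sparsity weight `lam` is an assumed illustrative parameter):

```python
from itertools import combinations

def partitioned_subsets(u, d, lam):
    """Simplified MM-style objective: argmax_S u(S) - u(D \\ S) - lam * |S|."""
    D = frozenset(range(d))
    subsets = (frozenset(S) for k in range(d + 1)
               for S in combinations(range(d), k))
    return max(subsets, key=lambda S: u(S) - u(D - S) - lam * len(S))

weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S)
# Features 0 and 1 are worth keeping; feature 2's value is below the cost.
print(partitioned_subsets(u, 3, lam=1.5))  # frozenset({0, 1})
```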
As this discussion shows, every removal-based explanation generates a summary of each feature's influence on the underlying set function. In general, a model's dependencies are too complex to communicate fully, so explanations must provide users with a concise summary instead. As noted, most methods we discuss generate feature attributions, but several others generate explanations by selecting the most important features. These feature selection explanations are essentially coarse attributions that assign binary importance rather than a real number.
Interestingly, if the high-value subset optimization problems solved by MIR, L2X, EP, INVASE and FIDO-CA were applied to the set function that represents the dataset loss (Eq. 18), they would resemble conventional global feature selection problems (Guyon and Elisseeff, 2003). The problem in Eq. 26 determines the set of $k$ features with maximum predictive power, the problem in Eq. 25 determines the smallest possible set of features that achieves the performance level $t$, and the problem in Eq. 27 uses a parameter $\lambda$ to control the trade-off. Though not generally viewed as a model explanation approach, global feature selection serves an identical purpose of identifying highly predictive features.
We conclude by reiterating that the third dimension of our framework amounts to a choice of explanation mapping, which takes the form $E : \mathcal{U} \to \mathbb{R}^d$ for feature attribution or $E : \mathcal{U} \to \mathcal{P}(D)$ for feature selection. Our discussion so far has shown that removal-based explanations can be specified using three precise mathematical choices, as depicted in Figure 3. These methods, which are often presented in ways that make their connections difficult to discern, are constructed in a remarkably similar fashion.
6.2 Complexity and approximations
Showing how certain explanation methods fit into our framework requires distinguishing between their substance and the approximations that make them practical. Our presentation of these methods deviates from the original papers, which often focus on details of a method’s implementation. We now bridge the gap by describing these methods’ significant computational complexity and the approximations they use out of necessity.
The challenge with most summarization techniques described above is that they require calculating the underlying set function's value for many subsets of features. In fact, without making any simplifying assumptions about the model or data distribution, several techniques must examine all $2^d$ subsets of features. This includes the Shapley value, RISE's summary technique and LIME's linear model. Finding exact solutions to several of the optimization problems (MP, MIR, MM, INVASE, FIDO-CA) also requires examining all $2^d$ subsets of features, and solving the constrained optimization problem (EP, L2X) for $d$ features requires examining $\binom{d}{k}$ subsets, or $\mathcal{O}(2^d / \sqrt{d})$ subsets in the worst case.[4]

[4] This can be seen by applying Stirling's approximation to $\binom{d}{d/2}$ as $d$ becomes large.
The only approaches with lower computational complexity are those that remove individual features (Occlusion, PredDiff, CXPlain, permutation tests, feature ablation) or include individual features (univariate predictors). These require only one subset evaluation per feature, or $d + 1$ feature subsets in total.
Many summarization techniques have superpolynomial complexity in $d$, making them intractable for large numbers of features. However, these methods work in practice due to fast approximation approaches, and in some cases methods have even been devised to generate explanations in real time. Strategies that yield fast approximations include:


Attribution values that are the expectation of a random variable can be estimated using Monte Carlo approximations. IME (Štrumbelj and Kononenko, 2010), Shapley Effects (Song et al., 2016) and SAGE (Covert et al., 2020) use sampling strategies to approximate Shapley values, and RISE also estimates its attributions via sampling (Petsiuk et al., 2018).
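A standard sampling strategy of this kind averages each feature's marginal contribution over uniformly random feature orderings. The sketch below is our own minimal version over an assumed toy game, not any paper's exact estimator:

```python
import random

def shapley_permutation_estimate(u, d, n_samples=4000, seed=0):
    """Monte Carlo Shapley estimate: average each feature's marginal
    contribution over uniformly sampled feature orderings."""
    rng = random.Random(seed)
    est = [0.0] * d
    for _ in range(n_samples):
        order = list(range(d))
        rng.shuffle(order)
        S, prev = frozenset(), u(frozenset())
        for i in order:
            S = S | {i}
            val = u(S)
            est[i] += (val - prev) / n_samples
            prev = val
    return est

weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S) + (0.5 if {0, 1} <= S else 0.0)
print(shapley_permutation_estimate(u, 3))  # close to [1.25, 2.25, 0.5]
```

Each sampled ordering costs $d$ evaluations of $u$, so the estimator trades the $2^d$ exact computation for a controllable sampling budget.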
KernelSHAP and LIME are both based on linear regression models fitted to datasets containing an exponential number of datapoints. In practice, these techniques fit models to smaller sampled datasets, which means optimizing an approximate version of their objective function.

TreeSHAP calculates Shapley values in polynomial time using a dynamic programming algorithm that exploits the structure of tree-based models. Similarly, L-Shapley and C-Shapley exploit the properties of models for structured data to provide fast Shapley value approximations (Chen et al., 2018a).

Several of the feature selection methods (MP, L2X, EP, MM, FIDO-CA) solve continuous relaxations of their discrete optimization problems. While these optimization problems could be solved by representing the set of features as a binary mask $m \in \{0, 1\}^d$, these methods instead use a continuous mask variable of the form $m \in [0, 1]^d$.

One feature selection method (MIR) uses a greedy optimization algorithm. MIR determines a set of influential features by iteratively removing groups of features that do not reduce the predicted probability for the correct class.
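As a rough sketch of this greedy strategy (our own simplification over an assumed toy set function, not MIR's segment-based image procedure), features are dropped one at a time as long as the retained subset's value stays above a threshold:

```python
def mir_greedy(u, d, t):
    """Greedy sketch of a high-value subset search: repeatedly drop the
    feature whose removal keeps u(S) highest, stopping when any further
    removal would fall below the threshold t."""
    S = frozenset(range(d))
    while S:
        best_val, best_i = max((u(S - {i}), i) for i in S)
        if best_val < t:
            break  # any removal would drop below the threshold
        S = S - {best_i}
    return S

weights = [1.0, 2.0, 0.5]
u = lambda S: sum(weights[i] for i in S)
print(mir_greedy(u, 3, t=2.0))  # frozenset({1})
```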

One feature attribution method (CXPlain) and three feature selection methods (L2X, INVASE, MM) generate real-time explanations by learning separate explainer models. CXPlain learns an explainer model using a dataset of manually calculated explanations, which removes the need to iterate over each feature after training. L2X learns a model that outputs a set of $k$ features (represented by a $k$-hot vector), and INVASE learns a similar selector model that can output an arbitrary number of features; similarly, MM learns a model that outputs masks of the form $m \in [0, 1]^d$ for images. These techniques can be viewed as amortized optimization approaches (Shu, 2017) because they learn models that output approximate solutions in a single forward pass (similar to amortized inference; Kingma and Welling, 2013).
In conclusion, many methods provide efficient explanations despite using summarization techniques that are inherently intractable. Each approximation significantly speeds up computation relative to a brute-force calculation, and we predict that more approaches could be made to run in real time by learning explainer models, as in the MM, L2X, INVASE and CXPlain approaches (Chen et al., 2018b; Dabkowski and Gal, 2017; Schwab and Karlen, 2019; Yoon et al., 2018).
7 Discussion
In this work, we developed a unified framework that characterizes a significant portion of the model explanation literature (25 existing methods). Removal-based explanations have a great degree of flexibility, and we systematized their differences by showing that each method is specified by three precise mathematical choices:


How the method removes features. Each method specifies a subset function $F(x_S)$ to make predictions with subsets of features, often based on an existing model $f$.

What model behavior the method analyzes. Each method implicitly relies on a set function $u : \mathcal{P}(D) \to \mathbb{R}$ to represent the model's dependence on different groups of features. The set function describes the model's behavior either for an individual prediction or across the entire dataset.

How the method summarizes each feature's influence. Methods generate explanations that provide a concise summary of each feature's contribution to the set function $u$. Mappings of the form $E : \mathcal{U} \to \mathbb{R}^d$ generate feature attribution explanations, and mappings of the form $E : \mathcal{U} \to \mathcal{P}(D)$ generate feature selection explanations.
The growing interest in black-box ML models has spurred a remarkable amount of model explanation research, and in the past decade we have seen a number of publications proposing innovative new methods. However, as the field has matured, we have also seen a growing number of unifying theories that reveal underlying similarities and implicit relationships (Ancona et al., 2017; Covert et al., 2020; Lundberg and Lee, 2017). Our framework for removal-based explanations is perhaps the broadest unifying theory yet, and it bridges the gap between disparate parts of the explainability literature.
An improved understanding of the field presents new opportunities for both explainability users and researchers. For users, we hope that our framework will allow for more explicit reasoning about the tradeoffs between available explanation tools. The unique advantages of different methods are difficult to understand when they are viewed as monolithic algorithms, but disentangling their choices makes it simpler to reason about their strengths and weaknesses.
For researchers, our framework offers several promising directions for future work. We identify three key areas that can be explored to better understand the tradeoffs between different removalbased explanations:


Several of the methods characterized by our framework can be interpreted using ideas from information theory Chen et al. (2018b); Covert et al. (2020). We suspect that other methods can be understood with an informationtheoretic perspective and that this may shed light on whether there are theoretically justified choices for each dimension of our framework.

As we showed in Section 5, every removal-based explanation is based on an underlying set function that represents the model's behavior. Set functions can be viewed as cooperative games, and we suspect that methods besides those that use Shapley values (Covert et al., 2020; Datta et al., 2016; Lundberg and Lee, 2017; Owen, 2014; Štrumbelj et al., 2009) can be related to techniques from cooperative game theory.

Finally, it is remarkable that so many researchers have developed, with some degree of independence, explanation methods based on the same feature removal principle. We speculate that cognitive psychology may shed light on why this represents a natural approach to explaining complex decision processes. This would be impactful for the field because, as recent work has pointed out, explainability research is surprisingly disconnected from the social sciences Miller (2019); Miller et al. (2017).
In conclusion, as the field evolves and the number of removal-based explanations continues to grow, we hope that our framework can serve as a foundation upon which future research can build.
Appendix A Method Details
Here, we provide additional details about some of the explanation methods discussed in the main text. In several cases, we present generalized versions of methods that deviate from their presentation in the original papers.
a.1 Meaningful Perturbations (MP)
Meaningful Perturbations (Fong and Vedaldi, 2017) considers multiple ways of deleting information from an input image, and the approach it recommends is a blurring operation. Given a mask $m \in [0, 1]^d$, MP uses a function $\Phi(x; m)$ to denote the modified input and suggests that the mask may be used to 1) set pixels to a constant value, 2) replace them with Gaussian noise, or 3) blur the image. In the blurring approach, each pixel $x_i$ is blurred separately using a Gaussian kernel with standard deviation $\sigma_0 (1 - m_i)$ (for a user-specified maximum $\sigma_0$).

To prevent adversarial solutions, MP incorporates a total variation norm on the mask, upsamples it from a low-resolution version, and uses a random jitter on the image during optimization. Additionally, MP uses a continuous mask $m \in [0, 1]^d$ in place of a binary mask and the $\ell_1$ penalty on the mask in place of the $\ell_0$ penalty. Although MP's optimization tricks are key to providing visually compelling explanations, our presentation focuses on the most essential part of the optimization objective: reducing the classification probability while blurring only a small part of the image (Eq. 24).
a.2 Extremal Perturbations (EP)
Extremal Perturbations (Fong et al., 2019) is an extension of MP with several modifications. The first is switching the objective from a "removal game" to a "preservation game," which means learning a mask that retains rather than removes the salient information. The second is replacing the penalty on the subset size (or the mask's $\ell_1$ norm) with a constraint on the number of retained features. In practice, the constraint is enforced using a penalty term, but the authors argue that it should still be viewed as a constraint due to the use of a large regularization parameter.
EP uses the same blurring operation as MP and introduces new tricks to ensure a smooth mask, but our presentation focuses on the most important part of the optimization problem, which is maximizing the classification probability while blurring a fixed portion of the image (Eq. 26).
a.3 FIDO-CA
FIDO-CA (Chang et al., 2018) is similar to EP, but it replaces the blurring operation with features drawn from a generative model. The generative model can condition on arbitrary subsets of features, and although its samples are non-deterministic, FIDO-CA achieves strong results using a single sample. The authors consider multiple generative models but recommend a generative adversarial network (GAN) that uses contextual attention (Yu et al., 2018). The optimization objective is based on the same "preservation game" as EP, and the authors use the Concrete reparameterization trick (Maddison et al., 2016) for optimization.

a.4 Minimal Image Representation (MIR)
The Minimal Image Representation approach (Zhou et al., 2014) removes information from an image to determine which regions are salient for the desired class. MIR works by creating a segmentation of edges and regions and iteratively removing segments from the image (selecting those that least decrease the classification probability) until the remaining image is incorrectly classified. We view this as a greedy approach for solving the constrained optimization problem

$\min_{S \subseteq D} \; |S| \quad \text{s.t.} \quad u(S) \geq t,$

where $u(S)$ represents the prediction with the specified subset of features and $t$ represents the minimum allowable classification probability. Our presentation of MIR in the main text focuses on this view of the optimization objective rather than the specific greedy algorithm MIR uses (Eq. 25).
a.5 Masking Model (MM)
The Masking Model approach (Dabkowski and Gal, 2017) observes that removing salient information (while preserving irrelevant information) and removing irrelevant information (while preserving salient information) are both reasonable approaches to understanding image classifiers. The authors refer to these tasks as discovering the smallest destroying region (SDR) and smallest sufficient region (SSR).
The authors adopt notation similar to MP (Fong and Vedaldi, 2017), using $\Phi(x; m)$ to denote the transformation of the input $x$ given a mask $m$. For an input $x$ with class $y$, the authors aim to solve the following optimization problem:

$\min_{m} \; \lambda_1 \mathrm{TV}(m) + \lambda_2 \|m\|_1 - \log f_y(\Phi(x; m)) + \lambda_3 f_y(\Phi(x; 1 - m))^{\lambda_4}$

The TV (total variation) and $\ell_1$ penalty terms are both similar to MP and respectively encourage smoothness and sparsity in the mask. Unlike MP, MM learns a global explainer model that outputs approximate solutions to this problem in a single forward pass. In the main text, we provide a simplified presentation of the problem that does not include the logarithm in the third term or the exponent in the fourth term (Eq. 28). We view these as monotonic link functions that provide a more complex trade-off between the objectives but that are not necessary for finding informative solutions.
a.6 Learning to Explain (L2X)
The first theorem of the L2X paper (Chen et al., 2018b) states that the explanation they seek is the subset of features that optimizes the following objective:

$\max_{S} \; \mathbb{E}\big[\log p(y \mid x_S)\big] \quad \text{s.t.} \quad |S| = k$

If we replace the conditional probability $p(y \mid x_S)$ with a subset function $F(x_S)$ and allow for loss functions other than cross-entropy, then we recover the version of this problem that we present in the main text. The L2X paper focuses on classification problems and an interpretation of their approach in terms of mutual information maximization; for a regression task evaluated with MSE loss, the approach could be interpreted analogously as performing conditional variance minimization.
a.7 Instancewise Variable Selection (INVASE)
The INVASE method (Yoon et al., 2018) is very similar to L2X, but it parameterizes the selector model differently. Rather than constraining the explanations to contain exactly $k$ features, INVASE generates a set of features from a factorized Bernoulli distribution conditioned on the input $x$, using a regularization parameter $\lambda$ to control the trade-off between the number of selected features and the expected value of the loss function. Instead of optimizing the selector model with reparameterization gradients, INVASE is learned using an actor-critic approach.

a.8 Prediction Difference Analysis (PredDiff)
Prediction Difference Analysis (Zintgraf et al., 2017) removes individual features (or groups of features) and analyzes the difference in a model's prediction. Removed pixels are imputed by conditioning on their bordering pixels, which approximates sampling from the full conditional distribution $p(x_i \mid x_{D \setminus \{i\}})$. Rather than measuring the prediction difference directly, the authors use attribution scores based on the log-odds ratio:

$a_i = \log \mathrm{odds}\big(y \mid x\big) - \log \mathrm{odds}\big(y \mid x_{D \setminus \{i\}}\big), \qquad \mathrm{odds}(y \mid x) = \frac{p(y \mid x)}{1 - p(y \mid x)}$

We view this as another way of analyzing the difference in the model output for an individual prediction.
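The log-odds score is easy to compute from the two class probabilities. A minimal sketch with assumed illustrative probabilities (0.9 with the feature present, 0.5 with it imputed away):

```python
import math

def log_odds_attribution(p_full, p_removed):
    """Log-odds of the prediction with the feature present minus the
    log-odds with the feature removed (imputed)."""
    odds = lambda p: p / (1.0 - p)
    return math.log(odds(p_full)) - math.log(odds(p_removed))

# A feature whose removal drops the class probability from 0.9 to 0.5.
print(round(log_odds_attribution(0.9, 0.5), 3))  # 2.197
```

Unlike a raw probability difference, the same probability drop yields a larger score near the extremes of $[0, 1]$, where odds change rapidly.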
a.9 Causal Explanations (CXPlain)
CXPlain removes single features (or groups of features) for individual inputs and measures the change in the loss function (Schwab and Karlen, 2019). The authors propose calculating the attribution values

$a_i = \ell\big(f(x_{D \setminus \{i\}}), y\big) - \ell\big(f(x), y\big)$

and then computing the normalized values

$\bar{a}_i = \frac{a_i}{\sum_{j=1}^{d} a_j}.$

The normalization step enables the use of a learning objective based on Kullback-Leibler divergence for the explainer model, which is ultimately used to calculate attribution values in a single forward pass. The authors explain that this approach is based on a "causal objective," but CXPlain is causal in the same sense as every other method described in our work.
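The two steps above can be sketched directly; the loss values below are assumed illustrative numbers, not outputs of any trained model:

```python
def cxplain_attributions(losses_without, loss_full):
    """Per-feature loss increases from removing each feature, normalized
    to sum to one (the targets the explainer model is trained on)."""
    raw = [lw - loss_full for lw in losses_without]
    total = sum(raw)
    return [r / total for r in raw]

# Loss is 0.25 with all features; removing feature i raises it to losses_without[i].
print(cxplain_attributions([0.75, 0.5, 0.5], loss_full=0.25))  # [0.5, 0.25, 0.25]
```

Because the outputs form a probability distribution over features, a KL-divergence objective between these targets and the explainer's output is well defined.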
a.10 Randomized Input Sampling for Explanation (RISE)
The RISE method (Petsiuk et al., 2018) begins by generating a large number of randomly sampled binary masks. In practice, the masks are sampled by dropping features from a low-resolution mask independently with probability $p$, upsampling to get an image-sized mask, and then applying a random jitter. Due to the upsampling, the masks have values in $[0, 1]$ rather than $\{0, 1\}$.

The mask generation process induces a distribution over the masks $m$, which we denote as $p(m)$. The method then uses the randomly generated masks to obtain a Monte Carlo estimate of the following attribution values:

$a_i = \frac{1}{\mathbb{E}[m_i]} \, \mathbb{E}_{p(m)}\big[f(x \odot m) \, m_i\big]$

If we ignore the upsampling step that creates continuous mask values, we see that these attribution values are the mean prediction when a given pixel is included:

$a_i = \mathbb{E}_{p(S \mid i \in S)}\big[u(S)\big]$
a.11 Interactions Methods for Explanations (IME)
IME was presented in two separate papers (Štrumbelj et al., 2009; Štrumbelj and Kononenko, 2010). In the original version, the authors recommended training a separate model for each subset of features. In the second version, the authors proposed the more efficient approach of marginalizing out the removed features from a single model $f$.

The latter paper is ambiguous about the specific distribution used to marginalize out held-out features (Štrumbelj and Kononenko, 2010). Lundberg and Lee (2017) take the view that features are marginalized out using their distribution from the training dataset (i.e., the marginal distribution). In contrast, Merrick and Taly (2019) view IME as marginalizing out features using a uniform distribution. Upon a close reading of the paper, we opt for the uniform interpretation, but the specific interpretation of IME's choice of distribution does not affect any of our conclusions.
a.12 TreeSHAP
TreeSHAP uses a unique approach to handle held out features in treebased models (Lundberg et al., 2020). It accounts for missing features using the distribution induced by the underlying trees, and, since it exhibits no dependence on the held out features, it is a valid extension of the original model. However, it cannot be viewed as marginalizing out features using a simple distribution.
Given a subset of features, TreeSHAP makes a prediction separately for each tree and then combines each tree’s prediction in the standard fashion. But when a split for an unknown feature is encountered, TreeSHAP averages predictions over the multiple paths in proportion to how often the dataset follows each path. This is similar but not identical to the conditional distribution because each time this averaging step is performed, TreeSHAP conditions only on coarse information about the features that preceded the split.
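The path-averaging rule can be illustrated for a single tiny tree; this is our own sketch of the averaging idea only, with assumed node fields (`feature`, `threshold`, `value`, per-node training counts `n`), and not Lundberg et al.'s polynomial-time Shapley algorithm:

```python
def tree_expectation(node, x, S):
    """Tree prediction when only features in S are known: at a split on an
    unknown feature, average both branches weighted by training coverage."""
    if "value" in node:  # leaf
        return node["value"]
    f = node["feature"]
    if f in S:  # known feature: follow the split as usual
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        return tree_expectation(child, x, S)
    nl, nr = node["left"]["n"], node["right"]["n"]
    return (nl * tree_expectation(node["left"], x, S)
            + nr * tree_expectation(node["right"], x, S)) / (nl + nr)

# A single stump splitting on feature 0, with training counts "n" per child.
tree = {"feature": 0, "threshold": 0.5,
        "left": {"value": 1.0, "n": 4}, "right": {"value": 3.0, "n": 6}}
print(tree_expectation(tree, {0: 0.2}, S={0}))    # 1.0
print(tree_expectation(tree, {0: 0.2}, S=set()))  # 2.2
```

When feature 0 is unknown, the prediction is the coverage-weighted average $(4 \cdot 1.0 + 6 \cdot 3.0) / 10 = 2.2$, which conditions only on the information available at that split rather than on the full conditional distribution.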
a.13 Shapley Net Effects
Shapley Net Effects was originally proposed for linear models that use MSE loss, but we generalize the method to arbitrary model classes and arbitrary loss functions. Unfortunately, Shapley Net Effects quickly becomes impractical with large numbers of features or nonlinear models.
a.14 Shapley Effects
Shapley Effects analyzes a variance-based measure of a function's sensitivity to its inputs, with the goal of discovering which features are responsible for the greatest variance reduction in the model output (Owen, 2014). The cooperative game described in the paper is:

$u(S) = \mathrm{Var}\big(\mathbb{E}[f(X) \mid X_S]\big)$

We present a generalized version to cast this method in our framework. In the appendix of Covert et al. (2020), it was shown that this game is equal to:

$u(S) = \mathbb{E}\big[\ell\big(F(X_\varnothing), f(X)\big)\big] - \mathbb{E}\big[\ell\big(F(X_S), f(X)\big)\big]$

This derivation assumes that the loss function $\ell$ is MSE and that the subset function is $F(x_S) = \mathbb{E}[f(X) \mid X_S = x_S]$. Rather than the original formulation, we present a cooperative game that is equivalent up to a constant value and that provides flexibility in the choice of loss function:

$u(S) = -\mathbb{E}\big[\ell\big(F(X_S), f(X)\big)\big]$
References
[1] (2019) Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. arXiv preprint arXiv:1903.10464.
[2] (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, pp. 52138–52160.
[3] (2019) Explaining an image classifier's decisions using generative models. arXiv preprint arXiv:1910.04256.
[4] (2017) Towards better understanding of gradient-based attribution methods for deep neural networks. arXiv preprint arXiv:1711.06104.
[5] (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PloS One 10 (7), pp. e0130140.
[6] (2001) Random forests. Machine Learning 45 (1), pp. 5–32.
[7] (2018) Explaining image classifiers by counterfactual generation. arXiv preprint arXiv:1807.08024.
[8] (2018) L-Shapley and C-Shapley: efficient model interpretation for structured data. arXiv preprint arXiv:1808.02610.
[9] (2018) Learning to explain: an information-theoretic perspective on model interpretation. arXiv preprint arXiv:1802.07814.
[10] (2020) Understanding global feature contributions through additive importance measures. arXiv preprint arXiv:2004.00668.
[11] (2017) Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems, pp. 6967–6976.
[12] (2016) Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 598–617.
[13] (2017) Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437.
[14] (2019) Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2950–2958.
[15] (2020) Shapley-based explainability on the data manifold. arXiv preprint arXiv:2006.01272.
[16] (2019) Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. arXiv preprint arXiv:1910.06358.
[17] (2018) A survey of methods for explaining black box models. ACM Computing Surveys (CSUR) 51 (5), pp. 1–42.
[18] (2003) An introduction to variable and feature selection. Journal of Machine Learning Research 3 (Mar), pp. 1157–1182.
[19] (2019) Please stop permuting features: an explanation and alternatives. arXiv preprint arXiv:1905.03151.
[20] (2019) Feature relevance quantification in explainable AI: a causality problem. arXiv preprint arXiv:1910.13413.
[21] (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
[22] (2018) Distribution-free predictive inference for regression. Journal of the American Statistical Association 113 (523), pp. 1094–1111.
[23] (2001) Analysis of regression in game theory approach. Applied Stochastic Models in Business and Industry 17 (4), pp. 319–330.
[24] (2018) The mythos of model interpretability. Queue 16 (3), pp. 31–57.
[25] (2017) A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pp. 4765–4774.
[26] (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2 (1), pp. 2522–5839.
[27] (2016) The Concrete distribution: a continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712.
[28] (2019) The explanation game: explaining machine learning models with cooperative game theory. arXiv preprint arXiv:1909.08128.
[29] (2017) Explainable AI: beware of inmates running the asylum or: how I learnt to stop worrying and love the social and behavioural sciences. arXiv preprint arXiv:1712.00547.
[30] (2019) Explanation in artificial intelligence: insights from the social sciences. Artificial Intelligence 267, pp. 1–38.
[31] (2014) Sobol' indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification 2 (1), pp. 245–251.
[32] (2018) RISE: randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.
[33] (2016) "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144.
[34] (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206–215.
[35] (2019) CXPlain: causal explanations for model interpretation under uncertainty. In Advances in Neural Information Processing Systems, pp. 10220–10230.
[36] (1953) A value for n-person games. Contributions to the Theory of Games 2 (28), pp. 307–317.
[37] (2016) Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713.
[38] (2017) Amortized optimization.
[39] (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
[40] (2017) SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825.
[41] (2016) Shapley effects for global sensitivity analysis: theory and computation. SIAM/ASA Journal on Uncertainty Quantification 4 (1), pp. 1060–1083.
[42] (2008) Conditional variable importance for random forests. BMC Bioinformatics 9 (1), pp. 307.
[43] (2009) Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering 68 (10), pp. 886–904.
[44] (2010) An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research 11, pp. 1–18.
[45] (2019) The many Shapley values for model explanation. arXiv preprint arXiv:1908.08474.
[46] (2017) Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pp. 3319–3328.
[47] (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58 (1), pp. 267–288.
[48] (2020) Attribution in scale and space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9680–9689.
[49] (2018) INVASE: instance-wise variable selection using neural networks. In International Conference on Learning Representations.
[50] (2018) Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5505–5514.
[51] (2014) Visualizing and understanding convolutional networks. In European Conference on Computer Vision, pp. 818–833.
[52] (2014) Object detectors emerge in deep scene CNNs. arXiv preprint arXiv:1412.6856.
[53] (2016) Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929.
[54] (2017) Visualizing deep neural network decisions: prediction difference analysis. arXiv preprint arXiv:1702.04595.