1 Introduction
Deep Neural Networks (DNNs) have achieved great success in many real-world applications. Their tremendous parameter space enhances their function approximation ability and hence improves model performance. However, it also compromises model transparency, making it difficult to interpret model decisions. Concerns about the interpretability of DNNs have hampered their further application, especially in high-stakes settings such as autonomous driving and AI healthcare. Hence, developing model interpretation methods to promote trustworthy AI is extremely important and has drawn increasing attention recently [1].
Attribution methods have become an effective computational tool in understanding the behavior of machine learning models, especially Deep Neural Networks (DNNs) [1, 2, 3]. They uncover how machine learning models make a decision by calculating the contribution score of each input feature to the final decision. For example, in image classification, the attribution methods infer the contribution of each pixel to the predicted label for a pretrained model, and usually create heatmaps to visualize the contributions.
Although several attribution methods [4] have been proposed recently, the attribution problem itself is not well defined. Sundararajan et al. [5] roughly define "the attribution of an input x relative to a baseline point x'" as a vector a = (a_1, ..., a_n), where a_i denotes the contribution of feature x_i to the prediction. Such a description is uninformative for understanding the attribution problem, as it lacks a concrete guide to the logic of the contribution assignment process. Moreover, existing attribution methods are based on different heuristics and have very limited theoretical understanding and support. For instance, Layer-wise Relevance Propagation (LRP) evaluates the contribution of each neuron at a lower layer to a nonlinear neuron at the upper layer based on the proportion of the lower neuron's value in the linear combination [6]. In addition, the saliency map of smooth gradients [7] achieves significantly better performance than the plain gradient simply by averaging the gradients of neighboring samples. The rationales behind these methods are perplexing. Hence, it is highly desirable not only to deepen the understanding of the attribution problem, but also to conduct a comprehensive exploration and investigation of these various heuristic attribution methods. Specifically, the following important questions about attribution methods need theoretical investigation. Rationale: what model behaviors do these attribution methods actually reveal? Fidelity: how much of the decision-making process do these attribution methods capture? Limitations: where may these attribution methods fail?
While some attempts have been made to partially answer these questions by unifying several attribution methods as additive feature attribution [3], as multiplying a modified gradient with the input [8], or as first-order Taylor expansions [4], the problems are still not well addressed, due to two challenges. The first challenge (Ch1) is that, to our knowledge, none of these works offers a good description of the attribution problem. The second challenge (Ch2) stems from the fact that it is very difficult to propose a general framework unifying most existing attribution methods, because these methods are based on various heuristics.
In this paper, we address the aforementioned problems by proposing a general Taylor attribution framework, which not only offers a good description of the attribution problem (Section 3), but also unifies fourteen mainstream attribution methods into the framework via Taylor reformulations (Section 4). The basic idea behind the proposed framework is to attribute a Taylor approximation function of the DNN, instead of the DNN itself. The power of this framework rests on three foundations. (1) It is based on Taylor expansion, so it can sufficiently approximate the behavior of black-box DNNs, with a theoretical guarantee on the approximation error. (2) The Taylor expansion function is polynomial, in which attribution analysis is easier and more intuitive. Hence, the attribution of an input relative to a baseline point can be modeled as deciding individual payoffs in a coalition, as shown in Figure 1. Specifically, the attribution problem can be considered as studying how to assign contributions from Taylor independent terms and from Taylor interactive terms of a feature coalition. We elaborate on this in Section 3. (3) Taylor expansion is a natural tool to decompose the output difference between the input and the baseline point into the sum of input features' effects. On the other hand, most attribution methods analyze or decompose this output difference and can be represented as a function of it. Therefore, it is feasible to reformulate the attributions of these methods within the Taylor attribution framework.
According to the unified Taylor reformulations, we reveal the rationales, measure the fidelity, and point out the limitations of existing attribution methods in a systematic and theoretical way. During the analysis, we categorize existing attribution methods into four types according to three factors: i) whether they consider feature interactions; ii) whether they use a single baseline or multiple baselines; iii) whether positive and negative effects are separated. Moreover, we establish and advocate three principles for a good Taylor attribution: low approximation error, correct Taylor contribution assignment, and unbiased baseline selection.
Finally, we empirically validate the proposed Taylor reformulations by comparing the attribution results obtained by the original attribution methods and by their Taylor reformulations. The experimental results on MNIST show that the two sets of attribution results are almost identical. We also reveal a positive correlation between attribution performance and the number of principles an attribution method follows, via benchmarking on MNIST and ImageNet. In summary, this paper has four main contributions:


We propose a general Taylor attribution framework, which offers a good description of the attribution problem. The framework provides insight into the logic of the contribution assignment process.

Fourteen mainstream attribution methods are unified into the proposed framework by theoretical reformulations.

Based on unified Taylor reformulations, we revisit existing attribution methods in terms of their rationale, fidelity, and limitations. We also accordingly establish three principles for a good attribution.

We empirically validate the Taylor reformulations, and reveal the relationship between attribution performance and the three principles on MNIST and ImageNet.
2 Related work
In this section, we first provide an overview of existing attribution methods. Then, we introduce related works that focus on understanding and unifying these attribution methods in detail.
2.1 Existing attribution methods
Attribution is an effective computational tool for locally interpreting the behavior of machine learning models, especially DNNs. Recently, a number of attribution methods have been developed to infer the contribution score of each input feature to the final prediction for a given input sample. Saliency maps are usually created to visualize the contribution scores. We roughly categorize these attribution methods into local attribution explanation approaches and global attribution explanation approaches.
Local attribution explanation approaches focus on the sensitivity of the output neuron w.r.t. each input neuron, i.e., how the output of the network changes for infinitesimally small perturbations around the original input. Gradient [9] calculates this sensitivity, masking out the negative neurons of the bottom data via the forward ReLU at each ReLU layer. To improve saliency map quality, smooth gradients [7] produces an attribution vector by averaging the gradients of neighboring samples, which are generated by adding Gaussian noise to the original sample. Deconvnet [10] aims to map the output neuron back to the input pixel space. To preserve the neuron's size and non-negativity, it resorts to transposed filters and a backward ReLU, which masks out the negative neurons of the top gradients. Guided Backpropagation (GBP) [11] combines Gradient and Deconvnet, applying both the forward ReLU and the backward ReLU. As a result, GBP significantly improves the visual quality of the visualizations.

Global attribution explanation approaches directly analyze or decompose the output difference between the input and a selected baseline. Gradient*Input [12] calculates attributions by multiplying the gradient with the original input, to improve the sharpness of saliency maps. GradCAM focuses on interpreting the classification module of convolutional neural networks (CNNs). It captures the importance of each feature channel at the top convolutional layer by applying global average pooling to the gradients of the output neuron w.r.t. each feature map, and then obtains a coarse attribution by multiplying these importances with the feature maps. Occlusion-1 [10] and Occlusion-patch [13] observe the change of the output induced by occluding each input pixel or patch. The Layer-wise Relevance Propagation (LRP) rule decomposes the value of the output neuron in a layer-wise manner: it recursively decomposes the relevance score of a neuron at an upper layer to the neurons at the lower layer, according to the corresponding proportions in the linear combination. The DeepLIFT Rescale rule [14] adopts a linear rule similar to LRP, but it assigns the difference between the output and a baseline output in terms of the difference between the input and a preset baseline input, instead of merely considering the output value.
Integrated Gradients [5], which corresponds to Aumann-Shapley, decomposes the output difference by integrating the gradients along a straight path interpolating from the baseline to the input sample. In addition, the Shapley value [3] has become a popular attribution method; it calculates the average marginal contribution of each feature across all possible feature subsets. The Shapley value is characterized by a collection of desirable properties, e.g., local accuracy, missingness, and consistency.

Moreover, some variants of the above global explanation methods have been proposed recently. Generally, they adopt two strategies to improve the attribution results. i) Disentangling the contributions from positive and negative terms. For example, DeepLIFT RevealCancel [14] separately considers the overall marginal impacts of the positive terms and the negative terms. The LRP-αβ rule [6] assigns α times the overall effects to the positive terms and β times to the negative terms, where α and β satisfy α − β = 1 to ensure completeness. Deep Taylor [15] has been shown to be a special case of LRP-αβ with α = 1 and β = 0. ii) Averaging over multiple baselines, to reduce the probability that the attribution is dominated by a specific baseline. This strategy can be integrated into most attribution methods. The corresponding versions of Integrated Gradients (Expected Gradients), DeepLIFT (Expected DeepLIFT), and Shapley value (Deep Shapley) have been shown to significantly improve interpretation performance.
In this paper, we mainly focus on the global attribution explanation approaches, because they usually analyze or decompose the output difference between the input and the baseline and can be represented as a function of this difference. Therefore, it is natural to reformulate these methods within the proposed Taylor attribution framework.
2.2 Understanding and unifying attribution methods
There are a few works on understanding the theoretical groundings of attribution methods that are often designed heuristically. LRP variants have been reformulated as first-order Taylor decompositions [16, 4]. Moreover, it has been theoretically shown [17] that Deconvnet and Guided BP essentially construct a (partial) recovery of the input, which is unrelated to the decision making.
Some efforts have been devoted to unifying existing attribution methods. LIME, LRP, DeepLIFT, and Shapley value are unified under the framework of additive feature attribution [3]. Several gradient-based attribution methods, including Gradient*Input, LRP, DeepLIFT, and Integrated Gradients, are unified as multiplying a modified gradient with the input [8], and several equivalence conditions are given. In addition, several methods are summarized as first-order Taylor decompositions at different baseline points [4].
To our knowledge, this is the first work to leverage high-order Taylor decomposition and interactive effects to formally define the attribution problem and unify the majority of existing attribution methods.
Symbol | Description
N | Feature set
A | Feature subset
x | Input point
x' | Baseline point
Δx | Input difference, defined as Δx = x − x'
Δx_i | Input difference of feature x_i
f | DNN model
f(x) | The prediction for input x
T_K | K-th order Taylor expansion function of f
ε_K | Taylor approximation error of T_K
H | Hessian matrix
H_ind, H_int | Hessian independent and interactive matrices
T^1, T^2, T^H | Taylor first-, second-, and high-order terms
T^2_ind, T^H_ind | Taylor second- and high-order independent terms
T^2_int, T^H_int | Taylor second- and high-order interactive terms
T_int(A) | Interactive terms among the features in A
T_int(A, B) | Interactive terms between the features in A and in B
a | Attribution vector
a_i | Attribution of feature x_i
a_i^ind | Attribution of feature x_i from independent terms
a_i^int | Attribution of feature x_i from interactive terms
3 Taylor Attribution Framework
In this section, we propose a Taylor attribution framework to deepen the intrinsic understanding of the attribution problem. Given a pretrained DNN model f and an input sample x, the attribution problem aims to infer the contribution of each feature x_i to the prediction f(x). Existing attribution methods usually select a baseline point x' to represent a reference state; the output difference between the input and the baseline point, f(x) − f(x'), can then be considered as the influence caused by the input difference Δx = x − x'. Hence, attribution can be seen as the process of assigning the output difference to each feature according to its input difference Δx_i. However, there are infinitely many ways to decompose a scalar into an n-dimensional vector, and no work has provided guidance on the concrete logic of the contribution assignment process, i.e., which assignment is logical and reasonable.
To offer a good description of the attribution problem, we resort to Taylor decomposition theory and propose a general Taylor attribution framework. The basic idea is to conduct the attribution on a Taylor approximation function of the DNN model, instead of directly attributing the DNN itself. This idea is workable for two reasons. First, the Taylor expansion can approximate the DNN model sufficiently well that the two attributions are approximately equivalent. Second, the Taylor expansion function is polynomial, which makes it easier to analyze how to assign the contributions intuitively.
Assume f is differentiable; then the Taylor expansion of f is¹

¹ Note that although deep ReLU networks are not differentiable, so that Taylor expansion is not directly applicable, networks with softplus activations (a smooth approximation of ReLU) can be used to provide insight into the rationale behind ReLU networks.

where T_K is the K-th order Taylor expansion function of f, f(x') is the function value of f at the baseline x', and ε_K is the approximation error between f and T_K. The left side of the equation, f(x) − f(x'), represents the output difference, which can be considered as the influence of the input difference Δx. We need to answer how to decompose this effect according to the input difference of each feature. It is difficult to decompose the effect directly, due to the complexity of f. Since the Taylor terms approximate the output difference, we instead decompose them into an attribution vector a = (a_1, ..., a_n), where a_i denotes the attribution score of feature x_i.
An overview of the Taylor attribution framework is illustrated in Figure 2. For convenience, we summarize the main symbols used in this paper in Table I.
3.1 First-order Taylor attribution
The first-order Taylor expansion function is

f(x) − f(x') ≈ Σ_i (∂f/∂x_i) Δx_i,

where ∂f/∂x_i denotes the partial derivative of f with respect to x_i. The linear approximation function in the first-order Taylor expansion is additive across features and can be easily decomposed. It is obvious that the i-th term quantifies the contribution of feature x_i, i.e., a_i = (∂f/∂x_i) Δx_i.
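The first-order rule above can be sketched numerically. Below is a minimal example with a toy linear model; the function, values, and helper names are ours for illustration, not from the paper. For a linear model the first-order expansion is exact, so the attributions sum exactly to the output difference.

```python
import numpy as np

# Toy model: f(x) = 2*x0 + 3*x1. Linear, so the first-order Taylor
# expansion is exact and the attributions sum to the output difference.
def f(x):
    return 2.0 * x[0] + 3.0 * x[1]

def numerical_grad(f, x, eps=1e-6):
    # Central-difference estimate of the gradient of f at x.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.0, 2.0])    # input point
xb = np.array([0.0, 0.0])   # baseline point
dx = x - xb                 # input difference

# First-order Taylor attribution: a_i = (df/dx_i) * dx_i.
attr = numerical_grad(f, xb) * dx   # -> [2.0, 6.0]
```

Here attr sums to 8.0 = f(x) − f(xb), which is the completeness property a first-order attribution enjoys on linear models.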
3.2 Second-order Taylor attribution
The second-order Taylor expansion has a smaller approximation error than the first-order one, so it is expected to be more faithful to the model f. The second-order Taylor expansion function is given by

f(x) − f(x') ≈ Σ_i (∂f/∂x_i) Δx_i + (1/2) Δxᵀ H Δx,

where H is the Hessian matrix, i.e., the matrix of second-order partial derivatives of f. We denote the first-order and second-order Taylor terms as T^1 and T^2, respectively.
Compared with the first-order expansion, the second-order one is less clear-cut in determining feature contributions, due to the Hessian matrix. To make the attribution clearer, we decompose H into two matrices, an independent matrix H_ind and an interactive matrix H_int. Here H_ind is a diagonal matrix composed of the diagonal elements of H, which describes the second-order isolated effects of features, and H_int = H − H_ind represents the interactive effects between features. The expansion can then be rewritten as the sum of the first-order terms T^1, the second-order independent terms T^2_ind, and the second-order interactive terms T^2_int,
Accordingly, the attribution of x_i should be

a_i = a_i^{T1} + a_i^{ind} + a_i^{int},

where a_i^{T1}, a_i^{ind}, and a_i^{int} represent the contributions assigned from T^1, T^2_ind, and T^2_int, respectively. The contributions from the first-order and independent terms can be clearly identified as

a_i^{T1} = (∂f/∂x_i) Δx_i,  a_i^{ind} = (1/2) H_ii Δx_i²,

i.e., the first-order term and the second-order independent term of feature x_i, respectively.
The difficulty lies in how to assign the contributions from the interactive terms T^2_int. We propose to handle this by following an intuition: the attribution from T^2_int to feature x_i should be the sum of the assignments from each interactive effect involving x_i,

a_i^{int} = Σ_{j≠i} w_{ij} H_ij Δx_i Δx_j,

where H_ij Δx_i Δx_j denotes the second-order interactive term between x_i and x_j, the weight w_{ij} characterizes the share of that interactive term assigned to x_i, and w_{ij} H_ij Δx_i Δx_j is the resulting attribution.
The determination of the assignment weights is complicated and depends on the specific case. However, the interactive term of two features should be attributed only to these two features. Considering the interactive term between x_i and x_j, the assignment should satisfy w_{ij} + w_{ji} = 1. For example, as shown in the Taylor reformulations in Section 4, Integrated Gradients assigns the interactive terms according to the orders of the features. Because the orders of Δx_i and Δx_j in the second-order interactive term Δx_i Δx_j are both 1, the term is equally assigned to x_i and x_j; that is, w_{ij} = w_{ji} = 1/2 in Integrated Gradients.
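The independent/interactive split of the Hessian and the equal-weight assignment can be sketched as follows; the toy model f(x) = x0*x1 (purely interactive) and all values are our own illustration.

```python
import numpy as np

# Toy model f(x) = x0 * x1: the output is purely a second-order
# interactive effect (off-diagonal Hessian, gradient vanishes at 0).
x = np.array([2.0, 3.0])
xb = np.array([0.0, 0.0])
dx = x - xb

grad = np.array([xb[1], xb[0]])   # gradient of x0*x1 at the baseline
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # Hessian of x0*x1 (constant)

H_ind = np.diag(np.diag(H))       # independent (diagonal) part
H_int = H - H_ind                 # interactive (off-diagonal) part

first = grad * dx                           # first-order terms
second_ind = 0.5 * np.diag(H_ind) * dx**2   # second-order independent terms
# Equal split (w_ij = w_ji = 1/2): each feature keeps half of every
# interactive term it participates in.
second_int = 0.5 * dx * (H_int @ dx)
attr = first + second_ind + second_int      # -> [3.0, 3.0]
```

The interactive term Δx0·Δx1 = 6 is shared equally, and the attributions sum to f(x) − f(x') = 6, as required for a quadratic model attributed with its exact second-order expansion.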
3.3 High-order Taylor attribution
The analysis of the second-order expansion naturally extends to high-order expansions with K > 2. Let T^H denote all high-order expansion terms, including the second-order ones. The high-order Taylor expansion function is

f(x) − f(x') ≈ T^1 + T^H_ind + T^H_int,

where T^H_ind and T^H_int denote the high-order independent and interactive terms, respectively.
Analogously to the second-order case, the attribution of feature x_i in a high-order expansion is given by

a_i = a_i^{T1} + a_i^{ind} + a_i^{int},

where a_i^{ind} and a_i^{int} represent the contributions assigned from T^H_ind and T^H_int, respectively. The attributions from the first-order term and the high-order independent terms are clear:

a_i^{T1} = (∂f/∂x_i) Δx_i,  a_i^{ind} = T^H_ind,i,

where T^H_ind,i represents the high-order independent terms of feature x_i. The attribution from the interactive terms, a_i^{int}, consists of all assignments from interactive terms involving x_i:

a_i^{int} = Σ_{A: i ∈ A} a_{i←A},

where a_{i←A} denotes the attribution to x_i from the interactive terms among the features in the subset A. Note that the interactive terms T_int(A) should be assigned only to the features in A, i.e.,

Σ_{i ∈ A} a_{i←A} = T_int(A).
Based on this analysis, we give a definition of a correct Taylor contribution assignment.
Definition 1.
A Taylor attribution has a correct Taylor contribution assignment if the attribution is given by

a_i = a_i^{T1} + a_i^{ind} + Σ_{A: i ∈ A} a_{i←A},   (1)

and the assignment from the interactive terms satisfies

Σ_{i ∈ A} a_{i←A} = T_int(A), with a_{i←A} = 0 for i ∉ A.   (2)

In brief, Eq. 1 indicates that the Taylor first-order and high-order independent terms of feature x_i should be assigned to x_i, and part of the Taylor interactive terms involving x_i should be allocated to x_i. Eq. 2 requires that the interactive terms of the features in a subset A should be attributed to, and only to, the features in A. It is worth noting that the high-order terms can be omitted if the first-order Taylor expansion approximates the model sufficiently well.
3.4 The selection of baseline point
From the Taylor attribution framework, the attribution of feature x_i can be seen as a polynomial function of Δx_i, and hence it highly depends on Δx_i. Given a constant-vector baseline, as many attribution methods use, the attribution of a feature whose value is far from the baseline may be overestimated due to a large |Δx_i|, while the attribution of a feature whose value is close to the baseline may be underestimated even if it is important to the decision-making process. Such differences constitute a bias in many tasks. For example, in image classification it is unreasonable to attribute according to raw pixel values: given a black image as the baseline, white pixels have a large |Δx_i|, while black pixels have |Δx_i| close to 0. Consequently, the attribution methods will biasedly highlight white pixels while neglecting black pixels, even if the black pixels make up the object of interest. Hence, the selection of the baseline point plays a significant role.
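The baseline bias can be illustrated numerically. The sketch below uses an assumed linear model that treats both features identically, with one "white" (1.0) and one "black" (0.0) input; all names and values are illustrative.

```python
import numpy as np

# f(x) = x0 + x1 treats both features identically, yet the input has
# one "white" feature (1.0) and one "black" feature (0.0).
grad = np.array([1.0, 1.0])   # constant gradient of f(x) = x0 + x1
x = np.array([1.0, 0.0])

# Biased choice: the all-zero ("black image") baseline. dx = x, so the
# black feature gets zero attribution despite being equally important.
attr_black_baseline = grad * (x - np.zeros(2))   # -> [1.0, 0.0]

# Unbiased choice: a constant input difference dx = c * 1, so |dx_i|
# is identical across features and neither pixel colour is favoured.
c = 0.5
attr_unbiased = grad * (c * np.ones(2))          # -> [0.5, 0.5]
```

With the zero baseline the equally important black feature is invisible in the attribution map, while the constant-Δx baseline restores the symmetry, matching the discussion of unbiased baseline selection.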
The baseline point is used to represent the "absence" of a feature; attribution methods use it to calculate how much the output of the model would decrease given the absence of the feature [18]. Hence, it is expected that the output at the baseline point decreases significantly. Moreover, to avoid incorporating the aforementioned bias into the attribution process, attribution methods should choose an unbiased baseline, for which there are no large differences among the |Δx_i| of different features; that is, |Δx_i| should be similar to |Δx_j| for any two feature dimensions. One option is setting Δx to a constant vector c·1, with the corresponding baseline x' = x − c·1. Such baselines indeed resolve the bias issue; however, they usually do not make a difference to the output of the model. An alternative is to sample the input difference Δx from distributions (e.g., uniform or Gaussian) with zero mean and small variance, to ensure a small difference among the |Δx_i| of different features. In addition, the bias can be further neutralized by averaging over multiple baselines whose Δx are sampled from such distributions. This strategy reduces the probability that the attribution is dominated by a specific baseline, which is prone to bias. This may explain why SmoothGrad [7] and Expected Gradients [19] succeed with small Gaussian variance levels but fail with large ones.

4 Unified Taylor Reformulations
The proposed Taylor attribution framework is very general, and it can unify the attribution methods that are based on analyzing the output difference. These methods assign or decompose the output difference between the input and the baseline point to the input features, so their attributions can be represented as functions of the output difference. Moreover, the output difference can be approximately represented as a sum of Taylor terms by Taylor decomposition. Therefore, these attributions can be unified into our framework, i.e., each attribution can be reformulated as a function of the Taylor terms.
In this section, we unify fourteen mainstream attribution methods into the proposed Taylor framework via Taylor reformulations; all proofs of the theorems are in the Appendix. This section is organized as follows. First, we discuss eight basic attribution methods: Gradient*Input [12], GradCAM [20], Occlusion-1 [10], Occlusion-patch [13], Integrated Gradients [5], DeepLIFT Rescale [14], LRP [6], and Shapley value [3]. According to whether a method considers feature interactions, we categorize them into two types. Second, we study the variants that disentangle the contributions from positive and negative terms; this part covers the variant of DeepLIFT Rescale (DeepLIFT RevealCancel [14]) and the variants of LRP (LRP-αβ [6] and Deep Taylor [15]). Third, we reformulate the variants that average over multiple baselines: Expected Gradients [19], Expected DeepLIFT, and Deep Shapley [3].
4.1 Without feature interaction
In this subsection, we demonstrate that, after Taylor reformulation, the following five attribution methods do not (fully) consider feature interactive terms.
4.1.1 Gradient*Input
The attribution in Gradient*Input [12] is calculated by multiplying the partial derivatives (of the output w.r.t. the input) with the input, i.e., a_i = (∂f(x)/∂x_i) · x_i. It is easy to obtain Theorem 1.
Theorem 1.
Gradient*Input can be reformulated as a first-order Taylor attribution w.r.t. the baseline point x' = 0, i.e., a_i = (∂f/∂x_i) Δx_i with Δx = x − 0 = x.
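The first-order nature of this reformulation can be checked on a toy nonlinear model (ours for illustration): Gradient*Input matches the first-order terms, and the gap to the true output difference is exactly the omitted higher-order part.

```python
import numpy as np

def f(x):
    return x[0] ** 2 + x[1]    # nonlinear in x0

def grad_f(x):
    return np.array([2.0 * x[0], 1.0])

x = np.array([3.0, 1.0])

gi = grad_f(x) * x             # Gradient*Input: [18.0, 1.0]

# First-order completeness holds only up to the higher-order remainder:
# sum(gi) = 19 while f(x) - f(0) = 10; the gap (9 = x0^2) is exactly
# the second-order term that a first-order reformulation omits.
gap = gi.sum() - (f(x) - f(np.zeros(2)))
```

This is a small concrete instance of the low-approximation-error principle: when higher-order terms matter, a purely first-order method over- or under-counts them.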
4.1.2 LRP
LRP [6] proceeds in a layer-wise backpropagation fashion. Use x_i^(l) and x_j^(l+1) to denote the i-th neuron at the l-th layer and the j-th neuron at the (l+1)-th layer, respectively, with x_j^(l+1) = σ(Σ_i w_ij x_i^(l) + b_j). Here w_ij is a weight parameter, b_j is the additive bias, and σ is a nonlinear activation function.

LRP recursively decomposes the relevance score of the j-th neuron at layer l+1 to the neurons at layer l, according to the proportions of their weighted impacts. The attribution of the i-th neuron from the j-th neuron at layer l+1 is,

(3)  R_{i←j}^(l) = (z_ij / (Σ_{i'} z_{i'j} + ε)) R_j^(l+1),

where z_ij = w_ij x_i^(l) is the weighted impact of x_i^(l) on x_j^(l+1), R_j^(l+1) is the total relevance score of neuron x_j^(l+1) that will be assigned to the neurons at the l-th layer, and ε is a small quantity added to the denominator to avoid numerical instabilities.
Theorem 2.
When LRP is applied to a network with ReLU activations, its attribution is equivalent to the attribution in Gradient*Input, i.e., a_i = (∂f(x)/∂x_i) · x_i.
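Theorem 2 can be sanity-checked on a single ReLU unit; the weights and inputs below are our own toy values. When the ReLU is active, the LRP proportional redistribution of the output relevance coincides with Gradient*Input.

```python
import numpy as np

# One ReLU unit y = relu(w . x). LRP redistributes the relevance R = y
# to the inputs in proportion to their weighted impacts z_i = w_i * x_i.
w = np.array([1.0, -0.5, 2.0])
x = np.array([2.0, 2.0, 1.0])
eps = 1e-9                      # stabiliser in the denominator

z = w * x                       # weighted impacts: [2.0, -1.0, 2.0]
pre = z.sum()                   # pre-activation = 3.0 (ReLU active)
y = max(pre, 0.0)

lrp = z / (pre + eps) * y       # LRP attribution of each input

# With the ReLU active, dy/dx = w, so Gradient*Input yields w_i * x_i,
# the same attribution (cf. Theorem 2).
gi = w * x
```

Both vectors equal [2.0, −1.0, 2.0] and sum to the output y, so completeness holds at this layer.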
Categorization | Methods | Taylor Reformulations
Without Interaction | G*I | (Theorem 1)
 | LRP | (Theorem 2)
 | GCAM | (Theorem 3)
 | Occ-1 | (Theorem 4)
 | Occ-p | (Theorem 5)
With Interaction | Integrated | (Theorem 6)
 | DeepLIFT | (Theorem 7)
 | Shapley | (Theorem 8)
Separating + & − | DeepLIFT+− |
 | Deep Taylor |
 | LRP-αβ |
Expected Attribution | E-Integrated |
 | E-DeepLIFT |
 | D-Shap |
4.1.3 GradCAM
GradCAM [20] focuses on interpreting the classification module of CNNs and takes the feature maps of the top convolutional layer to calculate attribution scores. Let K be the number of channels, and let H and W be the height and width of these feature maps. Specifically, GradCAM first captures the importance of each feature map by applying a global average pooling (GAP) operation to the gradient of the target output neuron y w.r.t. the feature map. For the k-th feature map A^k, the importance is calculated by

α_k = (1/(H·W)) Σ_{u,v} ∂y/∂A^k_{uv},

where A^k_{uv} is the intermediate feature at location (u, v) of the k-th feature map. GradCAM can then approximately decompose y as a weighted combination of the importance weights and the feature maps, i.e.,
(4)  y ≈ Σ_k α_k Σ_{u,v} A^k_{uv}.

We obtain that GradCAM assigns contribution a_k to the k-th feature map, where a_k is expressed as:

(5)  a_k = α_k Σ_{u,v} A^k_{uv}.

To investigate the contributions of the features at different locations, the right side of Equation 4 can be rewritten as Σ_{u,v} Σ_k α_k A^k_{uv}. GradCAM can then correspondingly assign contribution a_{uv} to the feature at location (u, v), where a_{uv} is expressed as:

a_{uv} = Σ_k α_k A^k_{uv}.
Define g_k = (1/(H·W)) Σ_{u,v} A^k_{uv} to be the k-th GAP feature, so that the model can be expressed as a function of the GAP features g, i.e., y = h(g). Then we have Theorem 3.

Theorem 3.

Eq. 5 in GradCAM can be reformulated as a first-order Taylor attribution of the function h w.r.t. the baseline point g' = 0,

a_k = (∂h/∂g_k) · g_k.

Specifically, in GradCAM, the attribution of the k-th feature map (Eq. 5) is reformulated as the first-order Taylor term of the GAP feature g_k.
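The decomposition in Eq. 4 can be sketched with random feature maps and an assumed linear classification head on the GAP features (all shapes, weights, and names are illustrative); for such a head the weighted combination recovers the output exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W = 3, 4, 4
A = rng.normal(size=(K, H, W))     # top-layer feature maps

# Assumed linear head on global-average-pooled features:
# y = sum_k w_k * GAP(A_k).
w = np.array([0.5, -1.0, 2.0])
gap = A.mean(axis=(1, 2))
y_out = w @ gap

# Gradient of y w.r.t. each map is the constant w_k / (H*W); applying
# GAP to the gradient recovers the channel importance alpha_k.
grads = np.stack([np.full((H, W), w[k] / (H * W)) for k in range(K)])
alpha = grads.mean(axis=(1, 2))

cam = np.tensordot(alpha, A, axes=1)   # coarse (H, W) attribution map

# Per-channel contribution alpha_k * sum(A_k) sums back to the output,
# i.e., the weighted combination of Eq. 4 is exact for a linear head.
contrib = np.array([(alpha[k] * A[k]).sum() for k in range(K)])
```

For a nonlinear head the same computation would only be a first-order approximation, which is exactly what Theorem 3 states.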
4.1.4 Occlusion-1
Occlusion-1 [10] calculates how much the prediction changes when feature x_i is occluded with a zero baseline. The occluded input is written as x̃^i = (x_1, ..., x_{i−1}, 0, x_{i+1}, ..., x_n). The attribution of feature x_i is then defined as the output difference, a_i = f(x) − f(x̃^i).
Theorem 4.
The attribution of x_i in Occlusion-1 can be reformulated as the sum of the first-order and high-order independent terms of x_i at the baseline point given by the occluded input,

a_i = (∂f/∂x_i) x_i + T^H_ind,i.

In the second-order setting, the attribution of x_i in Occlusion-1 is a_i = (∂f/∂x_i) x_i + (1/2) H_ii x_i².
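Occlusion-1 is easy to sketch directly; the toy model and values below are ours. Note that because each feature uses its own occluded baseline, interactive effects are claimed in full by every participating feature, so the attributions need not sum to the overall output difference.

```python
import numpy as np

def f(x):
    return x[0] * x[1] + x[2] ** 2   # interactive and independent effects

x = np.array([2.0, 3.0, 1.0])

attr = np.zeros_like(x)
for i in range(len(x)):
    occluded = x.copy()
    occluded[i] = 0.0                # occlude feature i with a zero baseline
    attr[i] = f(x) - f(occluded)     # output drop caused by the occlusion

# attr = [6, 6, 1]: x0 and x1 each claim the full interactive term
# x0*x1, so the attributions sum to 13 while f(x) - f(0) = 7.
```

This double counting of interactive terms is a concrete instance of the limitation flagged for methods that do not handle feature interactions explicitly.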
4.1.5 Occlusion-patch
The attribution in Occlusion-patch [13] is similar to Occlusion-1 but conducted at the patch level. It constructs a zero-patch baseline by occluding an image patch P, and defines the output difference as the attribution of the features in P.
Theorem 5.
The attribution of patch P in Occlusion-patch can be reformulated as the sum of the first-order terms and high-order independent terms of the features in P, plus all high-order interactive terms involving the features in P.

In particular, the corresponding expression in the second-order setting follows analogously.
4.2 With feature interaction
In this subsection, we study three attribution methods that consider feature interactions.
4.2.1 Integrated Gradients
The attribution in Integrated Gradients [5] integrates the gradients along the straight-line path from a baseline point x' to the input x. The points along the path are denoted as x(t) = x' + tΔx, t ∈ [0, 1]. The attribution of feature x_i is computed by

(6)  a_i = Δx_i ∫₀¹ (∂f(x' + tΔx)/∂x_i) dt.
Theorem 6.
The attribution of x_i in Integrated Gradients can be reformulated as the sum of the first-order term of x_i, the high-order independent terms of x_i, and an assignment from the high-order interactive terms involving x_i at the baseline x', where the share assigned to x_i from each interactive term is determined by the order of Δx_i in that term.

In brief, for each high-order interactive term, Integrated Gradients allocates to x_i a proportion equal to the order of x_i in the term divided by the total order of the term.
Remark 1.
We give a concrete example. Assume the Taylor expansion contains the independent terms Δx_1², Δx_2, and Δx_3, together with the interactive term Δx_1²Δx_2Δx_3. Obviously, the independent terms Δx_1², Δx_2, and Δx_3 should be assigned to x_1, x_2, and x_3, respectively. With respect to the interactive term, the assignment of Integrated Gradients is based on the orders of the features, allocating to each feature the proportion given by its order in the term. Hence, it assigns (2/4)Δx_1²Δx_2Δx_3 to feature x_1, (1/4)Δx_1²Δx_2Δx_3 to feature x_2, and (1/4)Δx_1²Δx_2Δx_3 to feature x_3.
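The path integral in Eq. 6 can be approximated with a Riemann sum; the toy model f(x) = x0*x1 and all values below are our own illustration of the equal split of a second-order interactive term.

```python
import numpy as np

def f(x):
    return x[0] * x[1]

def grad_f(x):
    return np.array([x[1], x[0]])

x = np.array([2.0, 3.0])
xb = np.array([0.0, 0.0])
dx = x - xb

# Riemann-sum approximation of the path integral with m midpoint steps.
m = 1000
ts = (np.arange(m) + 0.5) / m
ig = dx * np.mean([grad_f(xb + t * dx) for t in ts], axis=0)

# Completeness: ig sums to f(x) - f(xb) = 6, and the interactive term
# dx0*dx1 = 6 is split equally (3, 3), matching the order-based rule.
```

Both features have order 1 in Δx0·Δx1, so each receives half of the term, consistent with Theorem 6 and with w_ij = 1/2 in Section 3.2.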
4.2.2 DeepLIFT Rescale
Similar to LRP, DeepLIFT Rescale [14] also computes relevance scores by layer-wise relevance propagation. However, instead of merely considering the output value, DeepLIFT propagates the output difference between the input and the baseline back to the input layer. Specifically, at the l-th layer, it calculates the relevance score of x_i to x_j, denoted R_{i←j}, by

(7)  R_{i←j} = ((z_ij − z'_ij) / Σ_{i'} (z_{i'j} − z'_{i'j})) R_j,

where z_ij = w_ij x_i is the weighted impact of x_i on x_j, z'_ij = w_ij x'_i denotes the weighted impact of the baseline, and R_j denotes the total relevance score of x_j.
Theorem 7.
The attribution of DeepLIFT Rescale at the l-th layer is equivalent to the attribution in Integrated Gradients applied at that layer.

(8)

Hence, DeepLIFT Rescale can be considered as a layer-wise Integrated Gradients.
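The Rescale rule of Eq. 7 can be sketched for a single ReLU unit with a nonzero baseline; weights, inputs, and the baseline are our own toy values.

```python
import numpy as np

def relu(v):
    return max(v, 0.0)

w = np.array([1.0, 2.0])
x = np.array([3.0, 1.0])    # input
xb = np.array([1.0, 0.0])   # baseline input

y, yb = relu(w @ x), relu(w @ xb)   # outputs: 5.0 and 1.0
dy = y - yb                          # output difference to distribute: 4.0

dz = w * (x - xb)            # per-input weighted differences: [2.0, 2.0]
attr = dz / dz.sum() * dy    # Rescale rule: share of dy proportional to dz
```

The attributions [2.0, 2.0] sum exactly to Δy, illustrating the completeness that the Rescale rule preserves at every layer.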
4.2.3 Shapley value

The Shapley value is a classical solution concept in cooperative game theory, which assigns an importance score to each player (feature) in a cooperative game (model) involving a coalition of n players (features). According to the Shapley value, given a cooperative game v, the amount that player i contributes is,

(9)  φ_i = Σ_{S ⊆ N∖{i}} (|S|! (n − |S| − 1)! / n!) (v(S ∪ {i}) − v(S)).

Here N is the set of all players, and S traverses all subsets of N∖{i}. Eq. 9 can be interpreted as averaging the marginal contribution of i to a coalition S over all possible coalitions involving i. When applied to interpreting DNNs, v(S) is often obtained by calculating the output when the values of the complementary set N∖S are set to the baseline.
Theorem 8.
The attribution of x_i in the Shapley value can be reformulated as the sum of the first-order term of x_i, the high-order independent terms of x_i, and a 1/|A| proportion of the interactive terms among the features in each subset A containing x_i.

In other words, the Shapley value assigns the interactive terms among the features in a set A evenly, i.e., the proportion received by each feature is 1/|A|.
Remark 2.
For example, consider a Taylor expansion over three features whose interactive terms involving two features are Δx_1Δx_2 and Δx_1Δx_3, and whose term involving three features is Δx_1Δx_2Δx_3. The Shapley value then assigns (1/2)Δx_1Δx_2 + (1/2)Δx_1Δx_3 + (1/3)Δx_1Δx_2Δx_3 to feature x_1, and assigns (1/2)Δx_1Δx_2 + (1/3)Δx_1Δx_2Δx_3 to feature x_2.
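Eq. 9 can be evaluated exactly for small feature sets; the toy model, input, and zero "absence" baseline below are our own illustration of the even split of interactive terms.

```python
import itertools
import math

import numpy as np

def f(x):
    return x[0] * x[1] + x[2]   # one interactive term, one independent term

x = np.array([2.0, 3.0, 1.0])
n = len(x)

def v(S):
    # Value of coalition S: features outside S are set to the zero baseline.
    masked = np.zeros(n)
    idx = list(S)
    masked[idx] = x[idx]
    return f(masked)

# Exact Shapley value (Eq. 9) by enumerating all coalitions.
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for r in range(n):
        for S in itertools.combinations(others, r):
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            phi[i] += weight * (v(S + (i,)) - v(S))
```

The result is phi = [3, 3, 1]: the interactive term Δx0·Δx1 = 6 is split evenly (1/2 each), the independent term goes to x2, and the attributions sum to f(x) − f(0), i.e., local accuracy holds.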
4.3 Separating positive and negative contributions
Shrikumar et al. [14] have shown that positive and negative impacts may cancel out during the attribution process and hence may yield misleading interpretations. To alleviate this issue, some variants, including DeepLIFT RevealCancel [14], Deep Taylor [15], and LRP-αβ [6], have been proposed to treat the positive and negative impacts separately.
The basic idea is to first decompose the output difference into positive components arising from positive input differences and negative components arising from negative input differences. The positive component is then allocated to the neurons with positive input differences, and the negative component is handled analogously. We give a unified formulation for the three attribution methods.
Similar to DeepLIFT Rescale, the three attribution methods also proceed in a recursive backpropagation manner; hence we adopt the same set of symbols as in DeepLIFT Rescale. We focus on the propagation from the target neuron at layer l+1 to the input neurons at the l-th layer. For simplicity, we omit the superscript layer indices and the subscript of the target neuron. In addition, we rewrite the input neurons as x_1, ..., x_n, denote the target output neuron as y, and represent the corresponding weighted impacts as z_1, ..., z_n, so that Δy is distributed according to the Δz_i. These three methods first decompose the input differences into positive and negative parts, i.e.,

Δz_i = Δz_i⁺ + Δz_i⁻,

where Δz_i⁺ ≥ 0 and Δz_i⁻ ≤ 0. Moreover, we denote by S⁺ and S⁻ the feature subsets with positive Δz_i and with negative Δz_i, respectively.
These attribution methods decompose the output difference Δy into a positive component Δy⁺ and a negative component Δy⁻, which satisfy Δy = Δy⁺ + Δy⁻. For the features in S⁺, the positive attribution of feature x_i from the target neuron is obtained by,

(10)

Similarly, for the features in S⁻, the negative attribution of x_i from