Improving Deep Learning Interpretability by Saliency Guided Training

11/29/2021
by   Aya Abdelsalam Ismail, et al.
University of Maryland

Saliency methods have been widely used to highlight important input features in model predictions. Most existing methods use backpropagation on a modified gradient function to generate saliency maps. Thus, noisy gradients can result in unfaithful feature attributions. In this paper, we tackle this issue and introduce a saliency guided training procedure for neural networks to reduce noisy gradients used in predictions while retaining the predictive performance of the model. Our saliency guided training procedure iteratively masks features with small and potentially noisy gradients while maximizing the similarity of model outputs for both masked and unmasked inputs. We apply the saliency guided training procedure to various synthetic and real datasets from computer vision, natural language processing, and time series across diverse neural architectures, including Recurrent Neural Networks, Convolutional Networks, and Transformers. Through qualitative and quantitative evaluations, we show that the saliency guided training procedure significantly improves model interpretability across various domains while preserving its predictive performance.




1 Introduction

Deep Neural Networks (DNNs) have been widely used in a variety of tasks lecun2015deep; krizhevsky2012imagenet; rich2016machine; obermeyer2016predicting; yet interpreting complex networks remains a challenge. Reliable explanations are necessary for critical domains like medicine, neuroscience, finance, and autonomous driving caruana2015intelligible; lipton2018mythos. Explanations are also useful for model debugging zeiler2014visualizing; lou2012intelligible. As a result, various interpretability methods have been developed to understand DNNs LRP; Simonyan2013DeepIC; kindermans2016investigating; sundararajan2017axiomatic; smilkov2017smoothgrad; levine2019certifiably; singla2019understanding. A common approach to understanding model decisions is to identify input features that highly influenced the final classification decision baehrens2010explain; zeiler2014visualizing; sundararajan2017axiomatic; smilkov2017smoothgrad; shrikumar2017learning; lundberg2017unified; zhou2016learning. Such approaches, known as saliency maps, often use gradient calculations to assign an importance score to individual features, reflecting their influence on the model prediction.

Saliency methods aim to highlight meaningful input features in model predictions to humans; however, the maps produced are often noisy (i.e., they contain visual noise). To improve the faithfulness of saliency maps, explanation methods that rely on multiple or higher-order gradient calculations have been developed. For example, SmoothGrad smilkov2017smoothgrad reduces saliency noise by adding noise to the input multiple times and then taking the average of the resulting saliency maps for each input. Integrated Gradients sundararajan2017axiomatic, DeepLIFT shrikumar2017learning, and Layer-wise Relevance Propagation LRP backpropagate through a modified gradient function ancona2017towards, while singla2019understanding studies the use of higher-order gradients in saliency maps.

In this paper, we take a different approach to improving the interpretability of deep neural networks: instead of developing yet another saliency method, we propose a new training procedure that naturally leads to improved model explanations using current saliency methods. Our proposed training procedure, called saliency guided training, trains models that produce sparse, meaningful, and less noisy gradients without degrading model performance. This is done by iteratively masking input features with low gradient values (i.e., less important features) and then minimizing a loss function that combines (a) the KL divergence kullback1951information between model outputs for the original and masked inputs, and (b) the appropriate loss function for the model prediction. This procedure reduces noise in model gradients without sacrificing predictive performance.

To demonstrate the effectiveness of our proposed saliency guided training approach, we consider a variety of classification tasks for images, language, and multivariate time series across diverse neural architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. In particular, we observe that using saliency guided training in image classification tasks leads to a reduction in visual saliency noise and sparser saliency maps, as shown in Figure 2. Saliency guided training also improves the comprehensiveness of the produced explanations for sentiment analysis and fact extraction tasks, as shown in Table 1. In multivariate time series classification tasks, we observe an increase in the precision and recall of saliency maps when applying the proposed saliency guided training. Interestingly, we also find that saliency guided training reduces the vanishing saliency issue of RNNs inputCellAttention, as shown in Figure 6. Finally, we note that although we use the vanilla gradient for masking in the saliency guided training procedure, we observe significant improvements in the explanations produced after training by several other gradient-based saliency methods.

2 Background and Related Work

Interpretability is a rapidly growing area with several diverse lines of research. One strand of interpretability considers post-hoc explanation methods, aiming to explain why a trained model made a specific prediction for a given input. Post-hoc explanation methods can be divided into gradient-based methods baehrens2010explain; sundararajan2017axiomatic; smilkov2017smoothgrad; shrikumar2017learning; lundberg2017unified; selvaraju2017grad, which can be reformulated as computing backpropagation for a modified gradient function, and perturbation-based approaches zeiler2014visualizing; suresh2017clinical; ribeiro2016should; tonekaboni2020went, which perturb areas of the input and measure how much this changes the model output. perkins2003grafting uses gradients for feature selection through grafting. Another line of work aims to measure the reliability of interpretability methods, either by creating standardized benchmarks with interpretability metrics hooker2019benchmark; ismail2020benchmarking; deyoung2019eraser; harbornesanity; samek2016evaluating; petsiuk2018rise or by debugging explanations adebayo2018sanity; kindermans2019reliability; ghorbani2019interpretation; adebayo2020debugging, identifying test cases where explanations fail. Others ba2014deep; frosst2017distilling; ross2017right; wu2018beyond; inputCellAttention focus on modifying neural architectures for better interpretability. Similar to our line of work, ghaeini2019saliency and ross2017right incorporate explanations into the learning process. However, ghaeini2019saliency relies on the existence of ground truth explanations, while ross2017right relies on the availability of annotations about incorrect explanations for a particular input. Our proposed learning approach does not rely on such annotations; since most datasets only have ground truth labels, it may not be practical to assume the availability of positive or negative explanations.

Input-level perturbation during training has been explored previously. li2018tell; hou2018self; wei2017object; singh2017hide use attention maps to improve segmentation for weakly supervised localization. wang2019sharpen incorporates attention maps into training to improve classification accuracy. devries2017improved masks out square regions of the input during training as a regularization technique to improve the robustness and overall performance of convolutional neural networks. Our work focuses on a different task: increasing model interpretability through training in a self-supervised manner.

In this paper, we evaluate our learning procedure with the following saliency methods: Gradient (GRAD) baehrens2010explain is the gradient of the output w.r.t. the input. Integrated Gradients (IG) sundararajan2017axiomatic calculates a path integral of the model gradient to the input from a non-informative reference point. DeepLIFT (DL) shrikumar2017learning compares the activation of each neuron to a reference activation; the relevance is the difference between the two activations. SmoothGrad (SG) smilkov2017smoothgrad samples similar inputs by adding noise to the input and then takes the average of the resulting sensitivity maps for each sample. Gradient SHAP (GS) lundberg2017unified adds noise to the input, selects points along the path between a reference point and the input, and computes the gradient of the outputs w.r.t. those points.
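As noted in the supplementary material, the Captum library provides implementations of these methods. The sketch below shows one plausible way to compute such attributions for a trained classifier; `model`, `x`, and `target` are placeholder names, and the zero baseline is an assumption rather than the paper's choice.

```python
import torch
from captum.attr import (DeepLift, GradientShap, IntegratedGradients,
                         NoiseTunnel, Saliency)

def compute_attributions(model, x, target):
    """Compute several gradient-based attributions for a batch `x` and
    class index `target` (illustrative sketch)."""
    model.eval()
    baselines = torch.zeros_like(x)  # non-informative reference point (assumption)
    return {
        "GRAD": Saliency(model).attribute(x, target=target),
        "IG":   IntegratedGradients(model).attribute(x, baselines=baselines,
                                                     target=target),
        "DL":   DeepLift(model).attribute(x, baselines=baselines, target=target),
        "GS":   GradientShap(model).attribute(x, baselines=baselines,
                                              target=target),
        # SmoothGrad: average the saliency over noisy copies of the input.
        "SG":   NoiseTunnel(Saliency(model)).attribute(x, nt_type="smoothgrad",
                                                       target=target),
    }
```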

We demonstrate the effectiveness of our training procedure using several neural network architectures: Convolutional neural networks (CNNs), including VGG-16 simonyan2014very, ResNet he2016deep, and Temporal Convolutional Network (TCN) oord2016wavenet; TCN2017; bai2018empirical, a CNN that handles sequences; Recurrent neural networks (RNNs), including LSTM hochreiter1997long and LSTM with Input-Cell Attention inputCellAttention; as well as Transformers vaswani2017attention.

3 Notation

First, consider a classification problem on input data $\{(X_i, y_i)\}_{i=1}^{n}$ such that each $X_i \in \mathbb{R}^{d}$ has $d$ features and $y_i$ is the label. Let $f_\theta$ denote a neural network parameterized by $\theta$. The standard training of the network involves minimizing the cross-entropy loss $\mathcal{L}$ over the training set as follows:

$$\min_{\theta} \sum_{i=1}^{n} \mathcal{L}\big(f_{\theta}(X_i),\, y_i\big) \qquad (1)$$

The gradient of the network output with respect to the input is given by $\nabla_X f_\theta(X)$. Let $S$ be a sorting function such that $S(g)_1$ is the smallest element in $g$; hence, $S(\nabla_X f_\theta(X))$ is the sorted gradient. We define the input mask function $M_k$ such that $M_k(X, I)$ replaces the $k$ features of $X$ ranked lowest by the sorted gradient index $I = S(\nabla_X f_\theta(X))$ with samples from a mask distribution, i.e., $M_k$ removes the $k$ lowest features from $X$ based on the order provided by $S$.

For a language input, we use $X = [e_1, \dots, e_T]$ where $e_t$ is the feature embedding representing the $t$-th word of the input. In that case, $S$ would sort the elements of $X$ based on the sum of the gradients of the embedding for each word, and $M_k$ would mask the bottom $k$ words according to that sorting. For a multivariate time series input, we use $X \in \mathbb{R}^{T \times d}$ where $T$ is the number of time steps and $d$ is the number of features per time step. $x_{t,j}$ is the $j$-th input feature at time $t$; sorting and masking are done at the $x_{t,j}$ level.
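As a concrete illustration of the sorting function $S$ and the mask $M_k$, the sketch below ranks the entries of a multivariate time series $X \in \mathbb{R}^{T \times d}$ by their gradient values and replaces the bottom $k$ with samples from a mask distribution; the function names and the uniform default are assumptions made for the example.

```python
import torch

def mask_bottom_k(x, grad_x, k, mask_dist=None):
    """M_k: replace the k entries of x with the smallest gradient values by
    samples from a mask distribution (illustrative sketch).

    x, grad_x: tensors of shape (T, d) for a multivariate time series input.
    """
    flat_x, flat_g = x.flatten(), grad_x.flatten()
    order = torch.argsort(flat_g)        # S: sort gradient entries ascending
    bottom_k = order[:k]                 # indices of the k smallest gradients
    if mask_dist is None:
        # Default mask distribution: uniform over the observed feature range.
        lo, hi = flat_x.min().item(), flat_x.max().item()
        mask_dist = lambda n: torch.empty(n).uniform_(lo, hi)
    masked = flat_x.clone()
    masked[bottom_k] = mask_dist(k)
    return masked.view_as(x)
```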

For two discrete probability distributions $P$ and $Q$ defined on the same probability space $\mathcal{X}$, the Kullback–Leibler (KL) divergence kullback1951information (or relative entropy) of $P$ from $Q$ is given as:

$$D_{KL}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)} \qquad (2)$$

4 Saliency Guided Training

Existing gradient-based methods can produce noisy saliency maps, as shown in Figure 1. The saliency map noise may be partially due to uninformative local variations in partial derivatives. Using a standard training procedure based on ERM (empirical risk minimization), the gradient of the model output w.r.t. the input (i.e., $\nabla_X f_\theta(X)$) may fluctuate sharply under small input perturbations smilkov2017smoothgrad.

Figure 1: Saliency maps produced by typical training versus saliency guided training.

If gradient-based explanation methods faithfully interpret the model's predictions, irrelevant features should have gradient values close to zero. Building on this intuition, we introduce saliency guided training, a procedure to train neural networks such that input gradients computed from trained models provide more faithful measures to downstream (gradient-based) saliency methods. Saliency guided training aims to reduce the gradient values of irrelevant features without sacrificing model performance. During saliency guided training, for every input $X_i$, we create a new input $\widetilde{X}_i$ by masking the $k$ features with the lowest gradient values as follows:

$$\widetilde{X}_i = M_k\big(X_i,\; S(\nabla_{X_i} f_\theta(X_i))\big) \qquad (3)$$

$\widetilde{X}_i$ is then passed through the network, which results in an output $f_\theta(\widetilde{X}_i)$. In addition to the classification loss, saliency guided training minimizes the KL divergence between $f_\theta(X_i)$ and $f_\theta(\widetilde{X}_i)$ to ensure that the trained model produces similar output probability distributions over labels for both masked and unmasked inputs. The optimization problem for saliency guided training is:

$$\min_{\theta} \sum_{i=1}^{n} \Big[ \mathcal{L}\big(f_\theta(X_i),\, y_i\big) + \lambda\, D_{KL}\big(f_\theta(X_i) \,\|\, f_\theta(\widetilde{X}_i)\big) \Big] \qquad (4)$$

where $\lambda$ is a hyperparameter that balances the cross-entropy classification loss and the KL divergence term. Since this loss function is differentiable with respect to $\theta$, it can be optimized using existing gradient-based optimization methods. The KL divergence term encourages the model to produce similar outputs for the original input $X_i$ and the masked input $\widetilde{X}_i$. For this to happen, the model needs to learn to assign low gradient values to features that are irrelevant to its predictions. This potentially results in sparse and more faithful gradients, as shown in Figure 1.

Masking functions: In images and time series data, features with low gradients are replaced with random values within the feature range. In language tasks, the masking function replaces each low-salience word with the previous high-salience word. This allows us to emphasize highly salient words and remove non-salient ones while maintaining the sentence length. The selection of $k$ is dataset-dependent: it depends on the amount of irrelevant information in a training sample. For example, since most pixels in MNIST are uninformative, a larger $k$ is desired. The detailed hyperparameters used are available in the appendix. Note that only input features are masked during saliency guided training.

Limitations: (a) Compared to traditional training, our proposed training procedure is more computationally expensive. Specifically, the memory needed is doubled since, in addition to storing each batch, we also store the masked batch. Similar to adversarial training, this training process is slower and takes a larger number of epochs to converge. For example, the standard training of a CIFAR-10 model usually takes on average 118 epochs to converge, where each epoch is roughly 24 seconds. With saliency guided training, convergence takes about 124 epochs, where each epoch takes roughly 75 seconds (all experiments on the same GPU). (b) Our training procedure requires two hyperparameters, $k$ and $\lambda$, which might require a hyperparameter search (we find that a single setting works well in all of our experiments).

Given: Training samples $(X, y)$, number of features to be masked $k$, learning rate $\tau$, hyperparameter $\lambda$
Initialize $f_\theta$
for $i = 1$ to epochs do
       for each minibatch $(X, y)$ do
              Compute the masked input:
                     Get the sorted index of the gradient of the output with respect to the input: $I = S(\nabla_X f_\theta(X))$
                     Mask the bottom $k$ features of the original input: $\widetilde{X} = M_k(X, I)$
              Compute the loss function:
                     $L_i = \mathcal{L}\big(f_\theta(X),\, y\big) + \lambda\, D_{KL}\big(f_\theta(X) \,\|\, f_\theta(\widetilde{X})\big)$
              Use the gradient to update the network parameters:
                     $\theta \leftarrow \theta - \tau \nabla_\theta L_i$
       end for
end for
Algorithm 1 Saliency Guided Training
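The following is a minimal PyTorch-style sketch of one saliency guided training step, assuming a classifier `model` that outputs logits, a labeled minibatch `(x, y)`, the number of masked features `k`, the weight `lam`, and an `optimizer`. Ranking features by the raw gradient of the target-class logit and drawing replacement values uniformly from the batch-wide feature range are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def saliency_guided_step(model, x, y, k, lam, optimizer):
    """One step of saliency guided training (illustrative sketch)."""
    # Gradient of the target-class output w.r.t. the input, used only for ranking.
    x_in = x.detach().clone().requires_grad_(True)
    target_logit = model(x_in).gather(1, y.unsqueeze(1)).sum()
    input_grads = torch.autograd.grad(target_logit, x_in)[0].flatten(1)

    # Mask the bottom-k features with random values in the feature range.
    bottom_k = input_grads.argsort(dim=1)[:, :k]      # smallest gradients first
    flat_x = x.detach().flatten(1)
    lo, hi = flat_x.min(), flat_x.max()
    rand_vals = torch.rand_like(flat_x) * (hi - lo) + lo
    x_masked = flat_x.scatter(1, bottom_k, rand_vals.gather(1, bottom_k)).view_as(x)

    # Combined loss: cross-entropy + lambda * KL(f(x) || f(x_masked)), as in Eq. (4).
    logits, logits_masked = model(x), model(x_masked)
    ce = F.cross_entropy(logits, y)
    kl = F.kl_div(F.log_softmax(logits_masked, dim=1),   # log Q = log f(x_masked)
                  F.log_softmax(logits, dim=1),          # log P = log f(x)
                  reduction="batchmean", log_target=True)
    loss = ce + lam * kl

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```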

5 Experiments

All experiments have been repeated 5 times; the results reported below are the average of the 5 runs. Hyperparameters used for each experiment, along with the standard error bars and details on computational resources are available in supplementary materials.

5.1 Saliency Guided Training for Images

In the following section, we compare gradient-based explanations produced by regular training versus saliency guided training for MNIST lecun2010mnist trained on a simple CNN lecun1998gradient, for CIFAR10 krizhevsky2009learning trained on ResNet18 he2016deep and for BIRD gerry_2021 trained on VGG-16 simonyan2014very. Further details about the datasets and models are available in the supplementary material.

Figure 2: (A) Comparison between different training methods on MNIST, along with distributions of gradient values in each sample. (B) Saliency maps for the CIFAR10 and BIRD datasets using regular and saliency guided training. (C) Distribution of gradient means across examples. Maps produced by saliency guided training are more precise: most features have gradient values around zero, with large gaps between the mean and outliers. Here, gradients around zero indicate uninformative features, while very large and very small gradients indicate informative features. Saliency guided training helps reduce noisy, fluctuating gradients in between, as shown in the box plots.

Quality of Saliency Maps for Images

For an image classification problem, in many cases, most features are redundant and not needed by the model to make the prediction. Consider the background of an object in an image; although it covers most of the image, backgrounds are often not essential in the classification task. If the model is focusing on the object rather than the background, we would want the background gradient (i.e., most of the features) to be close to zero.

The examples shown in Figure 2 were correctly classified by both models. Gradients are scaled per sample to have values between -1 and 1. In Figure 2 (A) and Figure 2 (B), saliency maps produced by a model trained with saliency guided training are more precise than those produced by a traditionally trained model. Most saliency maps produced by saliency guided training highlight the object itself rather than the background across the different datasets. The distributions of gradient values per sample in Figure 2 (A) show that most features have small gradient values (near zero) with a large separation of highly salient features away from zero for saliency guided training. Similarly, in Figure 2 (C), we find that over the entire dataset, gradient values produced by saliency guided training tend to be concentrated around zero with a large separation between the mean and outliers (highly salient features), indicating the model's ability to differentiate between informative and non-informative features.

Figure 3: Model accuracy drop when removing features with high saliency using traditional and saliency guided training for different gradient-based methods, against a random baseline. A steeper drop indicates better performance. We find that regardless of the saliency method used, performance improves with saliency guided training.

Model Accuracy Drop

We compare saliency guided training and traditional training for different saliency methods with modification-based evaluation samek2016evaluating; petsiuk2018rise; kindermans2017learning: first, features are ranked according to their saliency values; then, higher-ranked features are recursively eliminated (the original background in MNIST replaces the eliminated features); finally, the degradation of the trained model's accuracy is reported. This is done at different feature percentages. A steeper drop indicates that the removed features affected the model accuracy more. Figure 10 compares the model performance degradation for different gradient-based methods; saliency guided training shows a steeper accuracy drop regardless of the saliency method used.
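A minimal sketch of this modification-based evaluation is given below, assuming a trained `model`, a test batch `(X, y)`, per-feature attributions `saliency`, and a `background` value used for replacement (the black MNIST background in the paper); the percentage grid is a placeholder.

```python
import torch

@torch.no_grad()
def accuracy_drop_curve(model, X, y, saliency, background=0.0,
                        percents=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Replace the top-p fraction of salient features with a background value
    and record the model accuracy at each level (illustrative sketch)."""
    model.eval()
    flat_x = X.flatten(1)
    order = saliency.flatten(1).argsort(dim=1, descending=True)  # most salient first
    accuracies = []
    for p in percents:
        k = int(p * flat_x.shape[1])
        masked = flat_x.clone()
        masked.scatter_(1, order[:, :k], background)   # remove top-k salient features
        preds = model(masked.view_as(X)).argmax(dim=1)
        accuracies.append((preds == y).float().mean().item())
    return accuracies
```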

Figure 4: Accuracy drop in different modification-based evaluation masking approaches.

This experiment can only be performed on a dataset like MNIST since the uninformative feature distribution is known (black background), while this is not the case in other datasets that we have considered. Although such modification-based evaluation methods have been applied to other datasets, samek2016evaluating; petsiuk2018rise; kindermans2017learning; hooker2019benchmark showed that removing features produces samples from a different data distribution violating the underlying IID assumption (i.e., the training and evaluation data come from identical distributions). When the feature replacement comes from a different distribution, it is unclear whether the degradation in the model performance is from the distribution shift or the removal of informative features. For that reason, we need to make sure that the model is trained on the mask used during testing to avoid this undesired effect.

hooker2019benchmark proposes ROAR where the model is retrained after the feature elimination. However, due to the data redundancy, the retrained model can rely on different features to achieve the same accuracy. Figure 4 shows the model accuracy drop on traditionally trained MNIST when removing the salient features. The IID line represents replacing features with the black MNIST background (known uninformative distribution), which acts as the ground truth in this particular dataset. The OOD line represents replacing the features with the mean image pixel value as done by samek2016evaluating; petsiuk2018rise; kindermans2017learning; and ROAR shows replacing features with the mean value and retraining the model as proposed by hooker2019benchmark. Since neither OOD nor ROAR produce results similar to those produced by the IID feature replacement, we argue that modification-based evaluation methods may provide unreliable results unless the uninformative IID distribution is known. We leave further exploration of modification-based evaluation methods to future work.

5.2 Saliency Guided Training for Language

We compare the interpretability of recurrent models trained on language tasks using the ERASER deyoung2019eraser benchmark. ERASER was designed to capture how well an explanation provided by models aligns with human rationales and how faithful these explanations are (i.e., the degree to which explanation influences the predictions). For our purpose, we only focus on the faithfulness of the explanations.

ERASER provides two metrics to measure interpretability. Comprehensiveness evaluates whether all features needed to make a prediction are selected. To calculate an explanation's comprehensiveness, a new input $\tilde{x}_i$ is created such that $\tilde{x}_i = x_i \setminus r_i$, where $r_i$ is the predicted rationale (i.e., the words selected by the saliency method as informative). Let $m(x_i)_j$ be the prediction of model $m$ for class $j$. The model comprehensiveness is calculated as:

$$\text{comprehensiveness} = m(x_i)_j - m(x_i \setminus r_i)_j \qquad (5)$$

A high score here implies that the removed explanation was influential in the prediction. The second metric is sufficiency, which evaluates whether the extracted explanations contain enough signal to make a prediction. The following equation gives the explanation sufficiency:

$$\text{sufficiency} = m(x_i)_j - m(r_i)_j \qquad (6)$$

A lower score implies that the explanations are adequate for a model prediction. Comprehensiveness and sufficiency were calculated at different percentages of features (following deyoung2019eraser), and the Area Over the Perturbation Curve (AOPC) is reported.
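As a rough illustration, a comprehensiveness-style AOPC score could be computed as in the sketch below, assuming a `model` that maps a 1-D tensor of token ids for a single document to a vector of class probabilities and per-token importance `scores`; the bin fractions are placeholders, not the benchmark's thresholds.

```python
import torch

@torch.no_grad()
def aopc_comprehensiveness(model, tokens, target, scores,
                           bins=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Average drop in the target-class probability when the top-q fraction of
    tokens (the predicted rationale) is removed (illustrative sketch)."""
    full_prob = model(tokens)[target]                  # m(x_i)_j
    order = scores.argsort(descending=True)            # most important tokens first
    drops = []
    for q in bins:
        k = max(1, int(q * len(tokens)))
        keep = torch.ones(len(tokens), dtype=torch.bool)
        keep[order[:k]] = False                         # delete the rationale tokens
        reduced_prob = model(tokens[keep])[target]      # m(x_i \ r_i)_j
        drops.append((full_prob - reduced_prob).item())
    return sum(drops) / len(drops)                      # AOPC over the bins
```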

We focus on datasets that can be formulated as a classification problem. Movie Reviews zaidan2008modeling: positive/negative sentiment classification for movie reviews. FEVER thorne2018fever: a fact extraction and verification dataset where the goal is verifying claims from textual sources; each claim can either be supported or refuted. e-SNLI camburu2018snli: a natural language inference task where sentence pairs are labeled as entailment, contradiction, or neutral.

Word embeddings are generated from GloVe pennington2014glove and then passed to a bidirectional LSTM hochreiter1997long for classification. Table 1 compares the scores produced by different saliency methods for traditional and saliency guided training against a random-assignment baseline. We find that saliency guided training results in a significant improvement in both comprehensiveness and sufficiency for the sentiment analysis task (Movie Reviews). For the fact extraction task (FEVER) and the natural language inference task (e-SNLI), saliency guided training improves comprehensiveness, while there is no obvious improvement in sufficiency (this might be due to the adversarial effect of shrinking the sentence to a much smaller size, since the number of words identified as "rationales" is smaller than the number of remaining words).

                     Gradient             Integrated Gradient   SmoothGrad            Random
                     Trad.   Sal. Guided  Trad.   Sal. Guided   Trad.   Sal. Guided
Movies
  Comprehensiveness  0.200   0.240        0.265   0.306         0.198   0.256         0.056
  Sufficiency        0.042   0.013        0.054   0.002         0.034   0.008         0.294
FEVER
  Comprehensiveness  0.007   0.008        0.008   0.009         0.007   0.008         0.001
  Sufficiency        0.012   0.011        0.005   0.004         0.006   0.006         0.003
e-SNLI
  Comprehensiveness  0.117   0.126        0.099   0.104         0.117   0.118         0.058
  Sufficiency        0.420   0.387        0.461   0.419         0.476   0.455         0.366
Table 1: ERASER benchmark scores. Comprehensiveness and sufficiency are reported in terms of AOPC. 'Random' is a baseline in which words are assigned random scores.

5.3 Saliency Guided Training for Time Series

We evaluated saliency guided training for multivariate time series, both qualitatively on MNIST treated as a multivariate time series and quantitatively on synthetic data.

Saliency Maps Quality for Multivariate Time Series

We compare the saliency maps produced on MNIST treated as a multivariate time series where one image axis is time. Figure 5 shows the saliency maps produced by different (neural architecture, saliency method) pairs when different training procedures were used. There is a visible improvement in saliency quality across different networks when saliency guided training is used.

Figure 5: Saliency maps produced for (neural architecture, saliency method) pairs. Traditional training was used for the networks in one row, while saliency guided training was used for the other. Grad, DL, GS, and DLS stand for Gradient, DeepLIFT, Gradient SHAP, and DeepSHAP, respectively. There is an improvement in the quality of saliency maps when saliency guided training is used.

Quantitative Analysis on Synthetic Data

We evaluated saliency guided training on a multivariate time series benchmark proposed by ismail2020benchmarking. The benchmark consists of 10 synthetic datasets, each examining different design aspects of typical time series datasets. Informative features are highlighted by adding a constant to features of the positive class and subtracting the same constant from features of the negative class. Following ismail2020benchmarking, we compare 4 neural architectures: LSTM hochreiter1997long, LSTM with Input-Cell Attention inputCellAttention, Temporal Convolutional Network (TCN) TCN2017, and Transformers vaswani2017attention. Additional details about the datasets and architectures are provided in the supplementary material.

Quantitatively measuring the interpretability of a (neural architecture, saliency method) pair involves applying the saliency method, ranking features according to the saliency values, and replacing highly salient features with uninformative features from the original distribution at different percentages. Finally, the area under the precision curve (AUP) and the area under the recall curve (AUR) are calculated from the precision/recall values at different levels of degradation. Similar to ismail2020benchmarking, we compare the AUP and AUR with a random baseline; since the baseline might differ between models, we report the difference between the metric values generated using the saliency method and the baseline. For example, the difference between the Gradient method and the random baseline, Diff(AUP), when the model is trained traditionally is calculated as:

$$\text{Diff(AUP)}_{\text{Trad}} = \text{AUP}\big(\text{Grad}_{\text{Trad}}\big) - \text{AUP}\big(\text{Random}_{\text{Trad}}\big) \qquad (7)$$

Similarly, the difference when the model is trained using saliency guided training is:

$$\text{Diff(AUP)}_{\text{Sal}} = \text{AUP}\big(\text{Grad}_{\text{Sal}}\big) - \text{AUP}\big(\text{Random}_{\text{Sal}}\big) \qquad (8)$$
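The sketch below illustrates how such differences might be computed, assuming boolean ground-truth masks of informative features and per-feature saliency scores for each sample; the top-k precision/recall used here is a simplified stand-in for the benchmark's weighted metrics, and the degradation grid is a placeholder.

```python
import numpy as np

def diff_aup_aur(saliency, random_scores, truth_mask,
                 levels=np.linspace(0.1, 0.9, 9)):
    """Difference in area under the precision/recall curves between a saliency
    method and a random baseline (illustrative sketch).

    saliency, random_scores: (n_samples, n_features) importance scores.
    truth_mask: (n_samples, n_features) boolean mask of informative features.
    """
    def aup_aur(scores):
        precisions, recalls = [], []
        for frac in levels:
            k = max(1, int(frac * scores.shape[1]))
            top_k = np.argsort(-scores, axis=1)[:, :k]     # features flagged as salient
            selected = np.zeros_like(truth_mask, dtype=bool)
            np.put_along_axis(selected, top_k, True, axis=1)
            tp = (selected & truth_mask).sum()
            precisions.append(tp / selected.sum())
            recalls.append(tp / truth_mask.sum())
        return np.trapz(precisions, levels), np.trapz(recalls, levels)

    aup_s, aur_s = aup_aur(saliency)
    aup_r, aur_r = aup_aur(random_scores)
    return aup_s - aup_r, aur_s - aur_r
```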

The mean metrics over all 10 datasets are shown in Table 2. Higher values indicate better performance; negative values indicate performance no better than random feature assignment. Overall, the best performance was achieved by (TCN, Integrated Gradients) when using saliency guided training. Detailed results for each dataset are available in the supplementary material.

                        Gradient          Integrated Gradient  DeepLIFT          Gradient SHAP     DeepSHAP          SmoothGrad
Metric     Architecture Trad.    Sal.     Trad.    Sal.        Trad.    Sal.     Trad.    Sal.     Trad.    Sal.     Trad.    Sal.
Diff(AUP)  LSTM         -0.113   -0.119   -0.083   -0.024      -0.097   -0.108   -0.088   -0.069   -0.098   -0.109   -0.110   -0.097
           LSTM + Input. 0.060    0.118    0.188    0.245       0.202    0.263    0.198    0.250    0.214    0.272    0.040    0.084
           TCN           0.106    0.168    0.233    0.291       0.248    0.270    0.235    0.288    0.263    0.280    0.088    0.155
           Transformer  -0.054   -0.062    0.061    0.044      -0.040   -0.032    0.069    0.023   -0.014   -0.055   -0.018   -0.046
Diff(AUR)  LSTM         -0.017    0.019    0.062    0.121       0.047    0.089    0.060    0.102    0.031    0.075    0.007    0.004
           LSTM + Input. 0.075    0.136    0.185    0.198       0.187    0.204    0.182    0.196    0.183    0.201    0.043    0.111
           TCN           0.125    0.171    0.191    0.210       0.202    0.204    0.185    0.209    0.196    0.192    0.046    0.138
           Transformer   0.102    0.104    0.182    0.176       0.145    0.146    0.171    0.162    0.101    0.065    0.040    0.018
Table 2: The mean difference in weighted AUP and AUR for different (neural architecture, saliency method) pairs. Overall, the best performance was achieved by TCN when using Integrated Gradients as the saliency method and the saliency guided training procedure.

Saliency Guided Training reduces vanishing saliency of recurrent neural networks

inputCellAttention showed that saliency maps in RNNs vanish over time, biasing the detection of salient features toward later time steps. This section investigates whether using saliency guided training reduces the vanishing saliency issue in RNNs. Repeating experiments done by inputCellAttention, three synthetic datasets were generated, as shown in Figure 6 (A). The specific features and the time intervals (boxes) on which they are considered important vary between datasets to test the model's ability to capture importance at different time intervals. We trained an LSTM with both the traditional and saliency guided training procedures.

The area under precision curve (AUP) and the area under the recall curve (AUR) are calculated by the precision/recall values at different levels of degradation. Higher AUP and AUR suggest better performance. Results are shown in Figure 6 (B).

A traditionally trained LSTM shows clear bias in detecting features in the later time steps; AUP and AUR increase as informative features move to later time steps. When saliency guided training is used, LSTM was able to identify informative features regardless of their locations in time.

Figure 6: (A) Samples from 3 different simulated datasets; informative features are located at earlier, intermediate, and later time steps. (B) AUP and AUR produced by an LSTM trained with the traditional and saliency guided training procedures. The traditionally trained LSTM shows a clear bias toward detecting features in later time steps. When saliency guided training is used, there is no time bias.

6 Summary and Conclusion

We propose saliency guided training, a new training procedure that improves the quality of explanations produced by existing gradient-based saliency methods. Saliency guided training is optimized to reduce gradient values for irrelevant features. This is done by masking input features with low gradients and then minimizing the KL divergence between outputs from the original and masked inputs along with the main loss function. We demonstrated the effectiveness of saliency guided training on images, language, and multivariate time series.

Our proposed training method encourages models to sharpen the gradient-based explanations they provide. It does this, however, without requiring explanations as input. Instead, it may be cast as a regularization procedure, where the regularization is provided by feature sparsity driven by a gradient-based feature attribution. This is an alternative to using ground truth explanations to force the model to be right for the right reasons ross2017right. We found that improving model explanations in an unsupervised fashion also improves their faithfulness. This opens an interesting avenue for other unsupervised, perhaps regularization-based, methods to improve the interpretability of prediction models.

7 Acknowledgments

This project was supported in part by NSF CAREER AWARD 1942230, a grant from NIST 60NANB20D134, NSF award CDS&E:1854532, ONR grant 13370299 and AWS Machine Learning Research Award.

References

Checklist


  1. For all authors…

    1. Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

    2. Did you describe the limitations of your work? section 4

    3. Did you discuss any potential negative social impacts of your work?

    4. Have you read the ethics review guidelines and ensured that your paper conforms to them?

  2. If you are including theoretical results…

    1. Did you state the full set of assumptions of all theoretical results?

    2. Did you include complete proofs of all theoretical results?

  3. If you ran experiments…

    1. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?

    2. Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? In the supplemental material

    3. Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? In the supplemental material

    4. Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? In the supplemental material

  4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets…

    1. If your work uses existing assets, did you cite the creators?

    2. Did you mention the license of the assets?

    3. Did you include any new assets either in the supplemental material or as a URL?

    4. Did you discuss whether and how consent was obtained from people whose data you’re using/curating?

    5. Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content?

  5. If you used crowdsourcing or conducted research with human subjects…

    1. Did you include the full text of instructions given to participants and screenshots, if applicable?

    2. Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable?

    3. Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation?

Supplementary

Experimental Details

Computational resources: All experiments were run on a single 12GB NVIDIA RTX 2080 Ti GPU. Saliency methods: the Captum kokhlikyan2020captum implementation was used for the different saliency methods.

Saliency Guided Training for Images

Datasets and Classifiers
  • MNIST lecun2010mnist: a database of handwritten digits. The classifier consists of two CNN layers with kernel size 3 and stride 1, followed by two fully connected layers, two dropout layers, and 10 output neurons.

  • CIFAR10 krizhevsky2009learning: a low-resolution classification dataset with 10 different classes representing airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. ResNet18 he2016deep was used as the classifier; ResNet18 is a very deep CNN with "identity shortcut connections," i.e., skip connections that skip one or more layers to mitigate the vanishing gradient problem faced by deep networks.

  • BIRD gerry_2021: a Kaggle dataset of 260 bird species. Images were gathered from internet searches by species name. VGG16 simonyan2014very was used as the classifier; the last few dense layers and the output layer were modified to accommodate the number of classes in this dataset.

Dataset   # Training   # Testing   # Classes   Test Accuracy (Trad.)   Test Accuracy (Sal. Guided)   λ   k (% of features)
MNIST     60000        10000       10          99.4                    99.3                          1   50%
CIFAR10   50000        10000       10          92.0                    91.5                          1   50%
BIRD      38518        1350        260         96.6                    96.9                          1   50%
Table 3: Datasets used for the image experiments. k is the percentage of overall features masked during saliency guided training. For example, in MNIST the number of masked features is 50% x 784 = 392.
Masking

For images, low-salience features are replaced by a random value within the corresponding color channel's input range. For example, in an RGB image, if pixel (i, j) is to be masked, its R value would be replaced with a random value within the R channel range, and similarly its G and B values would be replaced with random values within the G and B channel ranges, respectively.
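A minimal sketch of this per-channel replacement is given below, assuming image batches of shape (batch, channels, height, width) and a boolean mask marking the pixels selected for masking; computing the channel range over the whole batch is an illustrative simplification.

```python
import torch

def mask_low_salient_pixels(images, mask):
    """Replace masked pixels with random values drawn within each color
    channel's observed range (illustrative sketch).

    images: float tensor of shape (batch, channels, height, width).
    mask:   boolean tensor of the same shape; True marks low-saliency pixels.
    """
    # Per-channel minimum and maximum over the batch and spatial dimensions.
    ch_min = images.amin(dim=(0, 2, 3), keepdim=True)
    ch_max = images.amax(dim=(0, 2, 3), keepdim=True)
    # Uniform random values within [ch_min, ch_max] for every position.
    random_vals = torch.rand_like(images) * (ch_max - ch_min) + ch_min
    return torch.where(mask, random_vals, images)
```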

Saliency Map Quality for Images

The examples shown in Figure 7, Figure 8, and Figure 9 were correctly classified by both models. Gradients are scaled per sample to have values between -1 and 1. Overall, saliency maps produced by saliency guided training are less noisy than those produced by traditional training and tend to highlight the object itself rather than the background. The distributions of gradient values per sample show that most features have small values (near zero) with a higher separation of high saliency features away from zero for saliency guided training.

Figure 7: Saliency maps and saliency distributions for traditional and saliency guided training on MNIST.
Figure 8: Saliency maps and saliency distributions for traditional and saliency guided training on CIFAR10.
Figure 9: Saliency maps and saliency distributions for traditional and saliency guided training on BIRD.

Model Accuracy Drop

We compare saliency guided and traditional training for different saliency methods with modification-based evaluation. Each experiment is repeated five times. Figure 10 shows the mean and standard error of model degradation for different gradient-based methods.

Figure 10: The mean and standard error of the model accuracy drop when removing features with high saliency using traditional and saliency guided training for different gradient-based methods, against a random baseline. A steeper drop indicates better performance. We find that regardless of the saliency method used, performance improves with saliency guided training.

Fine-tuning with Saliency Guided Training

We investigate the effect of training traditionally and then fine-tuning with saliency guided training. This would be particularly useful for large datasets like ImageNet. Table 4 shows the area under the accuracy drop curve (AUC) on MNIST for the Gradient method when training traditionally, training with the saliency guided procedure, and fine-tuning (a smaller AUC indicates better performance). We find that fine-tuning improves performance over traditionally trained networks.

Training Procedures AUC
Traditional 3360.4
Saliency Guided 1817.6
Fine-tuned 2258.8
Table 4: Area under accuracy drop curve on MNIST for different training procedures

Note that there is not much gain in training performance when training from scratch versus fine-tuning for small datasets like MNIST. However, for larger datasets like CIFAR10, we observed a clear decrease in the number of epochs when fine-tuning the network. The number of epochs for traditional training on CIFAR10 is on average 118 and for saliency guided training 124, while fine-tuning takes only 70 epochs.

Saliency Guided Training for Language

We compare the interpretability of different models trained on language tasks using the ERASER deyoung2019eraser benchmark.

Datasets

For all datasets, word embeddings were generated from GloVe pennington2014glove and a bidirectional LSTM hochreiter1997long was used for classification. Details about each dataset are available in Table 5.

Dataset        # Training   # Testing   # Classes   Avg. Tokens   Avg. Sentences   Test Accuracy (Trad.)   Test Accuracy (Sal. Guided)   λ   k (% of tokens)
Movie Review   1600         200         2           774           36.8             0.8890                  0.8980                        1   60%
FEVER          97957        6111        2           327           12.1             0.7234                  0.7255                        1   80%
e-SNLI         911928       16429       3           16            1.7              0.9026                  0.9068                        1   70%
Table 5: Overview of the datasets in the ERASER benchmark: number of labels, dataset size, and average numbers of sentences and tokens in each document. k is the percentage of overall tokens masked within a particular document.
Masking

For language tasks, masking is a bit trickier. We tried multiple masking functions, including:

  • Remove: the masking function creates a new input that contains only the highly salient words from the original input.

  • Replace with token "[UNK]": the masking function replaces each low-salience word with the token "[UNK]", i.e., unknown.

  • Replace with token "[SEP]": the masking function replaces each low-salience word with the token "[SEP]", i.e., white space.

  • Replace with a random word: the masking function replaces each low-salience word with a random word from the vocabulary.

  • Replace with the last highly salient word: the masking function replaces each low-salience word with the previous highly salient word.

Over the three datasets, we found that the last masking function (replace with the last highly salient word) gave the best results. We believe that the masking function can also be dataset-dependent. This particular experiment aims to show that saliency guided training improves interpretability on language tasks; we will consider finding the optimal masking function for different language tasks in future work.
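The sketch below illustrates this chosen masking function over a sequence of token ids, assuming per-token saliency scores and a number of tokens `k` to mask; it is a simplified stand-in for the authors' implementation.

```python
import torch

def mask_with_last_salient_word(token_ids, token_saliency, k):
    """Replace the k least salient tokens with the most recent preceding token
    that is kept, i.e., the previous highly salient word (illustrative sketch).

    token_ids, token_saliency: 1-D tensors of equal length.
    """
    low_idx = set(token_saliency.argsort()[:k].tolist())   # bottom-k tokens
    masked = token_ids.clone()
    last_kept = token_ids[0]            # fallback if the first token is masked
    for t in range(token_ids.shape[0]):
        if t in low_idx:
            masked[t] = last_kept       # copy the previous high-salience word
        else:
            last_kept = token_ids[t]
    return masked
```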

Metrics

ERASER provides two metrics to measure interpretability. Comprehensiveness evaluates whether all features needed to make a prediction are selected. To calculate an explanation's comprehensiveness, a new input $\tilde{x}_i$ is created such that $\tilde{x}_i = x_i \setminus r_i$, where $r_i$ is the predicted rationale. Let $m(x_i)_j$ be the prediction of model $m$ for class $j$. The model comprehensiveness is calculated as:

$$\text{comprehensiveness} = m(x_i)_j - m(x_i \setminus r_i)_j$$

A high score here implies that the removed explanation was influential in the prediction. The second metric is sufficiency, which evaluates whether the extracted explanations contain enough signal to make a prediction. The following equation gives the explanation sufficiency:

$$\text{sufficiency} = m(x_i)_j - m(r_i)_j$$

A lower score implies that the explanations are adequate for a model prediction.

To evaluate the faithfulness of continuous importance scores assigned to tokens by models, the soft score over features provided by the model is converted into discrete rationales by taking the top-$k_d$ values, where $k_d$ is a threshold for dataset $d$. Denoting the tokens up to and including bin $k$ by $r_{ik}$, an aggregate comprehensiveness measure is defined as:

$$\frac{1}{|\mathcal{B}|+1} \sum_{k=0}^{|\mathcal{B}|} \big( m(x_i)_j - m(x_i \setminus r_{ik})_j \big)$$

Sufficiency is defined similarly. Here, tokens are grouped into $|\mathcal{B}| = 5$ bins by ranking them with respect to the corresponding importance score. This metric is referred to as the Area Over the Perturbation Curve (AOPC). For reference, we also report these metrics when random scores are assigned to tokens. Results are shown in Table 1 of the main paper.

Saliency Guided Training for Time Series

We evaluated saliency guided training for multivariate time series, both qualitatively on MNIST treated as a multivariate time series and quantitatively on synthetic data.

Saliency Maps Quality for Multivariate Time Series

We compare the saliency maps produced on MNIST treated as a multivariate time series with 28 time steps, each having 28 features. Figures 11, 12, and 13 show the saliency maps produced by different saliency methods for the Temporal Convolutional Network (TCN), LSTM with Input-Cell Attention, and Transformers, respectively. There is a visible improvement in saliency quality across different networks when saliency guided training is used. The most significant improvement was found in TCNs.

Figure 11: Saliency maps produced for (TCN, saliency method) pairs.
Figure 12: Saliency maps produced for (LSTM with Input-Cell Attention, saliency method) pairs.
Figure 13: Saliency maps produced for (Transformers, saliency method) pairs.

Quantitative Analysis on Synthetic Data

We evaluated saliency guided training on a multivariate time series benchmark proposed by ismail2020benchmarking. The benchmark consists of 10 synthetic datasets, each examining different design aspects of typical time series datasets. Properties of each dataset are shown in Figure 14. Informative features are highlighted by adding a constant to features of the positive class and subtracting the same constant from features of the negative class; the same constant value is used in all of the following experiments. Details of each dataset are available in Table 6.

Figure 14: Figure from ismail2020benchmarking: Different evaluation datasets used for benchmarking saliency methods. Some datasets have multiple variations shown as sub-levels. N/S: normal and small shapes, T/F: temporal and feature positions, M: moving shape. All datasets are trained for binary classification. Examples are shown above each dataset, where dark red/blue shapes represent informative features.
Dataset                # Training   # Testing   # Time Steps   # Features   # Informative Time Steps   # Informative Features
Middle                 1000         100         50             50           30                         30
Small Middle           1000         100         50             50           15                         15
Moving Middle          1000         100         50             50           30                         30
Moving Small Middle    1000         100         50             50           15                         15
Rare Time              1000         100         50             50           6                          40
Moving Rare Time       1000         100         50             50           6                          40
Rare Features          1000         100         50             50           40                         6
Moving Rare Features   1000         100         50             50           40                         6
Positional Time        1000         100         50             50           20                         20
Positional Feature     1000         100         50             50           20                         20
Table 6: Synthetic dataset details: number of training samples, number of testing samples, number of time steps per sample, number of features per time step, number of time steps with informative features, and number of informative features in an informative time step.

Following ismail2020benchmarking, we compare 4 neural architectures: LSTM hochreiter1997long, LSTM with Input-Cell Attention inputCellAttention, Temporal Convolutional Network (TCN) TCN2017, and Transformers vaswani2017attention. Each (neural architecture, dataset) pair was trained both traditionally and using saliency guided training. Test accuracy is reported in Table 7.

Datasets LSTM LSTM+ Input-Cell TCN Transformer
Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal.
Middle 99 100 100 100 100 100 100 100
Small Middle 100 100 99 100 100 100 100 100
Moving Middle 100 100 100 100 100 100 99 100
Moving Small Middle 100 100 99 100 100 100 99 100
Rare Time 100 100 99 100 100 100 100 100
Moving Rare Time 100 100 100 100 100 100 99 100
Rare Features 100 100 100 100 100 100 100 100
Moving Rare Features 99 100 99 99 100 100 99 100
Positional Time 100 100 100 100 100 100 100 100
Positional Feature 100 100 99 100 99 100 100 100
Table 7: Test accuracy of different (neural architecture, dataset) pairs.

Quantitatively measuring the interpretability of a (neural architecture, saliency method) pair involves applying the saliency method, ranking features according to the saliency values, and replacing highly salient features with uninformative features from the original distribution at different percentages. Finally, we measure the model accuracy drop, weighted precision, and weighted recall.

The area under the precision curve (AUP) and the area under the recall curve (AUR) are calculated from the precision/recall values at different levels of degradation. Similar to ismail2020benchmarking, we compare the AUP and AUR with a random baseline; since the baseline might differ between models, we report the difference between the metric values generated using the saliency method and the baseline. All experiments were run 5 times; the mean Diff(AUP) and Diff(AUR) are shown in Tables 8-11.

The results in Tables 8-11 show the following. LSTM: saliency guided training along with Integrated Gradients has the best precision and recall. LSTM with Input-Cell Attention: saliency guided training improves the performance across different saliency methods and datasets; DeepSHAP gives the best precision, while DeepLIFT gives the best recall. TCN: overall, saliency guided training improves the performance across different saliency methods and datasets; Integrated Gradients, Gradient SHAP, and DeepSHAP are the best performing saliency methods. Transformers: these have the worst interpretability in this benchmark; using saliency guided training improved recall but not precision.

Metric Datasets Gradient Integrated Gradient DeepLIFT Gradient SHAP DeepSHAP SmoothGrad
Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal.
Diff(AUP) Middle 1 30% -0.280 -0.280 -0.261 -0.036 -0.267 -0.270 -0.263 -0.124 -0.267 -0.271 -0.283 -0.269
Small Middle 1 60% -0.071 -0.066 -0.053 0.052 -0.070 -0.055 -0.056 -0.033 -0.070 -0.055 -0.072 -0.044
Moving Middle 1 60% -0.265 -0.277 -0.218 -0.169 -0.237 -0.264 -0.222 -0.237 -0.239 -0.264 -0.259 -0.263
Moving Small Middle 1 5% -0.059 -0.060 -0.035 0.051 -0.043 -0.045 -0.042 -0.013 -0.044 -0.046 -0.056 -0.037
Rare Time 1 30% -0.076 -0.076 -0.075 -0.065 -0.076 -0.076 -0.075 -0.071 -0.076 -0.076 -0.076 -0.068
Moving Rare Time 1 50% -0.067 -0.058 -0.042 0.016 -0.053 -0.042 -0.045 -0.010 -0.054 -0.043 -0.061 -0.032
Rare Feature 1 30% -0.063 -0.075 -0.039 0.006 -0.047 -0.073 -0.038 -0.027 -0.048 -0.073 -0.069 -0.059
Moving Rare Feature 1 10% -0.062 -0.069 -0.021 0.012 -0.040 -0.056 -0.032 -0.029 -0.041 -0.056 -0.059 -0.044
Postional Time 1 30% -0.116 -0.119 -0.040 -0.006 -0.107 -0.112 -0.058 -0.046 -0.108 -0.113 -0.111 -0.102
Postional Feature 1 2% -0.064 -0.104 -0.042 -0.104 -0.028 -0.089 -0.043 -0.105 -0.031 -0.091 -0.055 -0.053
Diff(AUR) Middle 1 30% 0.072 0.076 0.128 0.153 0.125 0.135 0.122 0.132 0.114 0.126 0.070 0.031
Small Middle 1 60% -0.043 0.037 0.048 0.157 0.029 0.116 0.038 0.129 0.007 0.102 -0.032 0.009
Moving Middle 1 60% 0.060 0.073 0.119 0.124 0.110 0.124 0.119 0.117 0.099 0.115 0.061 0.042
Moving Small Middle 1 5% -0.032 -0.004 0.046 0.135 0.043 0.073 0.042 0.093 0.025 0.060 -0.023 -0.025
Rare Time 1 30% -0.244 -0.137 -0.132 0.043 -0.169 -0.021 -0.116 0.005 -0.189 -0.043 -0.145 -0.108
Moving Rare Time 1 50% -0.222 -0.070 -0.092 0.075 -0.103 0.018 -0.065 0.060 -0.131 0.002 -0.144 -0.035
RareFeature 1 30% 0.182 0.197 0.219 0.218 0.217 0.223 0.216 0.216 0.211 0.219 0.191 0.166
Moving Rare Feature 1 10% 0.143 0.162 0.191 0.196 0.191 0.202 0.194 0.196 0.183 0.197 0.162 0.107
Postional Time 1 30% -0.032 -0.073 0.072 0.119 0.029 0.021 0.046 0.082 0.012 0.001 -0.019 -0.064
Postional Feature 1 2% -0.053 -0.070 0.016 -0.005 -0.002 -0.005 0.004 -0.009 -0.018 -0.025 -0.056 -0.083
Table 8: Difference in weighted AUP and AUR for (LSTM, saliency method) pairs. Overall, the best performance was achieved when using Integrated Gradients as the saliency method and saliency guided training as the training procedure.
Metric Datasets Gradient Integrated Gradient DeepLIFT Gradient SHAP DeepSHAP SmoothGrad
Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal.
Diff(AUP) Middle 1 40% 0.014 0.046 0.233 0.252 0.218 0.237 0.244 0.261 0.232 0.247 -0.006 0.026
Small Middle 1 60% 0.049 0.150 0.161 0.273 0.169 0.305 0.170 0.273 0.180 0.312 0.038 0.091
Moving Middle 1 80% 0.044 0.082 0.262 0.251 0.260 0.256 0.276 0.256 0.277 0.261 0.010 0.044
Moving Small Middle 1 5% 0.044 0.055 0.181 0.201 0.179 0.196 0.187 0.200 0.190 0.204 0.022 0.029
Rare Time 1 40% 0.186 0.278 0.271 0.378 0.323 0.412 0.279 0.373 0.338 0.424 0.133 0.209
Moving Rare Time 1 80% 0.144 0.276 0.233 0.388 0.269 0.417 0.238 0.381 0.282 0.429 0.103 0.167
Rare Feature 1 30% 0.032 0.101 0.163 0.270 0.166 0.266 0.174 0.278 0.180 0.274 0.039 0.105
Moving Rare Feature 1 5% -0.002 -0.004 0.120 0.124 0.116 0.116 0.124 0.127 0.126 0.126 -0.003 -0.004
Postional Time 1 40% 0.117 0.186 0.184 0.225 0.236 0.314 0.197 0.252 0.248 0.316 0.093 0.187
Postional Feature 1 5% -0.021 0.007 0.072 0.083 0.080 0.113 0.089 0.101 0.088 0.122 -0.031 -0.012
Diff(AUR) Middle 1 40% 0.028 0.084 0.163 0.176 0.160 0.180 0.162 0.173 0.157 0.177 -0.001 0.044
Small Middle 1 60% 0.064 0.176 0.186 0.217 0.189 0.217 0.182 0.212 0.183 0.213 0.031 0.159
Moving Middle 1 80% 0.060 0.117 0.174 0.180 0.175 0.187 0.173 0.177 0.175 0.183 0.021 0.072
Moving Small Middle 1 5% 0.079 0.101 0.202 0.201 0.199 0.194 0.198 0.196 0.194 0.186 0.029 0.052
Rare Time 1 40% 0.139 0.203 0.214 0.225 0.214 0.233 0.211 0.223 0.211 0.233 0.103 0.191
Moving Rare Time 1 80% 0.118 0.213 0.198 0.226 0.200 0.233 0.193 0.224 0.194 0.232 0.070 0.197
RareFeature 1 30% 0.077 0.181 0.196 0.223 0.197 0.224 0.193 0.222 0.193 0.223 0.074 0.172
Moving Rare Feature 1 5% 0.059 0.039 0.188 0.189 0.191 0.186 0.182 0.183 0.186 0.180 0.038 0.028
Postional Time 1 40% 0.140 0.201 0.188 0.200 0.203 0.225 0.185 0.202 0.201 0.224 0.109 0.188
Postional Feature 1 5% -0.017 0.043 0.141 0.146 0.145 0.166 0.141 0.150 0.132 0.157 -0.041 0.005
Table 9: The difference in weighted AUP and AUR for different (LSTM with Input-Cell Attention, saliency method) pairs. The use of saliency guided training improved the performance of most saliency methods. Overall, DeepSHAP and DeepLIFT produced the best precision and recall, respectively, when combined with saliency guided training.
Metric Datasets Gradient Integrated Gradient DeepLIFT Gradient SHAP DeepSHAP SmoothGrad
Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal.
Diff(AUP) Middle 1 50% 0.127 0.217 0.283 0.393 0.350 0.398 0.290 0.384 0.365 0.416 0.090 0.194
Small Middle 1 40% 0.164 0.260 0.299 0.433 0.312 0.419 0.302 0.418 0.328 0.442 0.156 0.253
Moving Middle 1 70% 0.122 0.197 0.287 0.342 0.332 0.367 0.286 0.329 0.345 0.387 0.047 0.182
Moving Small Middle 1 80% 0.065 0.043 0.194 0.151 0.169 0.191 0.190 0.152 0.183 0.200 0.037 0.023
Rare Time 1 50% 0.184 0.290 0.314 0.363 0.324 0.309 0.314 0.360 0.352 0.319 0.177 0.226
Moving Rare Time 1 50% 0.142 0.182 0.260 0.333 0.257 0.243 0.258 0.330 0.275 0.251 0.122 0.179
Rare Feature 1 30% 0.058 0.244 0.246 0.451 0.252 0.422 0.249 0.453 0.286 0.450 0.085 0.259
Moving Rare Feature 1 5% -0.003 0.004 0.116 0.134 0.112 0.114 0.122 0.129 0.122 0.123 0.007 0.005
Postional Time 1 70% 0.115 0.072 0.180 0.114 0.233 0.069 0.187 0.117 0.237 0.035 0.106 0.066
Postional Feature 1 10% 0.082 0.176 0.151 0.199 0.136 0.162 0.155 0.203 0.137 0.175 0.058 0.159
Diff(AUR) Middle 1 50% 0.133 0.161 0.190 0.207 0.202 0.209 0.188 0.205 0.201 0.205 0.054 0.128
Small Middle 1 40% 0.086 0.230 0.194 0.240 0.202 0.240 0.189 0.239 0.196 0.241 0.039 0.230
Moving Middle 1 70% 0.134 0.144 0.191 0.195 0.201 0.208 0.186 0.194 0.201 0.203 0.036 0.121
Moving Small Middle 1 80% 0.118 0.117 0.204 0.199 0.193 0.195 0.196 0.196 0.186 0.190 -0.001 0.065
Rare Time 1 50% 0.173 0.215 0.199 0.233 0.225 0.221 0.193 0.230 0.226 0.204 0.125 0.151
Moving Rare Time 1 50% 0.106 0.198 0.177 0.220 0.195 0.189 0.167 0.224 0.191 0.179 -0.057 0.149
RareFeature 1 30% 0.152 0.222 0.222 0.239 0.223 0.239 0.219 0.239 0.224 0.239 0.130 0.217
Moving Rare Feature 1 5% 0.101 0.122 0.198 0.204 0.206 0.205 0.195 0.201 0.196 0.198 0.048 0.055
Postional Time 1 70% 0.126 0.128 0.156 0.172 0.194 0.160 0.147 0.165 0.181 0.110 0.039 0.102
Postional Feature 1 10% 0.126 0.174 0.177 0.196 0.180 0.175 0.172 0.194 0.164 0.154 0.049 0.160
Table 10: The difference in weighted AUP and AUR for different (TCN, saliency method) pairs. The use of saliency guided training improved the performance of most saliency methods. Overall, when combined with saliency guided training, Integrated Gradients and DeepSHAP produced the best precision. For recall, Integrated Gradients, DeepLIFT, Gradient SHAP, and DeepSHAP perform similarly; again, the best performance was achieved when saliency guided training is used.
Metric Datasets Gradient Integrated Gradient DeepLIFT Gradient SHAP DeepSHAP SmoothGrad
Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal. Trad. Sal.
Diff(AUP) Middle 1 30% -0.179 -0.213 0.051 -0.004 -0.116 -0.176 0.067 -0.064 -0.062 -0.222 -0.069 -0.150
Small Middle 1 60% -0.034 -0.057 0.054 0.042 -0.018 -0.034 0.066 0.024 0.009 -0.060 0.006 -0.022
Moving Middle 1 90% -0.188 -0.146 0.062 0.018 -0.142 -0.067 0.065 -0.011 -0.130 -0.143 -0.091 -0.157
Moving Small Middle 1 70% -0.002 -0.008 0.031 0.039 0.017 0.029 0.026 0.037 0.036 0.016 -0.021 -0.035
Rare Time 1 50% 0.038 -0.006 0.118 0.057 -0.019 0.017 0.132 0.049 0.014 -0.010 -0.029 -0.031
Moving Rare Time 1 50% 0.066 0.062 0.110 0.049 -0.021 0.046 0.117 0.055 -0.009 0.033 -0.026 -0.027
Rare Feature 1 30% -0.049 -0.045 0.029 0.139 -0.002 -0.004 0.033 0.088 0.008 -0.015 0.005 0.028
Moving Rare Feature 1 10% -0.034 -0.031 0.041 0.055 0.008 0.014 0.038 0.049 0.008 0.022 -0.003 -0.013
Postional Time 1 60% -0.060 -0.078 0.084 0.029 -0.048 -0.047 0.102 -0.001 0.013 -0.072 0.026 -0.057
Postional Feature 1 10% -0.094 -0.099 0.032 0.012 -0.061 -0.097 0.046 0.008 -0.029 -0.098 0.019 0.006
Diff(AUR) Middle 1 30% 0.087 0.053 0.167 0.146 0.155 0.112 0.157 0.111 0.119 -0.051 0.040 -0.025
Small Middle 1 60% 0.085 0.030 0.186 0.189 0.128 0.113 0.173 0.171 0.077 -0.025 0.060 0.036
Moving Middle 1 90% 0.071 0.134 0.164 0.185 0.136 0.181 0.150 0.183 0.057 0.130 0.019 0.040
Moving Small Middle 1 70% 0.118 0.137 0.171 0.177 0.157 0.171 0.160 0.171 0.098 0.117 -0.004 -0.019
Rare Time 1 50% 0.152 0.139 0.206 0.172 0.116 0.135 0.199 0.157 0.077 0.088 -0.027 -0.073
Moving Rare Time 1 50% 0.184 0.185 0.198 0.170 0.124 0.175 0.186 0.168 0.059 0.127 -0.033 -0.013
RareFeature 1 30% 0.115 0.152 0.184 0.217 0.187 0.188 0.173 0.203 0.144 0.129 0.087 0.135
Moving Rare Feature 1 10% 0.101 0.122 0.179 0.183 0.174 0.180 0.165 0.176 0.125 0.149 0.060 0.034
Postional Time 1 60% 0.091 0.071 0.193 0.172 0.154 0.150 0.184 0.151 0.145 0.059 0.103 -0.004
Postional Feature 1 10% 0.017 0.013 0.170 0.149 0.123 0.057 0.160 0.131 0.105 -0.072 0.094 0.073
Table 11: The difference in weighted AUP and AUR for different (Transformers, saliency method) pairs. In this benchmark, Transformers seem to have the worst interpretability. Using saliency guided training improved recall but not precision. The overall best precision was achieved when combining traditional training with Gradient SHAP, while the best recall was achieved when using saliency guided training and Integrated Gradients.