Combining Similarity and Adversarial Learning to Generate Visual Explanation: Application to Medical Image Classification

12/14/2020 · by Martin Charachon, et al.

Explaining the decisions of black-box classifiers is paramount in sensitive domains such as medical imaging, since clinicians' confidence is necessary for adoption. Various explanation approaches have been proposed, among which perturbation-based approaches are very promising. Within this class of methods, we leverage a learning framework to produce our visual explanation method. From a given classifier, we train two generators to produce, from an input image, the so-called similar and adversarial images. The similar image shall be classified as the input image, whereas the adversarial image shall not. The visual explanation is built as the difference between these two generated images. Using metrics from the literature, our method outperforms state-of-the-art approaches. The proposed approach is model-agnostic and has a low computational burden at prediction time, making it suited to real-time systems. Finally, we show that random geometric augmentations applied to the original image play a regularization role that improves several previously proposed explanation methods. We validate our approach on a large chest X-ray database.




I Introduction

Deep neural models have reached high performance on various applications, in particular in medical image analysis [8]. In this field, in addition to high accuracy, it is critical to provide interpretable explanations of the system's decisions, since the clinicians' confidence in the system is at stake [15]. Thus, several methods have been proposed to address the visual explanation problem, ranging from saliency maps [27, 28, 14, 30] and class activation mapping methods [38, 26] to perturbation maps [9, 4]. However, the problem is still open since there is no general consensus on their performance. Independently, motivated by model safety, several methods have been proposed to generate "fake" images that "fool" classification algorithms. Such "fooling" images are generated either by adding a small perturbation on top of the input image [12, 7, 35], or as a completely new image, very close to the input [37]. Most of these works point out that adversarial generation allows studying the model's robustness and fragility, but very few make links with the explainability problem.

In this work, we propose to leverage adversarial generation methods to produce interpretable explanations of a classifier's decision. Inspired by [4] and [37], who train models to generate explanation masks and adversaries respectively, we propose to train a model to generate images that capture discriminative structures, with the following key contributions:

  • A new explanation framework is proposed. We define the explanation of a classifier's decision as the difference between a regularized adversarial example and the "projection" of the original image into the space of generated adversarial samples.

  • We introduce a new optimization workflow that combines the training of an adversarial generator and of a similar generator that "projects" the original image into the adversarial space.

  • We propose a new method that greatly improves the use of several explanation methods as a means to localize decisive objects in the image: namely, averaging registered explanation results built upon random geometric augmentations of the input image.

We validate our algorithm on a binary classification problem (pathological/healthy) over a large database of chest X-ray images. We compare our technique to state-of-the-art approaches such as Gradient [27], GradCAM [26], BBMP [9] and Mask Generator [4].

II Related Work

Explaining classifier decisions via some visual map has been the subject of several contributions. We classify a selection of them into three categories.

II-A Back-propagation-based methods

These methods leverage, for neural networks and for a given image, the back-propagation of small variations of the model's prediction [27]. While providing interesting results, they tend to produce noisy explanation maps since any variation of the model's output is considered. Many contributions thus focus on building sharper and smoother explanation maps [36, 29, 28, 30]. In [38], explanation maps are produced by upsampling activations from the last convolutional layer to the input image size. GradCAM [26] (or its application in the medical domain [22]) builds on this work by computing the gradient of the output with respect to the last convolutional layer (and not only with respect to the model's prediction), generating compelling results. For an exhaustive review, the reader can refer to [1] and [23]. As a limitation, these approaches are not model-agnostic (they are limited to neural networks) and need access to intermediate layers.

II-B Iterative perturbation-based methods

The principle of these approaches is to exploit the effect of perturbations of the input image on the model's output [36]. For instance, LIME [24] proposes a local explanation by perturbing random segments of the input image and training a linear classifier to predict the importance of each segment for the classifier's prediction.
The authors of [9] take this idea further by defining their explanation maps as the result of an optimization procedure over the input image and the model to explain. Considering a fixed perturbation, their approach consists in learning the maps that maximally impact the model or, on the contrary, the maps that preserve its performance. Similarly, building on [3] for the medical imaging domain, [31, 21] adopt a similar optimization formulation but focus on the perturbations. They use generative methods to perturb pathological images, either by locally in-painting healthy tissue within pathological images or by completely reconstructing a healthy image.
As noisy outputs are a major concern within these methods, some authors focus on regularization terms [10] while others filter gradients during back-propagation [32]. Different optimization formulations have also been introduced [6, 16]. In [18], an explanation is generated through feature perturbations at different levels of the neural network. In [34, 7], the optimization problem is revisited as an adversarial example generation, where the adversarial perturbation is sought in a regularized and restricted space.
Note that all these methods share the necessity to solve, for each image, an optimization problem in order to produce an explanation map. This translates into a high computational cost, as several iterations are needed for convergence (often inappropriate when a real-time response is expected). Moreover, since the optimization problem is solved for every image, over-fitting issues arise: explanation maps often contain features linked not to the model's behaviour but only to the image being processed. Strong and elaborate regularization is thus a necessity.

II-C Trained perturbation-based methods

In order to alleviate the computational cost of iterative perturbation-based methods, [4] proposes to move to an optimization problem over the whole database, thus learning a masking model. In medical imaging, [20] uses the same approach on a single-class problem. Despite the benefits of this optimization strategy, two main drawbacks remain. First, perturbations are provided as parameters to be set and adapted manually. Their choice is impacted not only by the database and the considered classifier, but also by the training of this classifier (e.g. a random noise perturbation has no effect on models trained to be robust to this noise). Since perturbations are manually selected, residual adversarial artefacts (without any link to the explanation) are still generated. Second, costly hyperparameter tuning is needed to control the size of the generated explanation masks.

III Methodology

We present our methodology to generate explanations for image classification outcomes. As for the methods exposed in section II, our explanation is given as a visual explanation map where higher values code for more important areas of the image w.r.t. the classifier's decision. For the sake of simplicity, we present the rationale behind our approach in the case of a binary classification problem, the extension to the multi-class case being presented in section III-C. Let f be the studied classifier, outputting a classification score f(x) in the range [0, 1]. Without loss of generality, we assign label 1 (resp. label 0) to an input image x if its classification score f(x) is over 0.5 (resp. under 0.5). In the case of a different decision threshold, one can for instance apply a piece-wise linear transformation to f to satisfy this condition.
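As a concrete sketch of such a piece-wise linear recalibration (the function name and its exact form are our own illustration, not taken from the paper), a score produced with decision threshold t can be remapped so that the new threshold is 0.5:

```python
def recalibrate(score, t):
    """Piece-wise linear map sending an arbitrary decision threshold t to 0.5.

    Scores below t are rescaled into [0, 0.5), scores above t into [0.5, 1],
    so that 'score > t' becomes equivalent to 'recalibrate(score, t) > 0.5'.
    """
    if score < t:
        return 0.5 * score / t
    return 0.5 + 0.5 * (score - t) / (1.0 - t)
```

The map is continuous, increasing, and fixes the endpoints 0 and 1, so the ordering of classification scores is preserved.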

III-A Explanation through similar and adversarial generations

III-A1 Adversarial naive formulation

A naive, yet novel, approach to reach our objective is to combine a trainable masking model [4] with adversarial perturbations for visual explanations [7]. The visual explanation map is then given by the difference between the input image and its adversary. This method no longer depends on the choice of a perturbation function, since the adversarial sample "learns" this perturbation. For an input image x, we define the naive explanation as

    E_naive(x) = |x - g_a(x)|    (1)

where g_a is a model obtained via a training process with the goal of "fooling" the classifier f while producing an image "very close" to x, written as follows:

    min_{g_a} E_x[ d(x, g_a(x)) ]  subject to  f(g_a(x)) = 1 - round(f(x))    (2)

The mean value E_x[.] is taken over a training data set, and d is a distance between images. Generating a visual explanation using (1) and (2) effectively counterbalances the drawbacks of [4] and [7], but despite the regularization expected from the learning process, visual explanations are often corrupted by noise, highlighting regions which clearly should have no impact on the classifier's decision (see section V-B).

III-A2 Similar-Adversarial formulation

Why does the naive formulation generate incoherent visual explanations? We argue that the flaw resides in the definition of the explanation as expressed in equation (1). Comparing the original image with its generated adversarial sample exposes the method to a risk of reconstruction error: some details of the original image can be absent from the generated adversarial sample. Yet these details are not discriminative for the classifier, in the sense that their sole presence would not change the classification score. More formally, the adversarial sample g_a(x) belongs to the target space X_a of g_a, which is different from the space X of original images. The comparison between x and g_a(x) inherits the differences between X and X_a, and these differences are not explicitly linked to the explanation problem by equation (2). Since we have no control over the original image space X, we propose to mitigate this reconstruction risk by defining the explanation mask as the difference between the adversary and the closest element to x in X_a on which f returns the same value as f(x). We call this element the similar image and denote it g_s(x), where g_s is the function mapping images to their similar counterparts. The rationale is to reduce the reconstruction error so that the explanation only contains values related to the classifier's decision, and it reads

    E(x) = |g_a(x) - g_s(x)|    (3)

Denoting X_s the target space of g_s, both g_s and g_a are built through a joint optimization process aiming to make x, g_s(x) and g_a(x) as "close" as possible while satisfying f(g_s(x)) = round(f(x)) and f(g_a(x)) = 1 - round(f(x)):

    min_{g_s, g_a} E_x[ d_s(x, g_s(x)) + d_a(x, g_a(x)) + d_sa(g_s(x), g_a(x)) + mu(g_s, g_a) ]
    subject to  f(g_s(x)) = round(f(x)),  f(g_a(x)) = 1 - round(f(x))    (4)

where d_s, d_a and d_sa are distance functions between elements of the different image spaces, while mu is a measure between the two functions. Henceforth, we refer to g_s and g_a as the similar and adversarial generators respectively.

III-B Weaker formulation: Objective function

We propose to solve a weak formulation of the previous constrained optimization problem (4). We search for both similar and adversarial generators as minimizers of the following unconstrained problem:

    min_{g_s, g_a} E_x[ L_s(x) + L_cl(x) + L_g + L_reg(x) ]    (5)

L_s is a similarity loss that accounts for the distance terms in equation (4) and enforces the proximity between x, g_s(x) and g_a(x). L_cl, the classification loss, is a weak formulation of the classification constraints in (4), enforcing the similarity between f(x) and f(g_s(x)) and their dissimilarity with f(g_a(x)). L_g enforces the similarity between g_s and g_a (the term mu in (4)). In addition to the terms of (4), L_reg acts on the difference g_a(x) - g_s(x) to enforce regularity. An embodiment of optimization problem (5) when using neural networks is given in Figure 1 (see section IV-C). We next specify the choices made in our method for each of the terms in equation (5).

Fig. 1: Overview of Duo (Top) and Single (Bottom)

III-B1 Similarity Loss

It is defined as

    L_s(x) = alpha_1 ||x - g_s(x)||_1 + alpha_2 ||x - g_s(x)||_2 + beta_1 ||x - g_a(x)||_1 + beta_2 ||x - g_a(x)||_2    (6)

where the parameters alpha_i and beta_i adjust the importance attached to the different terms. Combining L1 and L2 norms to enforce the similarity between x, g_s(x) and g_a(x) produces better results experimentally (as in [37]).
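A minimal NumPy sketch of this combined L1/L2 penalty (the weight names `alpha`, `beta` and the per-pixel mean normalization are our own assumptions, not the paper's exact choices):

```python
import numpy as np

def similarity_loss(x, x_sim, x_adv, alpha=(1.0, 1.0), beta=(1.0, 1.0)):
    """Combined L1/L2 penalty pulling the similar and adversarial images
    toward the original image x. alpha weights the similar branch, beta
    the adversarial one (illustrative weighting)."""
    def l1_l2(a, b, w):
        d = (a - b).ravel()
        # mean absolute error plus mean squared error, weighted
        return w[0] * np.abs(d).mean() + w[1] * (d ** 2).mean()
    return l1_l2(x, x_sim, alpha) + l1_l2(x, x_adv, beta)
```

The loss is zero only when both generated images reproduce x exactly, and grows with any residual difference.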

III-B2 Classification Loss

It is defined as

    L_cl(x) = lambda_s BCE(f(g_s(x)), round(f(x))) + lambda_a BCE(f(g_a(x)), 1 - round(f(x)))    (7)

where lambda_s and lambda_a are weighting parameters and BCE is the binary cross-entropy loss. This term accounts for the weak formulation of the constraints in (4), favoring the classifier f to act on g_s(x) as it acts on x, and in the opposite manner for g_a(x).
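This term can be sketched as follows in NumPy for the binary case (the names `lam_s`, `lam_a` and the scalar-score interface are our own illustration):

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross entropy between a predicted probability p and target y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def classification_loss(f_x, f_xs, f_xa, lam_s=1.0, lam_a=1.0):
    """Push the similar prediction f(x_s) toward the rounded original
    prediction, and the adversarial prediction f(x_a) toward the opposite
    label. Weights lam_s, lam_a are illustrative."""
    y = np.round(f_x)  # hard label assigned to the input image
    return lam_s * bce(f_xs, y) + lam_a * bce(f_xa, 1.0 - y)
```

The loss is small when the similar image keeps the input's label and the adversarial image flips it, and large otherwise.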

III-B3 Generator Loss

It is a measure of the distance between the two generators. In the particular case (see section IV-C) where they are both neural networks (parameterized by theta_s and theta_a respectively), we used the following metric

    L_g = lambda_w ||theta_s - theta_a||_2^2    (8)

where we assume generators g_s and g_a to have the same architecture. lambda_w is a weighting parameter. Note that metrics used in GANs to measure discrepancies between distributions [2] may also be considered.
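A sketch of this weight-proximity term, assuming both generators expose their parameters as matching lists of arrays (the interface is our own assumption):

```python
import numpy as np

def generator_loss(weights_s, weights_a, lam_w=0.1):
    """Squared L2 distance between the parameters of the two generators,
    layer by layer (same architecture assumed). lam_w is a weighting
    parameter."""
    total = 0.0
    for ws, wa in zip(weights_s, weights_a):
        total += ((ws - wa) ** 2).sum()
    return lam_w * total
```

Minimizing this term pulls the two networks toward each other in parameter space, which the results sections relate to less noisy explanation maps.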

III-B4 Regularization Loss

It is defined as

    L_reg(x) = (lambda_tv / N) TV(g_a(x) - g_s(x))    (9)

where the parameter lambda_tv controls the relative importance of L_reg with respect to the other terms of (5), N is the dimension of the output space of the generators, and TV denotes the total variation. This term favors the proximity of g_s(x) and g_a(x) and regularizes the explanation map (3).
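The total variation of the difference map can be sketched as follows (anisotropic TV on a 2D map; the exact TV variant used by the paper is an assumption on our part):

```python
import numpy as np

def tv_regularization(diff, lam_tv=0.2):
    """Anisotropic total variation of the difference map g_a(x) - g_s(x),
    normalized by the number of pixels (normalization is illustrative)."""
    dh = np.abs(np.diff(diff, axis=0)).sum()  # vertical gradients
    dw = np.abs(np.diff(diff, axis=1)).sum()  # horizontal gradients
    return lam_tv * (dh + dw) / diff.size
```

A constant difference map incurs zero penalty, while high-frequency noise in the explanation is penalized, which encourages smooth, compact salient regions.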

III-C Multi-class situation

The weak optimization problem (5) can be adapted to the multi-class case by modifying L_cl to account for a vector-valued classifier f. This boils down to modifying the adversarial term of (7), adapting the CW loss of [7, 37] into

    L_cl^a(x) = max( f(g_a(x))_k - max_{j != k} f(g_a(x))_j + m, 0 )    (10)

where the index k, defined by k = argmax_j f(x)_j, corresponds to the class selected by the classifier on input x, and m is a strictly positive margin.
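A minimal sketch of this Carlini-Wagner-style margin term on a probability vector (the clamping at zero and operating on probabilities rather than logits follow the reconstruction above):

```python
import numpy as np

def cw_adversarial_loss(probs, k, margin=0.05):
    """Margin loss that reaches zero once the originally selected class k
    falls below the best competing class by at least `margin`."""
    others = np.delete(probs, k)  # scores of all classes except k
    return max(probs[k] - others.max() + margin, 0.0)
```

Minimizing this term drives the adversarial image's score for the original class below that of some other class, i.e. it flips the multi-class decision.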

III-D Explanation and augmentations

As our visual explanation is defined as the difference between two generated images, we suggest regularizing the output of our explanation method by averaging the outputs obtained over random geometric transformations of the input image. Discriminative regions are thus further reinforced against reconstruction errors. This average reads

    E_aug(x) = (1/N_T) sum_{i=1}^{N_T} T_i^{-1}( E(T_i(x)) )    (11)

where the T_i are random geometric transformations such as rotations, translations, zoom, axis flips, etc. This particular regularization can be applied to all other visual explanation techniques (see section V-B).

In the following sections, we denote by x_s = g_s(x) and x_a = g_a(x) the outputs of the similar and adversarial generators respectively.
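The averaging over invertible transforms can be sketched as follows, using the identity and a horizontal flip as example self-inverse transforms (the function names are our own):

```python
import numpy as np

def augmented_explanation(x, explain, transforms, inverses):
    """Average explanation maps over invertible geometric transforms:
    transform the image, explain it, map the explanation back with the
    inverse transform, then average."""
    maps = [inv(explain(t(x))) for t, inv in zip(transforms, inverses)]
    return np.mean(maps, axis=0)

# Example transforms (both are their own inverse):
ident = lambda img: img
flip = lambda img: img[:, ::-1]
```

With rotations or zooms, `inverses` would hold the corresponding inverse warps; registration of the maps back to the original frame is what makes the average meaningful.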

IV Experiments

IV-A Datasets

We tested our approach on a publicly available chest X-ray dataset for a binary classification problem. The dataset comes from the RSNA Pneumonia Detection Challenge, a subset of 26,684 DICOM exams taken from the NIH CXR14 dataset [33]. We only extracted healthy and pneumonia cases from the original dataset. The resulting database is composed of 14,863 exams: 6,012 pneumonia and 8,851 healthy. We split the dataset into 3 random groups (80%, 10%, 10%): train (11,917), validation (1,495) and test (1,451). X-ray exams with opacities contain bounding box ground truth annotations.

IV-B Classifier Set Up

The classification model whose decisions need to be explained consists of a ResNet50 [13]. We adapt the last layers of the ResNet50 network in order to tackle a binary classification task (healthy/pathology), transferring the pre-trained backbone layers from ImageNet [5] to our binary classifier. The network is then trained on the whole training set for 50 epochs with a batch size of 32. We use the Adam optimizer [19] with an initial learning rate of 1e-4. Original X-rays are resized from 1024x1024 to 224x224 and normalized. We also used zooms, translations, rotations and vertical flips as random data augmentations. The binary classifier achieves an AUC of 0.974 on the test set.

IV-C Generative Explanation Model

For the similar and adversarial generators, as in [17, 37], we roughly follow the UNet architecture [25]. We propose two different types of generators: (i) Duo (Figure 1 - Top): g_s and g_a are two separate UNet auto-encoders. (ii) Single (Figure 1 - Bottom): g_s and g_a share the same auto-encoder part, which captures the image structure for both generators. They differ by two identical convolutional neural networks connected at the end of the common auto-encoder; the index appended to Single indicates the number of convolutional layers in these separated CNNs.

Generators take as input the same image as the classifier, with 3 channels and dimensions 224x224. Both generators are trained simultaneously for 70 epochs with a batch size of 8 for Single and 4 for Duo, with the same augmentations used for the classifier. The Adam optimizer is used with an initial learning rate of 1e-4, and we reduce the learning rate by a factor of 3 each time the loss does not decrease after 3 epochs. Through trial and error, we selected the objective loss function (5) parameters providing the best results; they are summarized in Table I.

Model name
Duo (TV) 1 1 1 0 0.001 0 0.2
Duo (W,TV) 1 1 1 0 0.001 0.1 0.2
Single (TV) 1 1 1 0 0.001 0 0.2
Single (W) 3 1 1 0.2 0.001 0.1 0
Single (W,TV) 1 1 1 0.2 0.001 0.1 0.2
Single (W) 3 1 1 0.2 0.001 0.2 0
Single (W, TV) 3 1 1 0.2 0.001 0.2 0.2
TABLE I: Selected parameters for couples of generators

IV-D Augmentation during the generators' prediction

During the generators' prediction phase, for each image x, we generate 10 augmented images with random geometric transformations whose parameters are described in Table II.

Transformation value(s)
Rotation range
Height shift range (pixels)
Width shift range (pixels)
Zoom range
Random horizontal flip (True, False)
Random vertical flip (True, False)
TABLE II: Augmentation Parameters

IV-E Method Evaluation

Generators Evaluation
The evaluation is performed on the classifier's test set. For the similar and adversarial generators g_s and g_a, we evaluate the similarity between x, x_s and x_a. The Structural Similarity Index (SSIM) as well as the Peak Signal to Noise Ratio (PSNR) are used to evaluate the similarity between pairs of images. For the classification purpose, we compute the area under the ROC curve between the rounded classifier prediction round(f(x)) (resp. 1 - round(f(x))) and f(x_s) (resp. f(x_a)).
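As a reminder of one of these metrics, PSNR between two images in [0, 1] can be computed as:

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak Signal to Noise Ratio (in dB) between two images scaled to
    [0, max_val]. Higher values mean more similar images."""
    mse = ((a - b) ** 2).mean()
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```

Identical images yield infinite PSNR; the values around 40-55 dB reported in Table V correspond to very small mean squared errors.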

Interpretability Evaluation
In state-of-the-art methods, and more specifically in medical imaging, a visual explanation is considered interpretable if: (i) the highlighted regions coincide with discriminative regions for humans (in our classification problem, salient regions should overlap the opacity regions where the pathologies are found); (ii) the highlighted regions coincide with context regions that are also discriminative for humans. We can quantitatively assess the overlap between the explanation map and the ground truth annotations by conducting a weak localization experiment. We use two metrics to evaluate the localization performance: the intersection over union (IoU) and an estimated area under the precision-recall curve (AUC). We compute the intersection over union between the ground truth mask M_gt and the thresholded explanation mask M_p, as defined in (12):

    IoU_p = |M_gt ∩ M_p| / |M_gt ∪ M_p|    (12)

where M_gt is the binary mask included inside the ground truth bounding box annotation, and M_p is the binary mask obtained when we threshold the explanation map at its p-th percentile:

    M_p = 1{ E(x) >= percentile_p(E(x)) }    (13)
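The percentile thresholding and IoU of (12)-(13) can be implemented directly:

```python
import numpy as np

def iou_at_percentile(expl, gt_mask, p):
    """IoU between the ground-truth binary mask and the explanation map
    binarized at its p-th percentile."""
    m = expl >= np.percentile(expl, p)
    inter = np.logical_and(m, gt_mask).sum()
    union = np.logical_or(m, gt_mask).sum()
    return inter / union if union else 0.0
```

Higher percentiles keep only the strongest explanation values, so the binarized mask shrinks as p grows.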
We also measure the precision and the sensitivity of the localization at different thresholds p in order to compute the area under the precision-recall curve, introduced in [11]:

    AUC = sum_p (R_{p+1} - R_p) (P_{p+1} + P_p) / 2    (14)

where P_p = TP_p / (TP_p + FP_p) and R_p = TP_p / (TP_p + FN_p) are the precision and recall of the thresholded mask M_p, and the sum runs over the percentile thresholds. Our estimation of the AUC differs from [11] as we only compute the metrics over the hundred percentile values instead of all sorted values of the explanation map.
We also compute a partial AUC restricted to the highest percentiles, as it is more representative of the volume occupied by the ground truth mask M_gt. We show some statistics of the bounding box annotations in Table III.
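The percentile-based AUC estimate above can be sketched as follows (edge-case handling for empty masks is our own choice):

```python
import numpy as np

def precision_recall_auc(expl, gt, percentiles=range(100)):
    """Trapezoidal estimate of the area under the precision-recall curve,
    evaluated only at the hundred percentile thresholds."""
    points = []
    for p in percentiles:
        m = expl >= np.percentile(expl, p)
        tp = np.logical_and(m, gt).sum()
        prec = tp / m.sum() if m.sum() else 1.0
        rec = tp / gt.sum() if gt.sum() else 0.0
        points.append((rec, prec))
    points.sort()  # integrate along increasing recall
    auc = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        auc += (r1 - r0) * (p1 + p0) / 2.0
    return auc
```

Restricting the sum to the highest percentiles yields the partial AUC variant mentioned above.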
We compare our proposed method to the naive one (see section III-A) and to the following state-of-the-art approaches: Gradient [27], Smooth-Gradient [28], Input Gradient [14], Integrated Gradient [30], GradCAM [26], BBMP [9], Mask Generator [4] and Perceptual Perturbation [7]. The best BBMP results are reached when looking for a mask at 56x56 resolution with a Gaussian blur perturbation. The Mask Generator follows the UNet architecture described in [4], but we remove the class selector and adapt the objective function to a single-class problem; its best results are obtained when we generate a mask at size 112x112 and then upsample it to the image dimensions. For Perceptual Perturbation, which is not model-agnostic, we regularize the first ReLU layer of each convolution block of the ResNet50 classifier, and we also adapt its optimization to a single-class problem.

Metrics Height (pixels) Width (pixels) Area Ratio (%)
Min 13 13 0.5
Max 171 91 25.3
Mean 71.8 47.5 7.3
Median 67.8 46.8 6.3
TABLE III: Opacities bounding box statistics

V Results & Discussion

V-A Generator evaluation

For the different architectures and optimizations tested, both generators g_s and g_a reach high performance in terms of classification. As shown in Table IV, similar images are almost all classified as the original ones, as the AUC almost reaches 1.0. Adversarial images achieve better adversarial attacks when either the shared network (Single) or the weights regularization (W) causes the generators g_s and g_a to be close to each other (Table IV). They even outperform the naive approach (Adv. (TV)), where g_a is trained without g_s.

Explanation method Similar AUC Adversarial AUC
Naive - 0.939
Duo (TV) 1.0 0.905
Duo (W,TV) 1.0 0.958
Single (TV) 1.0 0.961
Single (W) 0.998 0.952
Single (W,TV) 0.997 0.944
Single (W) 0.998 0.949
Single (W, TV) 0.998 0.952
TABLE IV: Classification AUC on similar and adversarial images

Regarding similarity, both generators produce samples visually very close to the original images (see Figure 2 and Table V). Similar images perform best for both SSIM and PSNR when the generators are not constrained by weight regularization. On the contrary, adversarial images increase their similarity to both the original and similar images when the generators are constrained, even outperforming the naive adversarial generator trained on its own. In our case, the objective is to produce x_s and x_a as close as possible to each other in order to reduce non-discriminative differences, while keeping x_s very close to x. As shown in Table V, Single generators regularized with weights proximity (W) produce highly similar samples x_s and x_a, while maintaining a strong similarity between x and x_s.

Fig. 2: Examples of original images with their respective similar and adversarial generated images, with the PSNR between original and similar, original and adversarial, and similar and adversarial images for each case.
Explanation method (original, similar) (original, adversarial) (similar, adversarial)
Metrics SSIM PSNR SSIM PSNR SSIM PSNR
Adv. (TV) - - 0.994 41.92 - -
Duo (TV) 0.996 44.07 0.987 39.47 0.994 43.89
Duo (W,TV) 0.995 41.99 0.987 39.08 0.995 44.26
Single (TV) 0.997 44.57 0.989 40.67 0.996 45.25
Single (W) 0.994 42.73 0.993 41.85 0.999 52.59
Single (W,TV) 0.992 41.79 0.991 41.35 0.999 54.55
Single (W) 0.995 43.61 0.994 42.42 0.999 52.26
Single (W, TV) 0.995 43.88 0.994 42.63 0.999 51.93
TABLE V: Similarity metrics between generated and original images

V-B Weak localization evaluation

As shown in Table III, bounding box annotations of the test set occupy from 0.5 to 25.3 % of the image with an average occupation of 7.3%. For different generators and regularizations, we accordingly list the results of the averaged IOU for between the 80th and 100th percentile value in Table VI, and total and partial in Table VII. Firstly, Single clearly outperforms the Duo version for all IOU and AUC scores. The Single approach compelled and to capture the same information on the original image by sharing a common autoencoder. As shown in Table V, the proximity between and as well as between and is better for Single approaches.

Explanation method IOU
Percentile 80 85 90 95 98
Duo (TV) 0.190 0.182 0.164 0.122 0.070
Duo (W,TV) 0.188 0.184 0.170 0.132 0.079
Single (TV) 0.187 0.182 0.166 0.127 0.075
Single (W) 0.227 0.222 0.204 0.157 0.090
Single (W,TV) 0.234 0.235 0.220 0.171 0.099
Single (W) 0.240 0.245 0.229 0.172 0.095
Single (W, TV) 0.248 0.250 0.232 0.173 0.097
With Augmentations
Duo (TV) 0.243 0.232 0.206 0.156 0.085
Duo (W,TV) 0.263 0.253 0.227 0.166 0.093
Single (TV) 0.262 0.249 0.218 0.156 0.086
Single (W) 0.262 0.254 0.233 0.181 0.105
Single (W,TV) 0.268 0.261 0.240 0.188 0.112
Single (W) 0.288 0.288 0.268 0.204 0.115
Single (W, TV) 0.292 0.292 0.272 0.206 0.115

TABLE VI: IOU scores at different thresholds of binarization - Comparison across the different generator architectures

Then, the weight regularization between the similar and adversarial paths introduced in (8) improves all the localization performances, e.g. for Single when moving from (TV) to (W,TV) in Table VI. This is consistent with the findings of Table V. Total variation regularization on the resulting explanation mask also slightly increases the IoU and AUC scores for Single. In addition, the Single generator with two convolutional layers performs better than the single-layer one.
Finally, the use of augmentations during the generators' prediction improves the localization scores in all cases, e.g. up to 4 points for the IoU (Table VI) and from 7 to 11 points for the total and partial AUC (Table VII).

Explanation method Total AUC Partial AUC
Duo (W,TV) 0.257 0.162
Single (TV) 0.253 0.157
Single (W) 0.310 0.220
Single (W, TV) 0.325 0.239
Single (W) 0.325 0.248
Single (W, TV) 0.339 0.256
With Augmentations
Duo (W,TV) 0.362 0.263
Single (TV) 0.353 0.254
Single (W) 0.370 0.274
Single (W,TV) 0.381 0.287
Single (W) 0.405 0.322
Single (W, TV) 0.412 0.328
TABLE VII: Estimated AUC scores for Precision-Recall - Comparison across the different generators architectures

When compared to state-of-the-art methods (Tables VIII and IX), Single (W, TV) achieves comparable localization scores. Our method even slightly outperforms the best performers, Mask Generator and BBMP, for IoU scores at percentile thresholds from 80 to 95%. It is also the case for both partial and total AUC compared to the best state-of-the-art approaches: GradCAM, BBMP and Mask Generator. Only Mask Generator and Gradient outperform or compete with our method at the 98th percentile. We can also note that the naive explanation directly defined as the difference between the original and adversarial images (Adv. (TV)) produces much poorer results.
However, when using augmentations during the generator prediction phase, our method outperforms all the others. Visual illustrations are given in Figures 3 and 4 for cases where the opacities are located at either one or two different positions. When thresholding heatmaps at the 95th percentile, our method (Single) seems to generate less noisy masks than the other approaches, including the naive one (Adv.), while capturing all discriminative structures. In addition, our method is suitable for real-time situations, as suggested by the per-image explanation generation times given in Table IX (on an NVIDIA MX130 GPU).

Explanation method IOU
Percentile 80 85 90 95 98
Gradient 0.203 0.199 0.187 0.152 0.097
Smooth Grad. 0.192 0.188 0.176 0.143 0.091
Input Grad. 0.191 0.185 0.170 0.136 0.086
Integrated Grad. 0.176 0.171 0.157 0.124 0.077
GradCAM 0.237 0.225 0.195 0.138 0.070
BBMP 0.233 0.226 0.204 0.154 0.087
Perceptual Perturbation 0.133 0.125 0.110 0.080 0.045
Mask Generator 0.222 0.219 0.208 0.169 0.103
Adv. (TV) 0.177 0.173 0.158 0.118 0.064
Adversarial vs Similar
Single (W, TV) 0.248 0.250 0.232 0.173 0.097
Adv. vs Sim. + Augment.
Single (W, TV) 0.292 0.292 0.272 0.206 0.115
TABLE VIII: IOU scores at different thresholds of binarization - Comparison to State of the Art Methods

Fig. 3: Examples of explanation maps generated by different methods in the case of a single ground truth bounding box annotation. Top row: the original image with the explanation map and the ground truth bounding box. Bottom row: binary heatmaps thresholded at the 95th percentile.

Fig. 4: Explanation maps generated by different methods in the case of two ground truth bounding box annotations. Top row: the original image with the explanation map and the ground truth bounding boxes. Bottom row: binary heatmaps thresholded at the 95th percentile.
Explanation method Total AUC Partial AUC Time (s)
Gradient 0.287 0.189 2.04
Integrated Grad. 0.244 0.146 1.93
GradCAM 0.324 0.235 0.78
BBMP 0.326 0.229 17.14
Perceptual Perturbation 0.180 0.084 30.74
Mask Generator 0.327 0.226 0.09
Adv. (TV) 0.238 0.145 0.10
Adversarial vs Similar
Single (W, TV) 0.339 0.256 0.05
Adv. vs Sim. + Augment.
Single (W, TV) 0.412 0.328 0.63
TABLE IX: Estimated AUC scores for Precision-Recall and Computation time - Comparison to State of the Art Methods

As an additional experiment, we apply the augmentation technique to the other state-of-the-art methods that produce their visual explanation in one shot. The localization results are listed in Tables X and XI. All localization scores improve, while the generation time per image remains adequate (see Table XI). By using augmentations, we observe for all methods a gain similar to that observed for our method. Our best method still achieves better localization results for the AUC metrics. For the IoU, the Mask Generator outperforms our method at the 95th and 98th percentiles.

Explanation method IOU
Percentile 80 85 90 95 98
Gradient [1] 0.203 0.199 0.187 0.152 0.097
0.256 0.252 0.236 0.190 0.117
GradCAM [2] 0.237 0.225 0.195 0.138 0.070
0.271 0.263 0.244 0.190 0.105
BBMP [3] 0.233 0.226 0.204 0.154 0.087
Mask Generator [4] 0.222 0.219 0.208 0.169 0.103
0.259 0.264 0.259 0.221 0.137
”Naive” 0.177 0.173 0.158 0.118 0.064
0.239 0.230 0.208 0.156 0.087
Ours 0.248 0.250 0.232 0.173 0.097
0.292 0.292 0.272 0.206 0.115
TABLE X: IOU scores at different thresholds of binarization - Comparison to State of the Art Methods without (Top) and with (Bottom) augmentations
Explanation method Total AUC Partial AUC Time (s)
Gradient [1] 0.287 0.189 2.04
0.374 0.274 2.83
GradCAM [2] 0.326 0.235 0.78
0.397 0.302 5.09
BBMP [3] 0.326 0.229 17.14
Mask Generator [4] 0.327 0.226 0.09
0.404 0.308 0.68
”Naive” 0.238 0.145 0.10
0.325 0.232 0.75
Ours 0.339 0.256 0.05
0.412 0.328 0.63
TABLE XI: Estimated AUC scores for Precision-Recall and Computation time - Comparison to State of the Art Methods without (Top) and with (Bottom) augmentations

VI Conclusion

In this work, we introduced a new method to produce a visual explanation of a classifier's decision that leverages adversarial generation learning. We propose to simultaneously train a couple of generators to produce an adversarial image that goes against the classifier's decision, and a similar image that is classified as the original one. We show that the difference between the two images, combined with the learning procedure, helps to better capture discriminative features. We tested our method on a binary classification problem in the medical domain and showed that it outperforms state-of-the-art techniques in terms of weak localization, especially when geometric augmentations are introduced during the generation phase. Unlike some state-of-the-art methods, our proposed method is both model-agnostic and fast enough for real-time situations such as medical image analysis. Finally, we showed that random geometric augmentations applied to the original image improve all the tested state-of-the-art approaches.
In future work, we shall generalize our method to multi-class classification problems and apply it to 3D medical imaging problems.


  • [1] J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim (2018) Sanity checks for saliency maps. In NeurIPS.
  • [2] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein generative adversarial networks. In ICML.
  • [3] C. Chang, E. Creager, A. Goldenberg, and D. K. Duvenaud (2019) Explaining image classifiers by counterfactual generation. In ICLR.
  • [4] P. Dabkowski and Y. Gal (2017) Real time image saliency for black box classifiers. In NIPS, pp. 6967–6976.
  • [5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In CVPR.
  • [6] A. Dhurandhar, P. Chen, R. Luss, C. Tu, P. Ting, K. Shanmugam, and P. Das (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In NeurIPS.
  • [7] A. Elliott, S. Law, and C. Russell (2019) Adversarial perturbations on the perceptual ball. arXiv:1912.09405.
  • [8] A. Esteva, B. Kuprel, R. Novoa, J. Ko, S. Swetter, H. Blau, and S. Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, pp. 115–118.
  • [9] R. C. Fong and A. Vedaldi (2017) Interpretable explanations of black boxes by meaningful perturbation. In ICCV, pp. 3449–3457.
  • [10] R. Fong, M. Patrick, and A. Vedaldi (2019) Understanding deep networks via extremal perturbations and smooth masks. In ICCV, pp. 2950–2958.
  • [11] W. Fu, M. Wang, M. Du, N. Liu, S. Hao, and X. Hu (2019) Distribution-guided local explanation for black-box classifiers. arXiv preprint.
  • [12] I. J. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Identity mappings in deep residual networks. In ECCV.
  • [14] Y. Hechtlinger (2016) Interpretation of prediction models using the input gradient. arXiv:1611.07634.
  • [15] A. Holzinger, C. Biemann, C. S. Pattichis, and D. B. Kell (2017) What do we need to build explainable AI systems for the medical domain? arXiv:1712.09923.
  • [16] C. Hsieh, C. Yeh, X. Liu, P. Ravikumar, S. Kim, S. Kumar, and C. Hsieh (2020) Evaluations and methods for explanation through robustness analysis. ArXiv abs/2006.00442. Cited by: §II-B.
  • [17] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In CVPR, Cited by: §IV-C.
  • [18] A. Khakzar, S. Baselizadeh, S. Khanduja, S. T. Kim, and N. Navab (2019) Explaining neural networks via perturbing important learned features. ArXiv abs/1911.11081. Cited by: §II-B.
  • [19] D. P. Kingma and J. Ba (2015) Adam: A method for stochastic optimization. In ICLR, Cited by: §IV-B.
  • [20] G. Maicas, G. Snaauw, A. P. Bradley, I. D. Reid, and G. Carneiro (2019) Model agnostic saliency for weakly supervised lesion detection from breast dce-mri. In ISBI, pp. 1057–1060. Cited by: §II-C.
  • [21] D. Major, D. Lenis, M. Wimmer, G. Sluiter, A. Berg, and K. Bühler (2020) Interpreting medical image classifiers by optimization based counterfactual impact analysis. In ISBI, pp. 1096–1100. Cited by: §II-B.
  • [22] P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Y. Ding, A. Bagul, C. P. Langlotz, K. S. Shpanskaya, M. P. Lungren, and A. Y. Ng (2017) CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. ArXiv abs/1711.05225. Cited by: §II-A.
  • [23] S. Rebuffi, R. Fong, X. Ji, and A. Vedaldi (2020)

    There and back again: revisiting backpropagation saliency methods

    In CVPR, Cited by: §II-A.
  • [24] M. T. Ribeiro, S. Singh, and C. Guestrin (2016) ”Why should i trust you?”: explaining the predictions of any classifier. In ACM SIGKDD, Cited by: §II-B.
  • [25] O. Ronneberger, P. Fischer, and T. Brox (2015) U-net: convolutional networks for biomedical image segmentation. In MICCAI, Cited by: §IV-C.
  • [26] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In ICCV, pp. 618–626. Cited by: §I, §I, §II-A, §IV-E.
  • [27] K. Simonyan, A. Vedaldi, and A. Zisserman (2014) Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In ICLR, Cited by: §I, §I, §II-A, §IV-E.
  • [28] D. Smilkov, N. Thorat, B. Kim, F. B. Viégas, and M. Wattenberg (2017) SmoothGrad: removing noise by adding noise. ArXiv abs/1706.03825. Cited by: §I, §II-A, §IV-E.
  • [29] J. T. Springenberg, A. Dosovitskiy, T. Brox, and M. A. Riedmiller (2015) Striving for simplicity: the all convolutional net. In ICLR, Vol. abs/1412.6806. Cited by: §II-A.
  • [30] M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In ICML, Cited by: §I, §II-A, §IV-E.
  • [31] H. Uzunova, J. Ehrhardt, T. Kepp, and H. Handels (2019) Interpretable explanations of black box classifiers applied on medical images by meaningful perturbations using variational autoencoders. In Medical Imaging: Image Processing, Cited by: §II-B.
  • [32] J. Wagner, J. M. Köhler, T. Gindele, L. Hetzel, J. T. Wiedemer, and S. Behnke (2019) Interpretable and fine-grained visual explanations for convolutional neural networks. In CVPR, pp. 9089–9099. Cited by: §II-B.
  • [33] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers (2017) ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In CVPR, pp. 3462–3471. Cited by: §IV-A.
  • [34] W. Woods, J. Chen, and C. Teuscher (2019) Adversarial explanations for understanding image classification decisions and improved neural network robustness. In Nature Machine Intelligence, Vol. 1, pp. 508–516. Cited by: §II-B.
  • [35] C. Xiao, B. Li, J. Zhu, W. He, M. Liu, and D. X. Song (2018) Generating adversarial examples with adversarial networks. In IJCAI, Cited by: §I.
  • [36] M. D. Zeiler and R. Fergus (2014) Visualizing and understanding convolutional networks. In ECCV, Cited by: §II-A, §II-B.
  • [37] W. Zhang (2019) Generating adversarial examples in one shot with image-to-image translation gan. In IEEE Access, Vol. 7, pp. 151103–151119. Cited by: §I, §I, §III-B1, §III-C, §IV-C.
  • [38] B. Zhou, A. Khosla, À. Lapedriza, A. Oliva, and A. Torralba (2016)

    Learning deep features for discriminative localization

    In CVPR, pp. 2921–2929. Cited by: §I, §II-A.