Deep neural models have achieved high performance in various applications, in particular in medical image analysis. In this field, in addition to high accuracy, it is critical to provide interpretative explanations of the system’s decisions, since clinicians’ confidence in the system is at stake. Several methods have thus been proposed to address the visual explanation problem, ranging from saliency maps [27, 28, 14, 30] and class activation mapping methods [38, 26] to perturbation maps [9, 4]. However, the problem remains open, as there is no general consensus on their performance. Independently, and motivated by model safety, several methods have been proposed to generate “fake” images that “fool” classification algorithms. Such “fooling” images are generated either by adding a small perturbation on top of the input image [12, 7, 35], or as a completely new image, very close to the input. Most of these works point out that adversarial generation makes it possible to study model robustness and fragility, but very few link it to the explainability problem.
In this work, we propose to leverage adversarial generation methods to produce interpretative explanations of a classifier’s decisions. Inspired by  and , who train models to generate explanation masks and adversarial examples respectively, we propose to train a model that generates images capturing discriminative structures, with the following key contributions:
We propose a new explanation framework: the explanation of a classifier’s decision is defined as the difference between a regularized adversarial example and the “projection” of the original image onto the space of generated adversarial samples.
We introduce a new optimization workflow that jointly trains an adversarial generator and a similar generator that “projects” the original image onto the adversarial space.
We propose a new method that greatly improves several explanation methods when they are used to localize decisive objects in the image, namely averaging registered explanation maps computed on random geometric augmentations of the input image.
We validate our algorithm on a binary classification problem (pathological/healthy) on a large database of chest X-ray images, and compare our technique with state-of-the-art approaches such as Gradient, GradCAM, BBMP and Mask Generator.
II Related Work
Explaining classifier decisions via visual maps has been the subject of many contributions. We group a selection of them into three categories.
II-A Back-propagation-based methods
For a given image, these methods back-propagate small variations of a neural network’s prediction down to the input. While they provide interesting results, they tend to produce noisy explanation maps, since any variation of the model’s output is taken into account. Many contributions have therefore focused on building sharper and smoother explanation maps [36, 29, 28, 30]. In [38], explanation maps are produced by upsampling activations of the last convolutional layer to the input image size. GradCAM [26] (or its application in the medical domain) builds on this work by weighting these activations with the gradient of the output with respect to the last convolutional layer, generating compelling results. For an exhaustive review, the reader can refer to  and . As a limitation, these approaches are not model-agnostic (they are limited to neural networks) and need access to intermediate layers.
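To make the GradCAM weighting step concrete, here is a minimal sketch on plain nested lists: each channel is weighted by the spatial average of the gradient of the class score with respect to it, the weighted maps are summed, and a ReLU is applied. The `activations` and `gradients` inputs are assumed to be already extracted from the last convolutional layer; the final upsampling to the input size is omitted.

```python
# Sketch of the GradCAM combination step on nested lists (not a full implementation).
# Assumption: activations[k] is the k-th H x W feature map of the last convolutional
# layer and gradients[k] is d(class score)/d(activations[k]), with the same shape.


def grad_cam(activations, gradients):
    """Return the (non-upsampled) GradCAM map as an H x W nested list."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += alpha * fmap[i][j]
    # ReLU keeps only regions with a positive influence on the class score.
    return [[max(v, 0.0) for v in row] for row in cam]


acts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 1.0]]
```

The second channel has a negative average gradient, so it lowers the map where it is active, and the ReLU then clips those positions to zero.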
II-B Iterative perturbation-based methods
These approaches exploit the effects of perturbations of the input image on the model’s output.
For instance, LIME [24] proposes a local explanation obtained by perturbing random segments of the input image and fitting a linear model whose coefficients estimate the importance of each segment for the classifier’s prediction.
The authors of  take this idea further, defining explanation maps as the solution of an optimization problem over the input image and the model to explain. Given a fixed perturbation, their approach learns the maps that maximally impact the model or, conversely, the maps that preserve its performance. Building on  for the medical imaging domain, [31, 21] adopt a similar optimization formulation but focus on the perturbations: they use generative methods to perturb pathological images, respectively by locally in-painting healthy tissue within pathological images and by reconstructing a completely healthy image.
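The per-image objective of such methods can be sketched as follows, in the spirit of meaningful perturbations (a “deletion” variant). This is an illustrative sketch, not the cited authors’ code: the blur is a 3×3 box filter standing in for a Gaussian blur, the smoothness term is an anisotropic total variation, `score_fn` is a hypothetical black-box classifier returning the class score, and the weights `lam_area` and `lam_tv` are arbitrary. Only the objective is evaluated; the per-image gradient descent over the mask, which makes these methods costly, is omitted.

```python
# Mask convention: m = 1 keeps a pixel, m = 0 replaces it by its blurred value.
# Images and masks are H x W nested lists of floats.


def box_blur(img):
    """3x3 mean filter with edge clamping (stand-in for a Gaussian blur)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            row.append(sum(vals) / 9.0)
        out.append(row)
    return out


def perturb(img, mask, blurred):
    """Blend image and blurred image: m * x + (1 - m) * blur(x)."""
    return [[m * x + (1.0 - m) * b for x, m, b in zip(rx, rm, rb)]
            for rx, rm, rb in zip(img, mask, blurred)]


def total_variation(mask):
    """Anisotropic total variation: sum of absolute neighbour differences."""
    h, w = len(mask), len(mask[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(mask[i + 1][j] - mask[i][j])
            if j + 1 < w:
                tv += abs(mask[i][j + 1] - mask[i][j])
    return tv


def deletion_objective(img, mask, score_fn, lam_area=0.05, lam_tv=0.1):
    """Small deleted area + smooth mask + low class score on the perturbed image."""
    deleted_area = sum(1.0 - m for row in mask for m in row)
    perturbed = perturb(img, mask, box_blur(img))
    return lam_area * deleted_area + lam_tv * total_variation(mask) + score_fn(perturbed)
```

Minimizing this quantity over the mask, separately for each image, is what produces one explanation map per optimization run.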
Since noisy outputs are a major concern for these methods, some authors focus on regularization terms , while others filter gradients during back-propagation . Different optimization formulations were also introduced [6, 16]. In [18], an explanation is generated by perturbing features at different levels of the neural network. In [34, 7], the optimization problem is recast as adversarial example generation, where the adversarial perturbation is sought in a regularized and restricted space.
All these methods require solving an optimization problem for each image in order to produce an explanation map. This entails a high computational cost, since several iterations are needed for convergence, which is often inappropriate when a real-time response is expected. Moreover, since the optimization problem is solved separately for every image, over-fitting issues arise: explanation maps often contain features linked not to the model’s behaviour but only to the image being processed. Strong and elaborate regularization is thus necessary.
II-C Trained perturbation-based methods
To alleviate the computational needs of iterative perturbation-based methods, [4] proposes to solve a single optimization problem over the whole database, thus learning a masking model. In medical imaging, [20] uses the same approach on a single-class problem. Despite the benefits of this optimization strategy, two main drawbacks remain. First, perturbations are parameters that must be set and adapted manually. Their choice depends not only on the database and the considered classifier but also on how this classifier was trained (e.g. a random noise perturbation has no effect on models trained to be robust to this noise). Since perturbations are manually selected, residual adversarial artefacts unrelated to the explanation are still generated. Second, costly hyperparameter tuning is needed to control the size of the generated explanation masks.
We now present our methodology for explaining image classification outcomes. As for the methods reviewed in section II, our explanation is a visual map in which higher values mark the areas of the image that matter more for the classifier’s decision. For simplicity, we present the rationale of our approach for a binary classification problem; the extension to the multi-class case is given in section III-C. Let $f$ be the studied classifier, which outputs a classification score $f(x)$ in the range $[0, 1]$. Without loss of generality, we assign label 1 (resp. label 0) to an input image $x$ if its classification score $f(x)$ is above 0.5 (resp. below 0.5). If the decision threshold differs, one can for instance apply a piecewise linear transformation to $f$ so that this condition holds.
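One possible piecewise linear transformation, assuming a decision threshold $t$ strictly between 0 and 1, maps $[0, t]$ onto $[0, 0.5]$ and $[t, 1]$ onto $[0.5, 1]$, so that the rescaled score crosses 0.5 exactly where the original score crosses $t$. The function name `rescale_score` is hypothetical; any monotone map sending $t$ to 0.5 would serve the same purpose.

```python
def rescale_score(score, threshold):
    """Piecewise linear map sending [0, t] to [0, 0.5] and [t, 1] to [0.5, 1]."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if score <= threshold:
        return 0.5 * score / threshold
    return 0.5 + 0.5 * (score - threshold) / (1.0 - threshold)


t = 0.3  # hypothetical decision threshold of some classifier
print([rescale_score(s, t) for s in (0.0, 0.3, 1.0)])  # [0.0, 0.5, 1.0]
```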
III-A Explanation through similar and adversarial generations
III-A1 Adversarial naive formulation
A naive, yet novel, approach to reach our objective is to combine a trainable masking model  with adversarial perturbations. The visual explanation map is then given by the difference between the input image and its adversary. This method no longer depends on the choice of a perturbation function, since the adversarial generator “learns” the perturbation. For an input image $x$, we define the naive explanation as
$$E_{naive}(x) = |x - A(x)| \qquad (1)$$
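where $A$ is a model trained to “fool” the classifier while producing an image $A(x)$ “very close” to $x$, written as follows:
$$A \in \arg\min_{A'} \; \mathbb{E}_{x}\big[d(x, A'(x))\big] \quad \text{s.t.} \quad \mathbb{1}[f(A'(x)) > 0.5] \neq \mathbb{1}[f(x) > 0.5] \qquad (2)$$
with $d$ a distance between images.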
The mean is taken over a training data set. Generating visual explanations with (1) and (2) effectively counterbalances the drawbacks of  and , but despite the regularization expected from the learning process, the explanations are often corrupted by noise and highlight regions that clearly should have no impact on the classifier’s decision (see section V-B).
III-A2 Similar-Adversarial formulation
Why does the naive formulation generate incoherent visual explanations? We argue that the flaw lies in the definition of the explanation in equation (1). Comparing the original image with its generated adversarial sample exposes the method to reconstruction errors: some details of the original image can be absent from the generated adversarial sample, even though these details are not discriminative for the classifier, in the sense that their presence alone would not change the classification score. More formally, the adversarial sample $A(x)$ belongs to the target space of $A$, denoted $\mathcal{A}$, which differs from the space of original images $\mathcal{X}$. The comparison between $x$ and $A(x)$ inherits the differences between $\mathcal{X}$ and $\mathcal{A}$, and these differences are not explicitly linked to the explanation problem by equation (2). Since we have no control over the original image space $\mathcal{X}$, we propose to mitigate this reconstruction risk by defining the explanation mask as the difference between the adversary $A(x)$ and the element of $\mathcal{A}$ closest to $x$ on which $f$ returns the same value as on $x$. We call this element the similar image and denote it by $S(x)$, where $S$ is the function mapping images to their similar counterparts. The rationale is to reduce the reconstruction error so that the explanation only contains values related to the classifier’s decision. It reads
$$E(x) = |A(x) - S(x)| \qquad (3)$$
Denoting by $\mathcal{S}$ the target space of $S$, both $S$ and $A$ are built through a joint optimization problem (4) that makes $S(x)$ and $A(x)$ as “close” as possible while requiring that $S(x)$ be classified like $x$ and that $A(x)$ be classified in the opposite way. Problem (4) relies on three distance functions between elements of the different image spaces and on a measure between the two functions $S$ and $A$. Henceforth, we refer to $S$ and $A$ as the similar and adversarial generators, respectively.
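The difference between the two explanation definitions can be illustrated on a toy 2×2 image. In this sketch the difference is taken in absolute value (an assumption), `sim` and `adv` stand for hypothetical outputs $S(x)$ and $A(x)$, and both generators are assumed to make the same reconstruction error at one pixel, while only the adversary modifies a second, discriminative pixel.

```python
def abs_diff(img_a, img_b):
    """Pixel-wise absolute difference of two H x W images (nested lists)."""
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def naive_explanation(x, adv):
    """Naive explanation (equation (1)): compares the input with its adversary."""
    return abs_diff(x, adv)


def similar_adversarial_explanation(sim, adv):
    """Similar-adversarial explanation: compares the adversary with the similar image."""
    return abs_diff(adv, sim)


# Both generators make the same reconstruction error at pixel (0, 0);
# only the adversary changes the (hypothetically) discriminative pixel (1, 1).
x = [[0.5, 0.5], [0.5, 0.5]]
sim = [[0.625, 0.5], [0.5, 0.5]]
adv = [[0.625, 0.5], [0.5, 0.75]]
print(naive_explanation(x, adv))                  # [[0.125, 0.0], [0.0, 0.25]]
print(similar_adversarial_explanation(sim, adv))  # [[0.0, 0.0], [0.0, 0.25]]
```

Under this assumption, the shared reconstruction error cancels in the similar-adversarial map but shows up in the naive one, which is the intuition behind comparing $A(x)$ with $S(x)$ rather than with $x$.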
III-B Weaker formulation: Objective function
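We propose to solve a weak formulation of the constrained optimization problem (4): both the similar and adversarial generators are sought as minimizers of the following unconstrained problem
$$\min_{S, A} \; \mathcal{L}_{sim} + \mathcal{L}_{class} + \mathcal{L}_{gen} + \mathcal{L}_{reg} \qquad (5)$$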
$\mathcal{L}_{sim}$ is a similarity loss that accounts for the distance terms of (4) and enforces proximity between $x$, $S(x)$ and $A(x)$. The classification loss $\mathcal{L}_{class}$ is a weak formulation of the classification constraints in (4): it enforces the similarity between $f(S(x))$ and $f(x)$ and their dissimilarity with $f(A(x))$. $\mathcal{L}_{gen}$ enforces the similarity between the generators $S$ and $A$. In addition to the terms of (4), $\mathcal{L}_{reg}$ acts on the difference $A(x) - S(x)$ to enforce regularity. An embodiment of optimization problem (5) using neural networks is given in Figure 1 (see section IV-C). We next specify our choices for each term of equation (5).
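The regularization term is described only as acting on the difference $A(x) - S(x)$; the configurations labelled TV in the experiments suggest a total variation penalty. A minimal sketch, assuming an anisotropic ($L_1$) total variation on nested-list images and a hypothetical `weight`:

```python
def regularization_loss(sim, adv, weight=1.0):
    """Anisotropic total variation of the explanation map A(x) - S(x)."""
    diff = [[a - s for a, s in zip(ra, rs)] for ra, rs in zip(adv, sim)]
    h, w = len(diff), len(diff[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:  # vertical neighbour
                tv += abs(diff[i + 1][j] - diff[i][j])
            if j + 1 < w:  # horizontal neighbour
                tv += abs(diff[i][j + 1] - diff[i][j])
    return weight * tv


print(regularization_loss([[0.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))  # 4.0
```

A constant offset between $A(x)$ and $S(x)$ costs nothing, while isolated, checkerboard-like differences are penalized, which is the kind of noise the regularizer is meant to suppress.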
III-B1 Similarity Loss
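The similarity loss $\mathcal{L}_{sim}$, equation (6), combines distance terms between $x$, $S(x)$ and $A(x)$, with weighting parameters that adjust the importance attached to each term. Experimentally, combining two different norms to enforce the similarity between images produces better results (as in ).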
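A minimal sketch of a similarity loss of this kind, assuming that the two combined norms are a mean absolute error ($L_1$) and a mean squared error ($L_2$), applied to each pair among $x$, $S(x)$ and $A(x)$. The pairs and the values in `weights` are hypothetical placeholders, not the settings of Table I.

```python
def l1(u, v):
    """Mean absolute error between two images (nested lists)."""
    pairs = [(a, b) for ru, rv in zip(u, v) for a, b in zip(ru, rv)]
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def l2(u, v):
    """Mean squared error between two images (nested lists)."""
    pairs = [(a, b) for ru, rv in zip(u, v) for a, b in zip(ru, rv)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def similarity_loss(x, sim, adv, weights):
    """Weighted L1 + L2 distances between x, S(x) and A(x).

    `weights` maps an image pair to a hypothetical (w_l1, w_l2) couple.
    """
    images = {"x": x, "sim": sim, "adv": adv}
    total = 0.0
    for (p, q), (w_l1, w_l2) in weights.items():
        total += w_l1 * l1(images[p], images[q]) + w_l2 * l2(images[p], images[q])
    return total


weights = {("x", "sim"): (1.0, 1.0), ("x", "adv"): (1.0, 1.0), ("sim", "adv"): (1.0, 1.0)}
x = [[0.0, 0.0]]
s = a = [[0.5, 0.5]]
print(similarity_loss(x, s, a, weights))  # 1.5
```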
III-B2 Classification Loss
The classification loss $\mathcal{L}_{class}$, equation (7), is a weighted sum of binary cross-entropy terms. It accounts for the weak formulation of the constraints in (4), encouraging the classifier $f$ to act on $S(x)$ as it acts on $x$, and in the opposite manner on $A(x)$.
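A minimal sketch of the classification term, assuming one binary cross-entropy term per generator with hypothetical weights `w_sim` and `w_adv`: the label of $x$ is the classifier’s own decision (score above 0.5), $S(x)$ is pushed towards that label and $A(x)$ towards the opposite one. It operates on precomputed classifier scores rather than on images.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for a single predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def classification_loss(f_x, f_sim, f_adv, w_sim=1.0, w_adv=1.0):
    """Batch-averaged weighted BCE terms on classifier scores.

    f_x, f_sim, f_adv: scores f(x), f(S(x)), f(A(x)) in [0, 1] for a batch.
    """
    total = 0.0
    for px, ps, pa in zip(f_x, f_sim, f_adv):
        label = 1.0 if px > 0.5 else 0.0  # decision of the classifier on x
        # S(x) should keep the label of x; A(x) should get the opposite label.
        total += w_sim * bce(ps, label) + w_adv * bce(pa, 1.0 - label)
    return total / len(f_x)
```

With `f_x = [0.9]`, a similar score of 0.9 and an adversarial score of 0.1 give a small loss, whereas swapping the two scores gives a large one.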
III-B3 Generator Loss
$\mathcal{L}_{gen}$ measures the distance between the two generators. In the particular case where both are neural networks (see section IV-C), parameterized by $\theta_S$ and $\theta_A$ respectively, we use a metric on these parameters, equation (8), which assumes that generators $S$ and $A$ share the same architecture and involves a weighting parameter. Note that metrics used in GANs to measure discrepancies between distributions may also be considered.
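Since both generators share the same architecture, their parameters can be compared layer by layer. A minimal sketch, assuming a weighted squared $L_2$ distance between the parameter vectors and representing each generator as a list of flattened layers; the exact metric of equation (8) may differ.

```python
def generator_loss(params_sim, params_adv, weight=1.0):
    """Weighted squared L2 distance between two same-shaped parameter sets.

    params_*: lists of layers, each layer a flat list of floats.
    """
    if len(params_sim) != len(params_adv):
        raise ValueError("generators must share the same architecture")
    dist = 0.0
    for layer_s, layer_a in zip(params_sim, params_adv):
        if len(layer_s) != len(layer_a):
            raise ValueError("layer shapes differ")
        dist += sum((ws - wa) ** 2 for ws, wa in zip(layer_s, layer_a))
    return weight * dist
```

The shape checks make the “same architecture” requirement explicit: the metric is only defined when every weight of $S$ has a counterpart in $A$.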
III-B4 Regularization Loss
III-C Multi-class situation
The weak optimization problem (5) can be adapted to the multi-class setting by modifying $\mathcal{L}_{class}$ to account for a vector-valued classifier $f$. This boils down to replacing term (7) with an adaptation of the CW loss of [7, 37], equation (9), where the index $c$ is defined by $c = \arg\max_k f_k(x)$, i.e. the class selected by the classifier on input $x$, and $\kappa$ is a strictly positive margin.
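A minimal sketch of a Carlini-Wagner style margin term for the multi-class case. The margin form `max(z_c - max_{j≠c} z_j, -κ)` is the standard CW objective; applying it to $A(x)$ and its mirrored form to $S(x)$, as well as the name `multiclass_loss` and the default `kappa`, are assumptions about how the term could be adapted, not the exact expression (9).

```python
def cw_margin(scores, c, kappa):
    """CW-style margin: max(z_c - max_{j != c} z_j, -kappa)."""
    best_other = max(z for j, z in enumerate(scores) if j != c)
    return max(scores[c] - best_other, -kappa)


def multiclass_loss(f_x, f_sim, f_adv, kappa=0.5):
    """Hypothetical multi-class classification term for one image.

    c is the class selected by the classifier on x. The adversarial term is
    minimal once A(x) leaves class c by at least kappa; the similar term is
    minimal once S(x) stays in class c by at least kappa.
    """
    c = max(range(len(f_x)), key=lambda k: f_x[k])
    adv_term = cw_margin(f_adv, c, kappa)
    sim_term = max(max(z for j, z in enumerate(f_sim) if j != c) - f_sim[c], -kappa)
    return adv_term + sim_term


print(multiclass_loss([0.7, 0.2, 0.1], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1]))  # -1.0
```

The margin caps each term at $-\kappa$, so once an image is classified with enough confidence on the desired side, the term stops pushing it further.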
III-D Explanation and augmentations
Since our visual explanation is defined as the difference between two generated images, we propose to regularize it by averaging the explanations computed on random geometric transformations of the input image. This further reinforces discriminative regions relative to reconstruction errors. The average reads
$$\bar{E}(x) = \frac{1}{N}\sum_{i=1}^{N} T_i^{-1}\big(E(T_i(x))\big)$$
where $T_1, \dots, T_N$ are random geometric transformations such as rotations, translations, zooms or axis flips, and $T_i^{-1}$ registers each explanation back onto the original image. This regularization can be applied to any other visual explanation technique (see section V-B).
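A minimal sketch of this averaged explanation, restricted to horizontal and vertical flips because they can be inverted exactly on a pixel grid (rotations, translations and zooms would additionally need interpolation and border handling). `explain` is a hypothetical function returning an explanation map for an image; each map is registered back onto the grid of $x$ with the inverse transformation before averaging.

```python
import random


def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return img[::-1]


# Each transformation is paired with its inverse (flips are their own inverse).
TRANSFORMS = [
    (lambda im: im, lambda im: im),
    (hflip, hflip),
    (vflip, vflip),
    (lambda im: vflip(hflip(im)), lambda im: hflip(vflip(im))),
]


def augmented_explanation(x, explain, n_aug=10, seed=0):
    """Average T_i^{-1}(explain(T_i(x))) over n_aug random transformations."""
    rng = random.Random(seed)
    h, w = len(x), len(x[0])
    acc = [[0.0] * w for _ in range(h)]
    for _ in range(n_aug):
        forward, inverse = rng.choice(TRANSFORMS)
        registered = inverse(explain(forward(x)))  # back onto the grid of x
        for i in range(h):
            for j in range(w):
                acc[i][j] += registered[i][j]
    return [[v / n_aug for v in row] for row in acc]
```

With an explanation that commutes with the transformations, such as a pixel-wise map, the average equals the single-pass explanation; for learned explanations, registration is what allows maps computed on different transformed inputs to be averaged pixel by pixel.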
In the following sections, $S(x)$ and $A(x)$ denote the outputs of the similar and adversarial generators, respectively.
We tested our approach on a binary classification problem using a publicly available chest X-ray dataset from the RSNA Pneumonia Detection Challenge, a subset of 26,684 DICOM exams taken from the NIH CXR14 dataset [33]. We extracted only the healthy and pneumonia cases from the original dataset, yielding a database of 14,863 exams (6,012 pneumonia, 8,851 healthy). We randomly split it into three groups (80%, 10%, 10%): train (11,917), validation (1,495) and test (1,451). X-ray exams with opacities come with ground-truth bounding box annotations.
IV-B Classifier Setup
The classification model whose decisions need to be explained is a ResNet50 [13]. We adapt the last layers of the ResNet50 network to the binary classification task (healthy/pathology) and transfer the backbone layers pre-trained on ImageNet [5] to our binary classifier. The network is then trained on the whole training set for 50 epochs with a batch size of 32, using the Adam optimizer [19] with an initial learning rate of 1e-4. Original X-rays are resized from 1024×1024 to 224×224 and normalized to . We also use zoom, translations, rotations and vertical flips as random data augmentations. The binary classifier achieves an AUC of 0.974 on the test set.
IV-C Generative Explanation Model
For the similar and adversarial generators, as in [17, 37], we roughly follow the UNet architecture [25]. We propose two types of generators: (i) Duo (Figure 1, top): $S$ and $A$ are two separate UNet auto-encoders; (ii) Single (Figure 1, bottom): $S$ and $A$ share a common UNet auto-encoder followed by two separate CNNs, and the Single variants differ by the number of convolutional layers (one or two) in these separate CNNs.
The generators take as input the same 3-channel 224×224 image as the classifier. Both generators are trained simultaneously for 70 epochs, with a batch size of 8 for Single and 4 for Duo, using the same augmentations as the classifier. We use the Adam optimizer with an initial learning rate of 1e-4 and divide the learning rate by 3 each time the loss has not decreased for 3 epochs. Through trial and error, we selected the parameters of objective function (5) that give the best results; they are summarized in Table I.
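The learning-rate schedule described above (divide by 3 when the loss has not decreased for 3 epochs) can be sketched as a small plateau scheduler. Whether the monitored loss is the training or the validation loss is not specified, so `step` simply takes one loss value per epoch; the class name is hypothetical.

```python
class ReduceOnPlateau:
    """Divide the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-4, factor=3.0, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, epoch_loss):
        """Record the loss of the last epoch and return the learning rate to use next."""
        if epoch_loss < self.best:
            self.best = epoch_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor
                self.bad_epochs = 0  # restart the count after each reduction
        return self.lr


sched = ReduceOnPlateau()
print([sched.step(loss) for loss in (1.0, 0.9, 0.95, 0.95, 0.95)])
```

Deep learning frameworks provide equivalent callbacks, e.g. Keras `ReduceLROnPlateau` with `factor=1/3` and `patience=3`, although the exact patience semantics may differ slightly between libraries.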
|Single (W, TV)||3||1||1||0.2||0.001||0.2||0.2|
IV-D Augmentation during generator prediction
At prediction time, for each image $x$, we generate 10 augmented images using random geometric transformations whose parameters are described in Table II.
|Rotations range (°)|
|Height shift range (pixels)|
|Width shift range (pixels)|
|Random horizontal flip||(True, False)|
|Random vertical flip||(True, False)|
IV-E Method Evaluation
The evaluation is performed on the classifier’s test set. For the similar and adversarial generators $S$ and $A$, we evaluate the similarity between $x$ and $S(x)$, between $x$ and $A(x)$, and between $S(x)$ and $A(x)$. The Structural Similarity Index (SSIM) and the Peak Signal-to-Noise Ratio (PSNR) are used to evaluate the similarity between pairs of images. To evaluate classification, we compute the area under the ROC curve using the rounded classifier predictions on the original images, $\mathrm{round}(f(x))$ (resp. $1 - \mathrm{round}(f(x))$), as labels and $f(S(x))$ (resp. $f(A(x))$) as scores.
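A minimal sketch of two of these evaluation ingredients on nested-list images: PSNR for image similarity (SSIM is omitted for brevity) and the ROC AUC used for the classification check, computed as the probability that a positive example is scored above a negative one. The scores below are made up for illustration.

```python
import math


def psnr(img_a, img_b, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two images with values in [0, max_val]."""
    diffs = [(a - b) ** 2 for ra, rb in zip(img_a, img_b) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)


def roc_auc(labels, scores):
    """ROC AUC as P(positive scored above negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Labels: rounded classifier decisions on the original images x.
f_x = [0.9, 0.8, 0.3, 0.1]
labels = [1 if p > 0.5 else 0 for p in f_x]
f_sim = [0.9, 0.2, 0.3, 0.1]  # hypothetical scores f(S(x))
print(roc_auc(labels, f_sim))  # 0.75
```

For the adversarial generator, the same function would be called with the flipped labels `[1 - y for y in labels]` and the scores $f(A(x))$.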
In the state of the art, and more specifically in medical imaging, a visual explanation is considered interpretable if: (i) the highlighted regions coincide with regions that are discriminative for humans (in our classification problem, salient regions should overlap the opacity regions where the pathologies are found); (ii) the highlighted regions coincide with context regions that are also discriminative for humans. We quantitatively assess the overlap between explanation maps and ground-truth annotations through a weak localization experiment, using two metrics: the intersection over union (IOU) and an estimated area under the precision-recall curve (AUC). We compute the IOU between the ground-truth mask $M$ and the thresholded explanation mask $M_p$ as defined in (12):
$$IOU(p) = \frac{|M \cap M_p|}{|M \cup M_p|} \qquad (12)$$
where $M$ is the binary mask of the region inside the ground-truth bounding box annotation, and $M_p$ is the binary mask obtained by thresholding the explanation map $E(x)$ at its $p$-th percentile $P_p$:
$$M_p = \mathbb{1}\left[E(x) \geq P_p\right]$$
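We also measure the precision and the sensitivity (recall) of the localization for the different thresholds $p$, with $\mathrm{Precision}(p) = |M \cap M_p| / |M_p|$ and $\mathrm{Recall}(p) = |M \cap M_p| / |M|$, in order to compute the area under the precision-recall curve as introduced in .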
Our estimate of the AUC differs from that of  in that we compute the metrics only over the hundred percentile values $p$, instead of over all sorted values of the explanation map.
We also compute a partial AUC over a restricted range of percentiles, as it is more representative of the volume occupied by the ground-truth mask $M$. Statistics of the bounding box annotations are given in Table III.
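A minimal sketch of these localization metrics on flattened maps: percentile thresholding, IOU, precision/recall, and an estimated area under the precision-recall curve over the hundred percentile values. The step-wise (average-precision style) integration rule, the linear interpolation used for percentiles and the function names are assumptions; a partial AUC can be obtained by passing a restricted range of `percentiles`.

```python
def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def threshold_at(expl, p):
    """Binary mask M_p: pixels whose explanation value reaches the p-th percentile."""
    t = percentile(expl, p)
    return [v >= t for v in expl]


def iou(gt, pred):
    inter = sum(bool(g) and q for g, q in zip(gt, pred))
    union = sum(bool(g) or q for g, q in zip(gt, pred))
    return inter / union if union else 0.0


def precision_recall(gt, pred):
    tp = sum(bool(g) and q for g, q in zip(gt, pred))
    n_pred, n_gt = sum(pred), sum(gt)
    return (tp / n_pred if n_pred else 1.0), (tp / n_gt if n_gt else 0.0)


def localization_auc(gt, expl, percentiles=range(1, 101)):
    """Step-wise area under the precision-recall curve over percentile thresholds."""
    area, prev_recall = 0.0, 0.0
    for p in sorted(percentiles, reverse=True):  # high threshold first: recall grows
        precision, recall = precision_recall(gt, threshold_at(expl, p))
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


gt = [1, 1, 0, 0]  # flattened ground-truth box mask
expl = [0.9, 0.8, 0.1, 0.2]  # flattened explanation map
print(iou(gt, threshold_at(expl, 50)), localization_auc(gt, expl))  # 1.0 1.0
```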
We compare our proposed method with the naive one (see section III-A) and with the following state-of-the-art approaches: Gradient [27], Smooth-Gradient [28], Input Gradient [14], Integrated Gradient [30], GradCAM [26], BBMP [9], Mask Generator [4] and Perceptual Perturbation [7]. The best BBMP results are reached when looking for a mask at 56×56 with a Gaussian blur perturbation. The Mask Generator follows the UNet architecture described in [4], but we remove the class selector and adapt the objective function to a single-class problem. The best results are obtained when we generate a mask at size 112×112 and then upsample it to the image dimensions. For Perceptual Perturbation, which is not model-agnostic, we regularize the first ReLU layer of each convolution block of the ResNet50 classifier. We also adapt its optimization to a single-class problem.
|Metrics||Height (pixels)||Width (pixels)||Area Ratio (%)|
V Results & Discussion
V-A Generator evaluation
For all tested architectures and optimization settings, both generators $S$ and $A$ reach high classification performance.
As shown in Table IV, almost all similar images are classified like the original ones, as their AUC almost reaches 1.
Adversarial images yield more successful attacks when either the shared network (Single) or the weight regularization (W) keeps generators $S$ and $A$ close to each other (Table IV).
They even outperform the naive approach (Adv. (TV)), where $A$ is trained without $S$.
|Single (W, TV)||0.998||0.952|
Regarding similarity, both generators produce samples that are visually very close to the original images (see Figure 2 and Table V). Similar images obtain their best SSIM and PSNR when the generators are not constrained by weight regularization. Conversely, adversarial images become more similar to both $x$ and $S(x)$ when the generators are constrained, even outperforming the naive adversarial generator trained on its own. Our objective is to make $S(x)$ and $A(x)$ as close as possible, in order to reduce non-discriminative differences, while keeping $S(x)$ very close to $x$. As shown in Table V, Single regularized with weight proximity (W) produces highly similar samples $S(x)$ and $A(x)$, while maintaining a strong similarity between $x$ and $S(x)$.
|Single (W, TV)||0.995||43.88||0.994||42.63||0.999||51.93|
V-B Weak localization evaluation
As shown in Table III, the bounding box annotations of the test set occupy 0.5% to 25.3% of the image, with an average of 7.3%. Accordingly, for the different generators and regularizations, Table VI lists the averaged IOU for thresholds $p$ between the 80th and 100th percentiles, and Table VII lists the total and partial AUC. First, Single clearly outperforms Duo for all IOU and AUC scores: by sharing a common autoencoder, the Single approach forces $S$ and $A$ to capture the same information from the original image. As shown in Table V, the proximity between $S(x)$ and $A(x)$, as well as between $x$ and $S(x)$, is better for Single approaches.
|Single (W, TV)||0.248||0.250||0.232||0.173||0.097|
|Single (W, TV)||0.292||0.292||0.272||0.206||0.115|
IOU scores at different binarization thresholds: comparison across generator architectures
Second, the weight regularization between the similar and adversarial paths introduced in (8) improves all localization scores, e.g. for Single (TV).
This is consistent with the findings of Table V.
Total variation regularization on the resulting explanation mask also slightly increases IOU and AUC scores for Single.
In addition, the Single generator with two convolutional layers in its separate CNNs performs better than the single-layer one.
Finally, using augmentations at prediction time improves localization scores in all cases, e.g. by up to 4 points for IOU (Table VI) and by 7 to 11 points for total and partial AUC (Table VII).
|Explanation method||Total AUC||Partial AUC|
|Single (W, TV)||0.325||0.239|
|Single (W, TV)||0.339||0.256|
|Single (W, TV)||0.412||0.328|
Compared with state-of-the-art methods (Tables VIII and IX), Single (W, TV) achieves comparable localization scores.
Our method even slightly outperforms the best performers, Mask Generator and BBMP, in IOU for percentile thresholds from 80% to 95%.
The same holds for both partial and total AUC compared with the best state-of-the-art approaches: GradCAM, BBMP and Mask Generator.
Only Mask Generator and Gradient outperform or compete with our method for .
Note also that the naive explanation, defined directly as the difference between $x$ and $A(x)$ (Adv. (TV)), produces much poorer results.
Furthermore, when augmentation is used during the generator prediction phase, our method outperforms all the others. Figures 3 and 4 give visual illustrations for cases where the opacities are located at one or two different positions. When the heatmaps are thresholded at the 95th percentile, our method (Single) appears to generate less noisy masks than the other approaches, including the naive one (Adv.), while capturing all discriminative structures. In addition, the per-image generation time of the explanation reported in Table IX (on an NVIDIA MX130 GPU) suggests that our method is suitable for real-time use.
|Adversarial vs Similar|
|Single (W, TV)||0.248||0.250||0.232||0.173||0.097|
|Adv. vs Sim. + Augment.|
|Single (W, TV)||0.292||0.292||0.272||0.206||0.115|
|Explanation method||Total AUC||Partial AUC||Time (s)|
|Adversarial vs Similar|
|Single (W, TV)||0.339||0.256||0.05|
|Adv. vs Sim. + Augment.|
|Single (W, TV)||0.412||0.328||0.63|
As an additional experiment, we apply the augmentation technique to the other state-of-the-art methods that produce their visual explanation in one shot. Localization results are listed in Tables X and XI. All localization scores improve, with a gain similar to that observed for our method, while the generation time per image remains acceptable (see Table XI). Our best method still achieves better localization results in terms of AUC. For IOU, Mask Generator outperforms our method for .
|Mask Generator ||0.222||0.219||0.208||0.169||0.103|
|Explanation method||Total AUC||Partial AUC||Time (s)|
|Mask Generator ||0.327||0.226||0.09|
In this work, we introduce a new method to produce a visual explanation of the classifier’s decision that leverages adversarial generation learning.
We propose to simultaneously train a pair of generators: one produces an adversarial image that goes against the classifier’s decision, the other a similar image that is classified like the original one.
We show that the difference between the two images, together with the learning procedure, helps to better capture discriminative features. We tested our method on a binary classification problem in the medical domain.
We showed that our method outperforms state-of-the-art techniques in terms of weak localization, especially when geometric augmentations are introduced during the generation phase. Unlike some state-of-the-art methods, our method is both model-agnostic and fast enough for real-time settings such as medical image analysis. Finally, we showed that random geometric augmentations applied to the original image improve all the tested state-of-the-art approaches.
In future work, we will generalize our method to multi-class problems and apply it to 3D medical images.
[1] (2018) Sanity Checks for Saliency Maps. In NIPS. arXiv:1810.03292.
[2] Wasserstein generative adversarial networks. In ICML.
[3] (2019) Explaining image classifiers by counterfactual generation. In ICLR.
[4] (2017) Real time image saliency for black box classifiers. In NIPS, pp. 6967–6976.
[5] (2009) ImageNet: A Large-Scale Hierarchical Image Database. In CVPR.
[6] (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In NeurIPS.
[7] (2019) Adversarial perturbations on the perceptual ball. ArXiv abs/1912.09405.
[8] (2017) Dermatologist-level classification of skin cancer with deep neural networks. In Nature, Vol. 542, pp. 115–118.
[9] (2017) Interpretable explanations of black boxes by meaningful perturbation. In ICCV, pp. 3449–3457.
[10] (2019) Understanding deep networks via extremal perturbations and smooth masks. In ICCV, pp. 2950–2958.
[11] (2019) Distribution-guided local explanation for black-box classifiers.
[12] (2015) Explaining and Harnessing Adversarial Examples. In ICLR.
[13] (2016) Identity mappings in deep residual networks. In ECCV.
[14] (2016) Interpretation of prediction models using the input gradient. ArXiv abs/1611.07634.
[15] (2017) What do we need to build explainable AI systems for the medical domain? ArXiv abs/1712.09923.
[16] (2020) Evaluations and methods for explanation through robustness analysis. ArXiv abs/2006.00442.
[17] (2017) Image-to-image translation with conditional adversarial networks. In CVPR.
[18] (2019) Explaining neural networks via perturbing important learned features. ArXiv abs/1911.11081.
[19] (2015) Adam: A method for stochastic optimization. In ICLR.
[20] (2019) Model agnostic saliency for weakly supervised lesion detection from breast DCE-MRI. In ISBI, pp. 1057–1060.
[21] (2020) Interpreting medical image classifiers by optimization based counterfactual impact analysis. In ISBI, pp. 1096–1100.
[22] (2017) CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. ArXiv abs/1711.05225.
[23] There and back again: revisiting backpropagation saliency methods. In CVPR.
[24] (2016) “Why should I trust you?”: explaining the predictions of any classifier. In ACM SIGKDD.
[25] (2015) U-Net: convolutional networks for biomedical image segmentation. In MICCAI.
[26] (2017) Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In ICCV, pp. 618–626.
[27] (2014) Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. In ICLR.
[28] (2017) SmoothGrad: removing noise by adding noise. ArXiv abs/1706.03825.
[29] (2015) Striving for simplicity: the all convolutional net. In ICLR. ArXiv abs/1412.6806.
[30] (2017) Axiomatic attribution for deep networks. In ICML.
[31] (2019) Interpretable explanations of black box classifiers applied on medical images by meaningful perturbations using variational autoencoders. In Medical Imaging: Image Processing.
[32] (2019) Interpretable and fine-grained visual explanations for convolutional neural networks. In CVPR, pp. 9089–9099.
[33] (2017) ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In CVPR, pp. 3462–3471.
[34] (2019) Adversarial explanations for understanding image classification decisions and improved neural network robustness. In Nature Machine Intelligence, Vol. 1, pp. 508–516.
[35] (2018) Generating adversarial examples with adversarial networks. In IJCAI.
[36] (2014) Visualizing and understanding convolutional networks. In ECCV.
[37] (2019) Generating adversarial examples in one shot with image-to-image translation GAN. In IEEE Access, Vol. 7, pp. 151103–151119.
[38] Learning deep features for discriminative localization. In CVPR, pp. 2921–2929.