What do Deep Neural Networks Learn in Medical Images?

by   Beentherize, et al.

Deep learning is rapidly being adopted in healthcare to help improve patient outcomes. This is especially true in medical image analysis, which requires extensive training to gain the expertise needed to become a trusted practitioner. However, while deep learning techniques continue to deliver state-of-the-art predictive performance, one of the primary challenges hindering their progress in healthcare is the opaque nature of these models' inference mechanisms. Attribution therefore has a vital role in building stakeholders' confidence in the predictions that deep learning models make to inform clinical decisions. This work seeks to answer the question: what do deep neural network models learn in medical images? To that end, we present a novel attribution framework using adaptive path-based gradient integration techniques. Results show a promising direction for building trust among domain experts to improve healthcare outcomes, by allowing them to understand input-prediction correlative structures, discover new biomarkers, and reveal potential model biases.



1 Introduction

The confluence of advances in compute and deep model architectures Rumelhart et al. (1986); He et al. (2016); Chollet (2017); Krizhevsky et al. (2012); Simonyan and Zisserman (2014) has spurred a stream of research in automated medical image analysis in the recent past Shen et al. (2017). Bioimaging techniques such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Functional Magnetic Resonance Imaging (fMRI), Positron Emission Tomography (PET), Mammography, Ultrasound, and X-ray have predominantly been interpreted by radiologists and physicians for the timely detection, diagnosis, and treatment of diseases Litjens et al. (2017). However, healthcare is an ever-changing field that requires extensive training, as pathologies vary widely and keep evolving. Because of the high demand for skilled labor, human experts may experience fatigue, which motivates the use of computer-aided diagnostic tools. The maturation of deep learning is thus accelerating the introduction of computer-assisted tools that help human experts, doctors, and researchers reduce the labor-intensive and time-consuming work of manual medical image analysis.

Deep learning is poised to dramatically democratize healthcare, especially in the Global South, where expertise in medical image analysis remains scarce and prohibitively expensive Murtaza et al. (2020). However, the success or failure of adopting these systems in clinical settings hinges on stakeholders' trust in the robustness and interpretability of the models' inference mechanisms, which are crucial in safety-critical sectors like healthcare Reyes et al. (2020).

Despite the inherent complexity of deep learning models, in this work we present a roadmap toward understanding the inference mechanisms that lead to their predictions, using adaptive path-based integrated gradient techniques. We have systematically studied and experimented with a class of these techniques using standard state-of-the-art convolutional neural network (CNN) architectures to classify brain tumors from MRI, with regions of interest segmented and verified by medical experts. These attribution techniques identify the salient features in the input that correspond to a specific predicted class. They can improve model understanding, build trust, and enable clinical experts to verify the system, supporting the adoption of deep learning-based computer-aided diagnostic tools.

The remainder of this paper is organized as follows: Section 2 discusses related interpretability approaches in deep learning-based medical image analysis. Section 3 describes the adopted methodology and the proposed approach. Section 4 discusses the dataset and explains the experimental results, and Section 5 concludes the work and presents future research directions.

2 Related Literature

Varied interpretability methods have recently been proposed for medical image analysis tasks. Research in this direction is growing, primarily to help build trustworthy artificial intelligence (AI) systems that use a human-in-the-loop approach to complement domain experts. Concept learning techniques have been used in Koh et al. (2020); Sabour et al. (2017); Shen et al. (2019) to train models that perform multi-stage predictions: high-level clinical concepts are predicted first and then provide the input to the final classification of disease categories. However, these methods have significant annotation costs, and concept-to-task mismatches can lead to considerable information leakage Salahuddin et al. (2022).

Another class of techniques is case-based models, where class-discriminative disentangled representations and feature mappings are learned, and the final classification is performed by measuring the similarity between the input image and base templates Bass et al. (2020); Kim et al. (2021); Li et al. (2018). However, this class of techniques is susceptible to corruption by noise and compression artifacts, and models are difficult to train in this paradigm. Counterfactual explanation is another approach, where input medical images are perturbed in pseudo-realistic ways to produce the opposite prediction. These methods can generate perturbations that are unrealistic with respect to the input images and often have lower resolution than the original images Baumgartner et al. (2018); Cohen et al. (2021); Lenis et al. (2020); Schutte et al. (2021); Seah et al. (2019); Selvaraju et al. (2017); Simonyan et al. (2013); Singla et al. (2021). Visualizing the internal network representations of the features learned by CNN kernels is another technique used for model understanding, but it is limited by the difficulty of interpreting feature maps in medical image analysis settings Bau et al. (2017); Natekar et al. (2020).

An attribution map provides a post-hoc explanation in which regions of the input image are highlighted by a saliency method according to their relevance to the model prediction. Böhle et al. (2019) used layer-wise relevance propagation to explain deep neural network decisions in MRI-based Alzheimer's disease classification. A deep CNN-based model with Gradient-weighted Class Activation Mapping (Grad-CAM) was trained to classify oral lesions in clinical oral photographic images Camalan et al. (2021). In Kermany et al. (2018), a similar CNN-based Grad-CAM technique for the classification of Oral Dysplasia is proposed. However, our approach differs from Böhle et al. (2019); Camalan et al. (2021); Kermany et al. (2018) in that we use adaptive path-based integrated gradient techniques to address the problem of noisy saliency masks that hinders earlier methods Kapishnikov et al. (2021).

3 Proposed Method

We present the CNN models used to carry out the classification experiments in this study. We characterize these CNN architectures, indicating their inductive priors, strengths, and limitations in learning visual representations. We then give a detailed description of the adaptive path-based integrated gradient techniques and their direct application to deep learning-based models in medical image analysis. The mathematical notation used in this work is summarized in Table 1.

Notation                                     Description
$\mathbb{R}$                                 Set of real numbers
$\mathbb{R}^{n}$                             Set of n-dimensional real-valued vectors
$\mathbb{R}^{m \times n}$                    Set of real-valued m × n matrices
$x \in \mathbb{R}^{h \times w \times 1}$     Real-valued tensor representing a single-channel image input to a neural network
$y$                                          Corresponding one-hot encoded label of an input image
$K$                                          Cardinality of the set of medical image classes
$W^{(l)}$                                    Kernels of the l-th layer of a CNN
$\mathcal{L}$                                Loss function
$h^{(l)}$                                    Non-linear transformation of the input at layer l given parameters $\theta^{(l)}$
$\sigma^{(l)}$                               Activation function at layer l
$\lambda$                                    Non-negative real-valued regularization hyperparameter
$\lVert \cdot \rVert_2^2$                    Squared $\ell_2$ norm
$\mathcal{D}_i^{tr}$, $\mathcal{D}_i^{ts}$   Training and testing samples of task $\mathcal{T}_i$, respectively, where $\mathcal{T}_i \sim p(\mathcal{T})$ is sampled from the task distribution
$f_{\theta}$                                 Neural network that produces a latent representation for each input
$\mathcal{A}$                                Attribution operator that takes a trained model and produces a saliency map
$S$                                          Computed saliency map for a given input image
Table 1: Summary of the mathematical notation used in this paper.

3.1 Background

We used 8 standard CNN architectures to classify brain tumors from T1-weighted MRI slices: Visual Geometry Group networks (VGG16 and VGG19) Simonyan and Zisserman (2014), Deep Residual Networks (ResNet-50 and ResNet-50V2) He et al. (2016), Densely Connected Convolutional Networks (DenseNet-121) Huang et al. (2017), Xception (deep learning with depthwise separable convolutions) Chollet (2017), Inception (InceptionV3) Szegedy et al. (2015), and EfficientNet (EfficientNetB0) Tan and Le (2019). We chose these models because they are widely used for medical image feature extraction in prediction and classification tasks.

VGG was introduced in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 Russakovsky et al. (2015), mainly to evaluate the effect of increasing depth in a deep neural network architecture with very small (3 × 3) convolution kernels. The results showed that increasing depth to 16-19 weight layers significantly improves on prior-art configurations. Increasing architectural depth yields more expressive models that learn better representations and thus generalize better. However, deeper networks are hard to train because of the vanishing gradient problem Hochreiter (1991); Bengio et al. (1994); Glorot and Bengio (2010). To address this, deep residual learning (ResNets) was introduced in He et al. (2016) to facilitate the training of much deeper neural networks. Results in He et al. (2016) empirically showed that ResNets converge faster using local search methods such as stochastic gradient descent (SGD) and can achieve higher accuracy from considerably increased depth. The framework mitigates the vanishing gradient problem primarily through identity mappings, which create shortcut connections that maximize information flow through the network. With depth addressed by the residual framework, another key question is how wide networks can be, and with what variety of kernel sizes.

A natural answer is to learn, within computational limits, as many factors of variation as possible. This is the main idea behind the depthwise separable layers of Xception, which builds on the Inception architecture Szegedy et al. (2015). Whereas a standard Inception module first maps cross-channel correlations and then spatial correlations, Xception performs the spatial convolutions independently for each channel Chollet (2017): a spatial (depthwise) convolution is applied separately to each input channel, followed by a pointwise (1 × 1) convolution across channels that projects the computed features into a new channel space. Huang et al. (2017) introduced dense connectivity (DenseNet), in which each layer within a dense block is connected to every other layer in a feed-forward fashion. Their approach is a natural extension of the successes of ResNets. A DenseNet consists of dense blocks, within which feature maps are concatenated channel-wise, separated by transition layers that reduce the feature-map size to keep the cost of concatenation manageable. This design improves gradient flow, providing strong gradient signals throughout substantially deeper networks, which results in good generalization performance; with a small growth rate (the number of feature maps each layer adds), it is also computationally efficient. EfficientNet Tan and Le (2019) introduced a principled study of model scaling, considering the impact of network depth, network width, and input resolution on model performance. It proposed a compound scaling method that uniformly scales all three dimensions using a compound coefficient, with the scaling constants found by a grid search.
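The parameter savings of depthwise separable convolutions, and the channel growth inside a dense block, can be checked with a short calculation. This is a minimal sketch that counts weights only (biases and batch normalization are ignored); the layer sizes are illustrative, although 64 input channels, a growth rate of 32, and 6 layers correspond to the first dense block of DenseNet-121 as described by Huang et al. (2017).

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise projection."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def dense_block_channels(c0, growth_rate, num_layers):
    """Channels after a dense block: each layer concatenates growth_rate new feature maps."""
    return c0 + growth_rate * num_layers


std = standard_conv_params(3, 256, 256)
sep = separable_conv_params(3, 256, 256)
print(std, sep, round(std / sep, 1))    # 589824 67840 8.7
print(dense_block_channels(64, 32, 6))  # 256
```

For a 3 × 3 layer with 256 input and output channels, the separable version needs roughly 8.7 times fewer weights, which is the budget Xception reinvests in width and depth.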

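The architectures described above are used in supervised deep learning, where optimization relies on gradient-based local search methods. The goal of the optimization is to find a fitted function that minimizes the empirical risk, measured on the training samples with a loss function $\mathcal{L}$:

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(x_i), y_i\big), \tag{1}$$

where $\theta$ collects the parameters of the trainable neural network $f_{\theta}$, $N$ is the number of training examples, and $x_i$ and $y_i$ are the feature vector and label of sample $i$, respectively. To promote generalization, a regularization term is added, and the parameters are updated as

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} \Big[ \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\big(f_{\theta}(x_i), y_i\big) + \lambda \lVert \theta \rVert_2^2 \Big], \tag{2}$$

where the penalty is the squared $\ell_2$ norm, $\lambda \geq 0$ is the regularization hyperparameter, and $\eta$ is the learning rate. In other words, for an input $x$ with $h^{(0)} = x$, at layer $l$ we want to compute

$$h^{(l)} = \sigma^{(l)}\Big( W^{(l)} \big(r^{(l)} \odot h^{(l-1)}\big) + b^{(l)} \Big), \qquad l = 1, \dots, L, \tag{3}$$

such that the output of the last layer, $h^{(L)}$, predicts the label $y$ of $x$. In this notation, $h^{(l)}$ is the output of layer $l$, $\sigma^{(l)}$ is the activation function at layer $l$, and $\theta^{(l)} = \big(W^{(l)}, b^{(l)}\big)$ are the learnable parameters at layer $l$, with $W^{(l)}$ and $b^{(l)}$ the weight matrix and bias vector, respectively. In Equation 3, the binary mask $r^{(l)}$, whose entries are drawn independently from a Bernoulli distribution, is introduced as a form of regularization that keeps only a random subset of connections active during training; this is known as dropout regularization.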
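A minimal sketch of the regularized empirical risk and one gradient-descent update described above, assuming a hypothetical linear model in place of the network and mean squared error as the loss (the study itself trains CNN classifiers). Full-batch gradients are used for clarity; SGD would use mini-batches instead.

```python
def predict(theta, x):
    """Hypothetical linear model standing in for the trainable network."""
    return sum(t * xi for t, xi in zip(theta, x))


def regularized_risk(theta, data, lam):
    """Mean squared error over the training samples plus lam * ||theta||_2^2."""
    mse = sum((predict(theta, x) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * sum(t * t for t in theta)


def gradient_step(theta, data, lam, lr):
    """One full-batch gradient-descent update on the regularized risk."""
    n = len(data)
    grad = [2 * lam * t for t in theta]            # gradient of lam * ||theta||^2
    for x, y in data:
        err = predict(theta, x) - y
        for j, xj in enumerate(x):
            grad[j] += 2 * err * xj / n            # gradient of the mean squared error
    return [t - lr * g for t, g in zip(theta, grad)]


data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
theta = [0.0, 0.0]
start = regularized_risk(theta, data, lam=0.01)
for _ in range(200):
    theta = gradient_step(theta, data, lam=0.01, lr=0.1)
end = regularized_risk(theta, data, lam=0.01)
print(round(start, 3), end < start)  # 2.0 True
```

The penalty pulls the solution slightly toward zero, so the final risk stays a little above zero even though the toy data are fit exactly by (2, -1).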

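Adopting a gradient-based training method with a variable learning rate at each layer $l$ in the meta-learning regime, as in this work, the update of $\theta$ follows two procedures. Let $p(\mathcal{T})$ be a distribution over tasks, from which each task is sampled as $\mathcal{T}_i \sim p(\mathcal{T})$; the aim is to learn prior knowledge shared by all these tasks. As discussed in Finn et al. (2017), the main goal is to encapsulate the prior knowledge of all $\mathcal{T}_i$ in the initial weights $\theta$ of the fitted function, which can then serve as the initialization for quick adaptation to a new task. The first procedure finds the parameters $\theta_i$ of a task $\mathcal{T}_i$ using its training samples $\mathcal{D}_i^{tr} = \{(x_j, y_j)\}_{j=1}^{N_i^{tr}}$, where $N_i^{tr}$ is the number of samples in $\mathcal{D}_i^{tr}$. Starting from $\theta_i^{(0)} = \theta$, at the $k$-th iteration $\theta_i$ is updated as:

$$\theta_i^{(k+1)} = \theta_i^{(k)} - \alpha^{(l)} \, \nabla_{\theta_i^{(k)}} \frac{1}{N_i^{tr}} \sum_{(x, y) \in \mathcal{D}_i^{tr}} \mathcal{L}\big(f_{\theta_i^{(k)}}(x), y\big), \tag{4}$$

This is followed by the actual update of $\theta$ along the gradient computed on the test samples of each task, $\mathcal{D}_i^{ts} = \{(x_j, y_j)\}_{j=1}^{N_i^{ts}}$, where $N_i^{ts}$ is the number of samples in $\mathcal{D}_i^{ts}$. Assuming that $\theta_i'$ is obtained after several updates with Equation 4 for each task $\mathcal{T}_i$, the update of $\theta$ is:

$$\theta \leftarrow \theta - \beta^{(l)} \, \nabla_{\theta} \frac{1}{M} \sum_{i=1}^{M} \frac{1}{N_i^{ts}} \sum_{(x, y) \in \mathcal{D}_i^{ts}} \mathcal{L}\big(f_{\theta_i'}(x), y\big), \tag{5}$$

where $M$ and $\beta^{(l)}$ are the number of tasks and the learning rate at layer $l$, respectively.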
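The two-level update can be illustrated on a hypothetical toy problem. Each task is 1-D regression y = a_i * x, the model is f_w(x) = w * x with a single scalar parameter, and one inner step is taken per task. With a single parameter the per-layer learning rates reduce to two scalars, alpha (inner) and beta (outer), and the second-order MAML gradient can be computed exactly with the chain rule. This is a sketch under those assumptions, not the training code used in this work.

```python
def grad_and_curvature(w, xs, ys):
    """First and second derivative of mean((w*x - y)^2) with respect to the scalar w."""
    n = len(xs)
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    curv = sum(2 * x * x for x in xs) / n
    return grad, curv


def meta_update(w, tasks, alpha, beta):
    """One outer update of the shared initialization w over all tasks."""
    meta_grad = 0.0
    for (x_tr, y_tr), (x_ts, y_ts) in tasks:
        g_tr, h_tr = grad_and_curvature(w, x_tr, y_tr)
        w_task = w - alpha * g_tr                    # inner step on the task's training set
        g_ts, _ = grad_and_curvature(w_task, x_ts, y_ts)
        meta_grad += g_ts * (1 - alpha * h_tr)       # chain rule through the inner step
    return w - beta * meta_grad / len(tasks)


# Hypothetical tasks: task i is y = a_i * x, with separate train and test inputs.
x_tr, x_ts = [0.5, 1.0, 1.5], [0.25, 0.75, 1.25]
tasks = [((x_tr, [a * x for x in x_tr]), (x_ts, [a * x for x in x_ts]))
         for a in (1.0, 2.0, 3.0)]

w = 0.0
for _ in range(200):
    w = meta_update(w, tasks, alpha=0.1, beta=0.5)
print(round(w, 3))  # 2.0
```

For these tasks the meta-learned initialization converges to the mean slope 2.0, the point from which one inner step brings all three tasks closest to their targets on average.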

3.2 Proposed Visual Interpretability Framework

To help interpret a model's inference mechanism, which is crucial for building trust in the clinical adoption of deep learning-based computer-aided diagnostic systems, we propose the interpretability framework depicted in Figure 1, which gives an overview of the attribution mechanism. Sundararajan et al. (2017) posited two fundamental axioms that attribution methods should satisfy, Sensitivity and Implementation Invariance; the integrated-gradient methods selected in this study build on this axiomatic foundation. For macro-scale attribution, a model that has learned the statistical regularities of a bioimaging dataset with an arbitrary number of classes produces, for each medical image slice, a compact latent representation in a vector space. A dimensionality reduction method such as t-Distributed Stochastic Neighbor Embedding (t-SNE) Van der Maaten and Hinton (2008) or Principal Component Analysis (PCA) Wold et al. (1987) can then map this latent representation onto a lower-dimensional space for analysis and visualization, and a clustering model such as a Gaussian Mixture Model (GMM) Duda et al. (1973) can help characterize the structure of the latent space projection.
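As a minimal illustration of projecting latent representations to two dimensions, the sketch below implements PCA with power iteration and deflation in plain Python. The 3-D "embeddings" are hypothetical random data standing in for CNN features; in practice a library implementation such as scikit-learn's PCA or TSNE classes would be used, and Figure 4 uses t-SNE rather than PCA.

```python
import math
import random


def pca_2d(points, iters=200, seed=0):
    """Project points onto their top-2 principal components via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            u = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in u))
            v = [x / norm for x in u]
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


# Hypothetical 3-D "embeddings" with most variance along the first axis
rng = random.Random(1)
emb = [[3.0 * rng.gauss(0, 1), 1.0 * rng.gauss(0, 1), 0.1 * rng.gauss(0, 1)]
       for _ in range(300)]
proj = pca_2d(emb)
var = [sum(p[k] ** 2 for p in proj) / (len(proj) - 1) for k in range(2)]
print(var[0] > var[1])  # True
```

The first projected coordinate carries the largest variance, which is what makes a 2-D PCA plot a reasonable first look at class structure before trying a non-linear method such as t-SNE.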

Global attribution of this kind has a limitation: it gives no contextual information about feature importance in the input space. To obtain such local information, we use gradient information, which is available because neural models are differentiable functions. We propose a framework of adaptive path-based gradient integration that utilizes Guided Integrated Gradients (GIG) Kapishnikov et al. (2021), a variant of the Integrated Gradients formulation in Equation 8, and a region-based saliency method (XRAI) Kapishnikov et al. (2019). The core idea of Integrated Gradients (IG) is as follows. Consider a non-linear differentiable function

$$F : \mathbb{R}^{n} \rightarrow [0, 1], \tag{6}$$

which represents a deep neural network, and an input $x = (x_1, \dots, x_n) \in \mathbb{R}^{n}$. An attribution of the prediction at input $x$ relative to a baseline input $x'$ is a vector

$$A_F(x, x') = (a_1, \dots, a_n) \in \mathbb{R}^{n}, \tag{7}$$

where $a_i$ is the contribution of $x_i$ to the prediction $F(x)$. In a medical image analysis context, $F$ represents a deep neural network that learns a disentangled non-linear transformation of the given medical image slices. The input vector $x$ is the flattened tensor of the single-channel image, so the indices $i$ correspond to pixels. The attribution vector serves as a mask over the original input that shows the model's regions of interest for the given predicted score, giving insight into these regions for any 2D image slice. IG computes the attribution of pixel $i$ as

$$\mathrm{IG}_i(x) = (x_i - x'_i) \int_{0}^{1} \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i} \, d\alpha, \tag{8}$$

where $x_i - x'_i$ is the difference between the input image and the corresponding baseline input at pixel $i$.
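Guided IG (Kapishnikov et al., 2021) replaces the straight-line path of IG with an adaptive path: at each step it moves the features whose partial derivatives currently have the smallest magnitude toward the input, which is intended to reduce the noise that straight-line paths can pick up. The sketch below is a simplified, plain-Python reading of that idea, not the reference implementation: it moves whole features one at a time under an L1 schedule, and `toy_model` is a hypothetical differentiable function standing in for the CNN, with its gradient written out by hand.

```python
import math


def toy_model(x, w=(1.5, -2.0, 0.5, 1.0), v=0.8):
    """Hypothetical differentiable stand-in for F: sigmoid(w.x + v*x0*x1) and its gradient."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + v * x[0] * x[1]
    s = 1.0 / (1.0 + math.exp(-z))
    ds = s * (1.0 - s)
    grad = [ds * wi for wi in w]
    grad[0] += ds * v * x[1]
    grad[1] += ds * v * x[0]
    return s, grad


def guided_ig(model, x_in, baseline, steps=1000):
    """Simplified Guided IG: move the features with the smallest |gradient| first."""
    x = list(baseline)
    attr = [0.0] * len(x_in)
    total = sum(abs(a - b) for a, b in zip(x_in, baseline))
    for t in range(1, steps + 1):
        _, grad = model(x)
        # L1 distance that must be covered during this step
        remaining = sum(abs(a - b) for a, b in zip(x_in, x))
        budget = remaining - total * (1.0 - t / steps)
        pending = [i for i in range(len(x)) if x[i] != x_in[i]]
        for i in sorted(pending, key=lambda k: abs(grad[k])):
            if budget <= 0:
                break
            gap = x_in[i] - x[i]
            if abs(gap) <= budget:                  # feature reaches its input value
                delta, x[i] = gap, x_in[i]
            else:                                   # partial move uses up the budget
                delta = math.copysign(budget, gap)
                x[i] += delta
            attr[i] += grad[i] * delta              # gradient times path segment
            budget -= abs(delta)
    return attr


x_in = [1.0, 0.5, -1.0, 2.0]
baseline = [0.0, 0.0, 0.0, 0.0]
attr = guided_ig(toy_model, x_in, baseline)
gap = toy_model(x_in)[0] - toy_model(baseline)[0]
print(abs(sum(attr) - gap) < 1e-2)  # True
```

The final check tests completeness: for any path method the attributions should sum approximately to F(x) minus F(x'), and the approximation tightens as the number of steps grows.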

Figure 1: A dataset of T1-weighted contrast-enhanced image slices is the input to a standard CNN classification model, which learns the non-linear mapping from the features to the output labels. The trained model is used with an attribution operator to attribute salient features of the input image; the operator can be used with varied differentiable architectures. The proposed framework is general and can be applied to any problem where explainability is vital for building trust in a model's inference mechanism.

Computing and visualizing the saliency maps involves the following steps:

  1. We initialize an all-zero baseline $x'$. This baseline is intended to be prediction-neutral and plays a crucial role in interpreting and visualizing the importance of input pixel features.

  2. Linear interpolations are generated between the baseline $x'$ and the original image $x$, forming incremental steps in feature space from the baseline to the input image.

  3. The gradient in Equation 8 is computed at each interpolated image to measure the relationship between changes in the features and changes in the model's class predictions. It identifies the pixels most relevant to the model's class probability scores, which provides a basis for quantifying feature importance in the input image with respect to the model prediction.

  4. The gradients are aggregated by summation, which numerically approximates the integral in Equation 8.

  5. The aggregated gradients are scaled by the difference between the input image and the baseline, so that the attribution values accumulated across the interpolated images are on the same scale as the input; the result is the saliency map over the input image.
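The five steps above can be sketched in plain Python. The "model" is a hypothetical sigmoid of a weighted pixel sum on a 3 × 3 image, chosen because its gradient can be written by hand; a real CNN would obtain gradients from automatic differentiation. A midpoint Riemann sum plays the role of the summation in step 4.

```python
import functools
import math


def toy_model(img, weights):
    """Hypothetical stand-in for a CNN class score: sigmoid of a weighted pixel sum."""
    z = sum(wt * p for w_row, p_row in zip(weights, img) for wt, p in zip(w_row, p_row))
    s = 1.0 / (1.0 + math.exp(-z))
    grad = [[s * (1.0 - s) * wt for wt in w_row] for w_row in weights]  # dF/dpixel
    return s, grad


def integrated_gradients(model, img, steps=50):
    h, w = len(img), len(img[0])
    baseline = [[0.0] * w for _ in range(h)]                   # step 1: all-zero baseline
    grad_sum = [[0.0] * w for _ in range(h)]
    for k in range(steps):
        alpha = (k + 0.5) / steps                              # midpoint of k-th interval
        interp = [[baseline[i][j] + alpha * (img[i][j] - baseline[i][j])
                   for j in range(w)] for i in range(h)]       # step 2: interpolate
        _, grad = model(interp)                                # step 3: gradients
        for i in range(h):
            for j in range(w):
                grad_sum[i][j] += grad[i][j]                   # step 4: sum
    # step 5: average the gradients and scale by (x - x') to get the saliency map
    return [[(img[i][j] - baseline[i][j]) * grad_sum[i][j] / steps
             for j in range(w)] for i in range(h)]


weights = [[0.5, -1.0, 0.2], [1.5, 2.0, -0.3], [0.0, 0.7, -0.8]]
img = [[0.1, 0.9, 0.4], [0.8, 1.0, 0.2], [0.0, 0.6, 0.3]]
model = functools.partial(toy_model, weights=weights)
saliency = integrated_gradients(model, img)
gap = model(img)[0] - model([[0.0] * 3 for _ in range(3)])[0]
print(abs(sum(map(sum, saliency)) - gap) < 1e-3)  # True
```

Pixels that equal the baseline (here the zero pixel at row 3, column 1) always receive zero attribution, which is one reason the baseline choice matters so much for dark regions of an MRI slice.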

4 Experimental Results

This section presents an overview of the dataset used in this work, including the annotation procedure for segmenting regions of interest in each MRI image. We further explain the training regime for all the models and elaborate on the proposed framework for computing interpretable features using adaptive path-based gradient integration techniques that score pixel-wise feature relevance, as discussed in Section 3.2. Results show that deep neural network models trained on medical images provide prediction confidence through softmax scores, and that interpretability techniques can infer feature attribution maps from these models.

4.1 Dataset

The data for this study are from Cheng (2017). The dataset comprises 3064 2D slices of T1-weighted contrast-enhanced MRI (CE-MRI) brain images from 233 patients, including 708 Meningiomas, 1426 Gliomas, and 930 Pituitary tumors. Representative slices with large lesion sizes were selected to construct the dataset. In each slice, the tumor boundary was manually delineated and verified by radiologists. Figure 2 shows 16 random samples from the three classes, with tumor borders depicted in red.
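The class counts above translate into class proportions with a quick calculation. The dataset is moderately imbalanced: a trivial classifier that always predicted Glioma would be right on roughly 46.5% of the slices, a useful reference point for the accuracies reported later.

```python
counts = {"Meningioma": 708, "Glioma": 1426, "Pituitary": 930}
total = sum(counts.values())
shares = {name: round(n / total, 3) for name, n in counts.items()}
print(total)   # 3064
print(shares)  # {'Meningioma': 0.231, 'Glioma': 0.465, 'Pituitary': 0.304}
```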

Figure 2: Randomly sampled images from the brain tumor dataset. The red annotations indicate the perimeters of the segmented tumor borders. In the figure, Glioma samples have the widest tumor areas of the three tumor classes, and Glioma tumor tissue can form in varied locations in the brain. Like Glioma, Meningioma is a primary central nervous system (CNS) tumor and can begin in the brain or spinal cord. Meningioma is the most common type of primary brain tumor among patients. As shown in the figure, Meningioma samples often occur in pairs across opposite regions of the brain. Pituitary tumors are abnormal growths that develop in the pituitary gland, which releases hormones that regulate important body functions; these tumors can lead to excess hormone release.

These 2-D slices of T1-weighted images are used to train standard deep CNNs for a 3-class classification task into Glioma, Meningioma, and Pituitary tumors.

4.2 Implementation Performance

Figure 3: (Left panel) Performance of the 8 CNN architectures used in this experiment, all trained for 20 epochs. Overall, DenseNet-121 Huang et al. (2017) showed the highest score, reaching 0.981. (Right panel) The confusion matrix for the test samples, which represent 20% of the dataset. The model generalized well, with 5, 4, and 3 misclassifications for Meningioma, Glioma, and Pituitary tumor, respectively. Because Meningioma and Pituitary tumors are distinct, the model made no errors between these two classes.

As the primary objective of this study is to build a framework for understanding the visual interpretability of deep models in medical images, we limit our experiments to 8 standard vision-based deep neural architectures. We train and test the 8 standard CNN architectures; results are shown in Figure 3 and summarised in Table 2 with training parameters depicted in Table 3. The input to each model is a tensor that is a resized version of the original image slices primarily due to computational concerns. We conducted all experiments on an Nvidia K80/T4 GPU. In Section 4.3 several saliency methods are applied to understand model interpretability.

Model            Size (MB)   Parameters   Depth   Top-1 Accuracy
VGG16            528         138.4M       16      0.928797
VGG19            549         143.7M       19      0.887658
ResNet50         98          25.6M        107     0.936709
ResNet50V2       98          25.6M        103     0.962025
InceptionV3      92          23.9M        189     0.944620
Xception         88          22.9M        81      0.966772
EfficientNetB0   29          5.3M         132     0.933544
DenseNet121      33          8.1M         242     0.981013
Table 2: A comparison of the 8 models on the test set, including their characteristic properties. DenseNet-121 has the best overall performance on the unseen test set, reaching a top-1 accuracy of 98.10%. Relative to the lowest-performing model, VGG-19, it is not only parameter-efficient but also has a memory footprint at least 16 times smaller. From this table, we chose the three best-performing models for saliency analysis, considering the impact of parameter count and depth on the type of representations these models can learn.
Hyperparameter     Setting
Learning rate      1e-3
Batch size         32
Number of epochs   20
Training set       0.7
Test set           0.3
Input shape
Momentum           9.39e-1
Decay              3e-4
Optimizer          Stochastic Gradient Descent with Momentum (SGDM)
Table 3: Training hyperparameters.

The DenseNet-121 model showed the best overall test performance, reaching 98.10% accuracy, and the test results indicate that its predictions are confident and stable. Because it is the best-performing model, which suggests that it has learned a more robust and generalizable representation of the data distribution (Figure 4), we selected it for further feature attribution. The clear difference between Figure 4 (top panel) and Figure 4 (bottom panel) indicates that the model has learned the inherent factors of variation in the signals and disentangled them into nearly separable manifolds in the learned representation space. This figure supports the confusion matrix results above: Glioma and Meningioma overlap slightly in the embedding space, consistent with the 9 misclassified samples between these two classes. This learning ability raises the question of what the model has learned about the data space and how domain experts can interpret it. We therefore investigate feature attribution to make sense of the mapping between the model input and the predicted class.
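The reported numbers can be cross-checked with simple arithmetic. Assuming the 12 errors in the DenseNet-121 confusion matrix (Figure 3) are all of its test errors, the 0.981013 accuracy in Table 2 implies a test set of about 632 slices, roughly 20% of the 3064 slices; the same arithmetic confirms the size ratio against VGG-19 quoted in the Table 2 caption.

```python
errors = 5 + 4 + 3          # misclassifications in the DenseNet-121 confusion matrix
accuracy = 0.981013         # DenseNet-121 top-1 accuracy from Table 2
implied_test_size = round(errors / (1 - accuracy))
print(implied_test_size, round(implied_test_size / 3064, 3))  # 632 0.206
print(round(549 / 33, 1), round(143.7 / 8.1, 1))              # 16.6 17.7
```

The last line compares VGG-19 with DenseNet-121 by stored size (549 vs 33) and by parameter count (143.7M vs 8.1M).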

Figure 4: (Top panel) A two-dimensional t-SNE Van der Maaten and Hinton (2008) projection of the flattened pixel-space representation of the MRI slices, where purple, green, and yellow represent Meningioma, Glioma, and Pituitary tumor, respectively. Because the data are generated under differing physical and statistical conditions, the classes are entangled and not readily separable in pixel space, which can impede learning with linear function approximators. (Bottom panel) A t-SNE projection of the embedding representation from a trained DenseNet-121 network. The model has disentangled the underlying factors of variation into a latent representation space in which the three classes form nearly distinct manifolds, allowing separation with either linear or non-linear function approximators.

4.3 Attribution

Our proposed attribution approach is predicated on the notion that visual inspection plays a major role in decision making in medical image analysis. An automated visual attribution method is therefore vital in a human-centred AI pipeline for medical image analysis. Although many attribution methods have been proposed, we focus on gradient-based methods, in particular adaptive path integration methods, because of their robustness to noise and their smoother pixel-level saliency maps, as depicted in Figures 5, 6, and 7 for the three brain tumor classes. These methods are Vanilla Gradient Sundararajan et al. (2017), Guided Integrated Gradients (GIG) Kapishnikov et al. (2021), and XRAI Kapishnikov et al. (2019). The visual attribution was performed with the three best models shown in Figure 3 (left panel).

Figure 5: Three randomly sampled test images from each tumor class are chosen for saliency analysis using the trained Xception model. The first column on the left shows the input image, where the red borders depict the delineated tumor boundaries. Each tumor image undergoes saliency analysis with each attribution method, as indicated by the first-row titles, from Vanilla Gradient masking to Smooth Blur Guided Integrated Gradients. The images are plotted in grayscale, and the bright spots show the input regions the model used to classify the image into the predicted class. Overall, XRAI gives the best explanation of the input signals. This is explored further by selecting the top 30% of the attributed image, as presented in the Fast XRAI 30% column, where salient features emerge that overlap with the region of interest for each tumor class.

Xception shows the least visual explainability, as indicated in Figure 5. In the input image, the Pituitary tumor, located in the pituitary gland below the hypothalamus, is only faintly attributed by all methods except XRAI. The attribution masks give little meaningful information about the region of interest where the tumor is present. Although other factors such as dataset size, batch size, annotation quality, and data augmentation technique can considerably contribute to such characteristics, the model architecture and optimization objective have a large effect because they impose strong inductive priors on the space of learnable functions. Moreover, this result suggests that the statistical correlations learned by CNNs differ from the way humans perceive and process visual stimuli.

Figure 6: Saliency analysis of the same sampled images using the trained ResNet50V2 model. In contrast to the Xception model, the saliency maps are very stable, with reduced noise levels across all three classes, although Vanilla Gradient still produces the noisiest saliency map. More importantly, XRAI selects wider regions of interest that correspond to the input signal than it does for the Xception model. This suggests that ResNet50V2 has learned a more robust representation of the data distribution, even though its overall test accuracy (0.962) is slightly below that of Xception (0.967) in Table 2.

Figure 7: Feature attribution on the sampled test data with the trained DenseNet-121 model, the best-performing model overall in this study. Its performance is also visible in the saliency maps it attributes to the inputs. With a suitably trained model, Vanilla Gradient shows a degree of regularity in the saliency maps, highlighting features in all three tumor images, especially the Meningioma and Glioma images. As with the other models, XRAI gives the best interpretability of the input.

From Figures 6 and 7, we observe that XRAI gives the best saliency maps, as shown in the masked MRI images. VG and SG produce coarse and partially noisy saliency maps and cannot be used to infer meaningful explanations of the model inference mechanism. As stated in the original papers, the choice of baseline has a major effect on the resulting saliency map Sundararajan et al. (2017); Kapishnikov et al. (2019, 2021). We used a baseline of zero pixels for all attribution methods, primarily because it is information-neutral. XRAI demonstrated higher interpretability than the vanilla gradient and guided integrated gradients methods, likely because it is better suited to deep learning-based medical image analysis tasks, where the emphasis is on understanding the region of interest from which a model inferred its prediction. We observed that a combination of XRAI and Blur IG can deduce feature saliency from the medical scans, as 35% of the XRAI saliency maps highlight important features that closely approximate the expert segmentation for the DenseNet-121 model. Utilizing multiple attribution methods can therefore improve model interpretability for domain experts.
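XRAI ranks image regions rather than individual pixels. A minimal sketch of that region-selection idea, assuming the image has already been split into regions: rank regions by mean attribution and keep them until a target fraction of the image area (30% here, as in the Fast XRAI 30% panels) is covered. The reference XRAI method additionally over-segments the image at several scales and greedily picks regions by gain in attribution density, so this is a simplification; the 4 × 4 saliency map and the region labels are hypothetical.

```python
def xrai_like_mask(saliency, segments, area_fraction=0.3):
    """Keep the regions with the highest mean attribution until they cover
    `area_fraction` of the image (a simplified, XRAI-inspired selection)."""
    h, w = len(saliency), len(saliency[0])
    sums, areas = {}, {}
    for i in range(h):
        for j in range(w):
            r = segments[i][j]
            sums[r] = sums.get(r, 0.0) + saliency[i][j]
            areas[r] = areas.get(r, 0) + 1
    # Rank regions by attribution density (mean attribution per pixel)
    ranked = sorted(sums, key=lambda r: sums[r] / areas[r], reverse=True)
    kept, covered = set(), 0
    for r in ranked:
        if covered >= area_fraction * h * w:
            break
        kept.add(r)
        covered += areas[r]
    return [[segments[i][j] in kept for j in range(w)] for i in range(h)]


# Hypothetical 4 x 4 saliency map split into four 2 x 2 regions (labels 0-3)
segments = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
level = {0: 0.1, 1: 0.9, 2: 0.1, 3: 0.5}
saliency = [[level[r] for r in row] for row in segments]
mask = xrai_like_mask(saliency, segments, area_fraction=0.3)
print(sum(row.count(True) for row in mask))  # 8 (regions 1 and 3)
```

Because whole regions are kept, the selected area can overshoot the target fraction (8 of 16 pixels here for a 30% target), which is why region-based masks look blockier than pixel-level maps.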

These results therefore open the possibility not only of accelerating the visual interpretability of deep neural models in medical image analysis, but also of supporting processes such as human-in-the-loop segmentation, model debugging, and debiasing, all of which are crucial in real-world applications. Debiasing in particular plays an important role in high-risk, highly regulated domains such as healthcare. In sum, these use cases can rapidly advance access to needed and affordable healthcare in low-resource settings.

However, Table 2, together with Figures 5, 6, and 7, suggests that inductive architectural priors have the most impact on the selectivity of the receptive fields of CNNs for visual saliency analysis. CNNs perform spatial weight sharing, where each filter is replicated across the entire visual field of the input Luo et al. (2016); thus, the resolution of this receptive field matters. Unlike humans, CNNs have texture and shape biases that are evident across all the model architectures Geirhos et al. (2018); Baker et al. (2018). Visual attribution methods in human-in-the-loop AI systems must take these biases into account to mitigate the pitfalls of incorrect attribution in deep models for real-world healthcare applications.
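The receptive field that each unit sees grows with depth and stride. A minimal sketch of the standard theoretical receptive-field recurrence for stacked layers, each given as (kernel size, stride); the example stack mirrors VGG's first two stages (two 3 × 3 convolutions, a 2 × 2 stride-2 max-pool, and two more 3 × 3 convolutions). Luo et al. (2016) show that the effective receptive field is considerably smaller than this theoretical bound.

```python
def receptive_field(layers):
    """Theoretical receptive field of stacked layers given as (kernel_size, stride)."""
    rf, jump = 1, 1          # jump: distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


conv3 = (3, 1)
pool2 = (2, 2)
print(receptive_field([conv3, conv3]))                       # 5
print(receptive_field([conv3, conv3, conv3]))                # 7
print(receptive_field([conv3, conv3, pool2, conv3, conv3]))  # 14
```

Stacking small kernels grows the field slowly until a stride multiplies the step size, so architectures with different downsampling schedules see tumors at different effective resolutions.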

5 Conclusion

Deep learning models are gaining traction across healthcare applications, from computer vision techniques to language models. However, the inference mechanisms of these models are still an open question. In this paper, we posed the question: what do these models learn in medical images? Our findings show that the robust statistical regularities learned in the input-output mappings differ from the biological processing of visual stimuli in humans. We show that different input attribution methods offer varied degrees of explainability of the input signal. A robust representation learner and the right attribution approach are crucial to obtaining interpretable saliency maps from deep CNNs in medical image analysis. This matters for building human-in-the-loop computer-aided diagnostic models that not only generalize well to unseen samples but are also explainable to domain experts. Our findings indicate that deep models can complement the efforts of medical experts in efficiently detecting and diagnosing diseases from medical images. Thus, a human-in-the-loop approach can accelerate the adoption of neural models in medical decision-making, and it provides a path toward building stakeholder trust, given that healthcare requires critical evaluation of assistive technologies before adoption and general use. Finally, we encourage further research into quantifying the explainability of these visual attribution methods, into developing benchmarks against which new visual attribution methods can be measured to accelerate model explainability research, and into releasing open-access datasets with expert tumor-boundary segmentations on which new saliency algorithms can be tested against ground truth.
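One simple way such a benchmark could score agreement between a saliency map and an expert tumor segmentation is to binarize the map by keeping its most salient pixels and compute the intersection-over-union (IoU) with the mask. The sketch below is a hypothetical metric written for illustration only: the function names, the top-pixel fraction, and the IoU threshold are assumptions, and the paper's own criterion for a saliency map "closely approximating" the expert segmentation may differ.

```python
import numpy as np

def top_fraction_mask(saliency, fraction=0.05):
    """Binarize a saliency map by keeping (roughly) its top `fraction` of pixels.
    Ties at the threshold are all kept, so slightly more pixels may survive."""
    k = max(1, int(round(fraction * saliency.size)))
    threshold = np.sort(saliency, axis=None)[-k]   # k-th largest value
    return saliency >= threshold

def saliency_iou(saliency, expert_mask, fraction=0.05):
    """IoU between the binarized saliency map and a binary expert mask."""
    pred = top_fraction_mask(saliency, fraction)
    mask = expert_mask.astype(bool)
    union = np.logical_or(pred, mask).sum()
    return float(np.logical_and(pred, mask).sum() / union) if union else 0.0

def fraction_agreeing(saliency_maps, masks, iou_threshold=0.5, fraction=0.05):
    """Share of images whose saliency IoU with the expert mask reaches `iou_threshold`."""
    hits = [saliency_iou(s, m, fraction) >= iou_threshold
            for s, m in zip(saliency_maps, masks)]
    return float(np.mean(hits))

# Toy 10x10 example: the salient block matches one mask exactly and overlaps another partially.
saliency = np.zeros((10, 10))
saliency[2:5, 2:5] = 1.0
exact_mask = np.zeros((10, 10), dtype=bool)
exact_mask[2:5, 2:5] = True
shifted_mask = np.zeros((10, 10), dtype=bool)
shifted_mask[3:6, 3:6] = True

print(saliency_iou(saliency, exact_mask, fraction=0.09))    # 1.0
print(saliency_iou(saliency, shifted_mask, fraction=0.09))  # 4/14, about 0.286
print(fraction_agreeing([saliency, saliency], [exact_mask, shifted_mask],
                        iou_threshold=0.5, fraction=0.09))  # 0.5
```

IoU is only one choice; pointing-game hit rates or the share of total saliency mass falling inside the mask are common alternatives, and the result depends strongly on the assumed top-pixel fraction.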


This research received no external funding.

Availability of data and materials

This research used the brain tumor dataset from the School of Biomedical Engineering, Southern Medical University, Guangzhou (Cheng, 2017), which contains 3064 T1-weighted contrast-enhanced images covering three types of brain tumor (meningioma, glioma, and pituitary tumor). The data are publicly available at https://figshare.com/articles/dataset/brain_tumor_dataset/1512427. The code is available at https://github.com/yusufbrima/XDNNBioimaging for reproducibility.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All the authors contributed to this work.


  • N. Baker, H. Lu, G. Erlikhman, and P. J. Kellman (2018) Deep convolutional networks do not classify based on global object shape. PLoS Computational Biology 14 (12), pp. e1006613.
  • C. Bass, M. da Silva, C. Sudre, P. Tudosiu, S. Smith, and E. Robinson (2020) ICAM: interpretable classification via disentangled representations and feature attribution mapping. Advances in Neural Information Processing Systems 33, pp. 7697–7709.
  • D. Bau, B. Zhou, A. Khosla, A. Oliva, and A. Torralba (2017) Network dissection: quantifying interpretability of deep visual representations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6541–6549.
  • C. F. Baumgartner, L. M. Koch, K. C. Tezcan, J. X. Ang, and E. Konukoglu (2018) Visual feature attribution using Wasserstein GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8309–8319.
  • Y. Bengio, P. Simard, and P. Frasconi (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 5 (2), pp. 157–166.
  • M. Böhle, F. Eitel, M. Weygandt, and K. Ritter (2019) Layer-wise relevance propagation for explaining deep neural network decisions in MRI-based Alzheimer’s disease classification. Frontiers in Aging Neuroscience, pp. 194.
  • S. Camalan, H. Mahmood, H. Binol, A. L. D. Araújo, A. R. Santos-Silva, P. A. Vargas, M. A. Lopes, S. A. Khurram, and M. N. Gurcan (2021) Convolutional neural network-based clinical predictors of oral dysplasia: class activation map analysis of deep learning results. Cancers 13 (6), pp. 1291.
  • J. Cheng (2017) Brain tumor dataset. figshare.
  • F. Chollet (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251–1258.
  • J. P. Cohen, R. Brooks, S. En, E. Zucker, A. Pareek, M. P. Lungren, and A. Chaudhari (2021) Gifsplanation via latent shift: a simple autoencoder approach to counterfactual generation for chest X-rays. In Medical Imaging with Deep Learning, pp. 74–104.
  • R. O. Duda, P. E. Hart, et al. (1973) Pattern classification and scene analysis. Vol. 3, Wiley, New York.
  • C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pp. 1126–1135.
  • R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel (2018) ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231.
  • X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • S. Hochreiter (1991) Untersuchungen zu dynamischen neuronalen Netzen. Diploma, Technische Universität München 91 (1).
  • G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
  • A. Kapishnikov, T. Bolukbasi, F. Viégas, and M. Terry (2019) XRAI: better attributions through regions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4948–4957.
  • A. Kapishnikov, S. Venugopalan, B. Avci, B. Wedin, M. Terry, and T. Bolukbasi (2021) Guided integrated gradients: an adaptive path method for removing noise. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5050–5058.
  • D. S. Kermany, M. Goldbaum, W. Cai, C. C. Valentim, H. Liang, S. L. Baxter, A. McKeown, G. Yang, X. Wu, F. Yan, et al. (2018) Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 172 (5), pp. 1122–1131.
  • E. Kim, S. Kim, M. Seo, and S. Yoon (2021) XProtoNet: diagnosis in chest radiography with global and local explanations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15719–15728.
  • P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang (2020) Concept bottleneck models. In International Conference on Machine Learning, pp. 5338–5348.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25.
  • D. Lenis, D. Major, M. Wimmer, A. Berg, G. Sluiter, and K. Bühler (2020) Domain aware medical image classifier interpretation by counterfactual impact analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 315–325.
  • O. Li, H. Liu, C. Chen, and C. Rudin (2018) Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
  • G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42, pp. 60–88.
  • W. Luo, Y. Li, R. Urtasun, and R. Zemel (2016) Understanding the effective receptive field in deep convolutional neural networks. Advances in Neural Information Processing Systems 29.
  • G. Murtaza, L. Shuib, A. W. Abdul Wahab, G. Mujtaba, H. F. Nweke, M. A. Al-garadi, F. Zulfiqar, G. Raza, and N. A. Azmi (2020) Deep learning-based breast cancer classification through medical imaging modalities: state of the art and research challenges. Artificial Intelligence Review 53 (3), pp. 1655–1720.
  • P. Natekar, A. Kori, and G. Krishnamurthi (2020) Demystifying brain tumor segmentation networks: interpretability and uncertainty analysis. Frontiers in Computational Neuroscience 14, pp. 6.
  • M. Reyes, R. Meier, S. Pereira, C. A. Silva, F. Dahlweid, H. v. Tengg-Kobligk, R. M. Summers, and R. Wiest (2020) On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiology: Artificial Intelligence 2 (3), pp. e190043.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986) Learning representations by back-propagating errors. Nature 323 (6088), pp. 533–536.
  • O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115 (3), pp. 211–252.
  • S. Sabour, N. Frosst, and G. E. Hinton (2017) Dynamic routing between capsules. Advances in Neural Information Processing Systems 30.
  • Z. Salahuddin, H. C. Woodruff, A. Chatterjee, and P. Lambin (2022) Transparency of deep neural networks for medical image analysis: a review of interpretability methods. Computers in Biology and Medicine 140, pp. 105111.
  • K. Schutte, O. Moindrot, P. Hérent, J. Schiratti, and S. Jégou (2021) Using StyleGAN for visual interpretability of deep learning models on medical images. arXiv preprint arXiv:2101.07563.
  • J. C. Seah, J. S. Tang, A. Kitchen, F. Gaillard, and A. F. Dixon (2019) Chest radiographs in congestive heart failure: visualizing neural network learning. Radiology 290 (2), pp. 514–522.
  • R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626.
  • D. Shen, G. Wu, and H. Suk (2017) Deep learning in medical image analysis. Annual Review of Biomedical Engineering 19, pp. 221–248.
  • S. Shen, S. X. Han, D. R. Aberle, A. A. Bui, and W. Hsu (2019) An interpretable deep hierarchical semantic convolutional neural network for lung nodule malignancy classification. Expert Systems with Applications 128, pp. 84–95.
  • K. Simonyan, A. Vedaldi, and A. Zisserman (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
  • S. Singla, B. Pollack, S. Wallace, and K. Batmanghelich (2021) Explaining the black-box smoothly: a counterfactual approach. arXiv preprint arXiv:2101.04230.
  • M. Sundararajan, A. Taly, and Q. Yan (2017) Axiomatic attribution for deep networks. In International Conference on Machine Learning, pp. 3319–3328.
  • C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
  • M. Tan and Q. Le (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, pp. 6105–6114.
  • L. Van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (11).
  • S. Wold, K. Esbensen, and P. Geladi (1987) Principal component analysis. Chemometrics and Intelligent Laboratory Systems 2 (1-3), pp. 37–52.