Explainability of Deep Learning Models
The accurate automatic segmentation of gliomas and their intra-tumoral structures is important not only for treatment planning but also for follow-up evaluations. Several methods based on 2D and 3D Deep Neural Networks (DNN) have been developed to segment brain tumors and to classify different categories of tumors from different MRI modalities. However, these networks are often black-box models and do not provide any evidence regarding the process they take to perform this task. Increasing the transparency and interpretability of such deep learning techniques is necessary for the complete integration of such methods into medical practice. In this paper, we explore various techniques to explain the functional organization of brain tumor segmentation models and to extract visualizations of internal concepts to understand how these networks achieve highly accurate tumor segmentations. We use the BraTS 2018 dataset to train three different networks with standard architectures and outline similarities and differences in the process that these networks take to segment brain tumors. We show that brain tumor segmentation networks learn certain human-understandable disentangled concepts on a filter level. We also show that they take a top-down or hierarchical approach to localizing the different parts of the tumor. We then extract visualizations of some internal feature maps and also provide a measure of uncertainty with regard to the outputs of the models to give additional qualitative evidence about the predictions of these networks. We believe that the emergence of such human-understandable organization and concepts might aid in the acceptance and integration of such methods in medical diagnosis.
Deep learning algorithms have shown great practical success in various tasks involving image, text and speech data. As deep learning techniques start making autonomous decisions in areas like medicine and public policy, there is a need to explain the decisions of these models so that we can understand why a particular decision was made (molnar2018interpretable).
In the field of medical imaging and diagnosis, deep learning has achieved human-like results on many problems (kermany2018identifying), (esteva2017dermatologist), (weng2017can). Interpreting the decisions of such models in the medical domain is especially important, where transparency and a clearer understanding of Artificial Intelligence are essential from a regulatory point of view and to make sure that medical professionals can trust the predictions of such algorithms.
Understanding the organization and knowledge extraction process of deep learning models is thus important. Deep neural networks often work in higher dimensional abstract concepts. Reducing these to a domain that human experts can understand is necessary - if a model represents the underlying data distribution in a manner that human beings can comprehend and a logical hierarchy of steps is observed, this would provide some backing for its predictions and would aid in its acceptance by medical professionals.
However, while there has been a wide range of research on Explainable AI in general (doshi2017towards), (gilpin2018explaining), it has not been properly explored in the context of deep learning for medical imaging. (holzinger2017we) discuss the importance of interpretability in the medical domain and provide an overview of some of the techniques that could be used for explaining models which use image, omics, and text data.
In this work, we attempt to extract explanations for models which accurately segment brain tumors, so that some evidence can be provided regarding the process they take and how they organize themselves internally. We first discuss what interpretability means concerning brain tumor models. We then present the results of our experiments and discuss what these could imply for machine learning assisted tumor diagnosis.
Interpreting deep networks which accurately segment brain tumors is important from the perspectives of both transparency and functional understanding. Providing glimpses into the internals of such a network to provide a trace of its inference steps (holzinger2017we) would go at least some way to elucidating exactly how the network makes its decisions, providing a measure of legitimacy.
There have been several methods explored for trying to look inside a deep neural network. Many of these focus on visual interpretability, i.e. trying to extract understandable visualizations from the inner layers of the network or understanding what the network looks at when giving a particular output (zhang2018visual).
For a brain tumor segmentation model, such methods might provide details on how information flows through the model and how the model is organized. For example, it might help in understanding how the model represents information regarding the brain and tumor regions internally, and how these representations change over layers. Meaningful visualizations of the internals of a network will not only help medical professionals in assessing the legitimacy of the predictions but also help deep learning researchers to debug and improve performance.
In this paper, we aim to apply visual interpretability and uncertainty estimation techniques on a set of models with different architectures to provide human-understandable visual interpretations of some of the concepts learned by different parts of a network and to understand more about the organization of these different networks. We organize our paper into three main parts, as described in Figure 1: (1) Understanding information organization in the model, (2) Extracting visual representations of internal concepts, and (3) Quantifying uncertainty in the outputs of the model. We implement our pipeline on three different 2D brain tumor segmentation models - a Unet model with a DenseNet121 encoder (henceforth referred to as the DenseUnet) (shaikh2017brain), a Unet model with a ResNet encoder (ResUnet) (kermi2018deep), and a simple encoder-decoder network which has a similar architecture to the ResUnet but without skip or residual connections (SimUnet). All models were trained on the BraTS 2018 dataset (menze2014multimodal, bakas2018identifying, bakas2017advancing) until convergence. A held-out validation set of 48 volumes (including both LGG and HGG volumes) was used for testing. Table 1 shows the performance of the three models on this test set.
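Table 1: Performance of the three models on the test set (WT: whole tumor, TC: tumor core, ET: enhancing tumor).
| Model Type | WT Dice | TC Dice | ET Dice |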
Our models are not meant to achieve state of the art performance. Instead, we aim to demonstrate our methods on a set of models with different structures commonly used for brain tumor segmentation and compare them to better understand the process they take to segment the tumors. In the following sections, each element of the proposed pipeline is implemented and its results and implications are discussed.
Deep neural networks may be learning explicit disentangled concepts from the underlying data distribution. For example, (zhou2014object) show that object detectors emerge in networks trained for scene classification. To study whether filters in brain tumor segmentation networks learn such disentangled concepts, and to quantify such functional disentanglement over different layers, we implement the Network Dissection (bau2017network) pipeline, allowing us to determine the function of individual filters in the network.
In Network Dissection, the activation map of an internal filter is obtained for every input image. The distribution of the activations is then computed over the entire dataset. The obtained activation map is resized to the dimensions of the original image and thresholded to get a concept mask. When overlaid on the input image, this concept mask might tell us which individual concept a particular filter learns.
For example, in the context of brain-tumor segmentation, if the model is learning disentangled concepts, there might be separate filters learning to detect, say, the edema region, or the necrotic tumor region. The other possibility is that the network somehow spreads information in a form not understandable by humans - entangled and non-interpretable concepts.
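Mathematically, Network Dissection is implemented by obtaining the activation maps $A_{l,k}(x)$ of a filter $k$ in layer $l$, and then obtaining the pixel-level distribution $D_{l,k}$ of $A_{l,k}(x)$ over the entire dataset.
A threshold $T_{l,k}$ is determined as the 0.01-quantile level of $D_{l,k}$, which means only 1.0% of values in $D_{l,k}$ are greater than $T_{l,k}$. The concept mask is obtained as
$$M_{l,k}(x) = A_{l,k}(x) \geq T_{l,k} \tag{1}$$
A channel is a detector for a particular concept if
$$\mathrm{IoU}(M_{l,k}(x), gt) = \frac{|M_{l,k}(x) \cap gt|}{|M_{l,k}(x) \cup gt|} \geq c \tag{2}$$
where $gt$ is the ground-truth mask of the concept and $c$ is a fixed IoU threshold.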
In this study, we only quantify explicit concepts like the core and necrotic tumor due to the availability of ground truths and recognize detectors for other concepts by visual inspection. We post-process the obtained concept images to remove salt-and-pepper noise and keep only the largest activated continuous concept inside the brain region in the image. The IoU between the final concept image and the ground truth for explicit concepts is used to determine the quality of the concept.
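The thresholding and IoU scoring described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the pipeline used in the study: the function names (`concept_mask`, `iou`, `dissect_filter`) are hypothetical, the activation maps are assumed to be already resized to the input resolution, the post-processing steps are omitted, and the data are synthetic.

```python
import numpy as np

def concept_mask(activation, threshold):
    """Binarize an (already upsampled) activation map at the given threshold."""
    return activation >= threshold

def iou(mask, ground_truth):
    """Intersection over union of two boolean masks; 0.0 if both are empty."""
    union = np.logical_or(mask, ground_truth).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(mask, ground_truth).sum() / union)

def dissect_filter(activations, ground_truths, top_fraction=0.01):
    """Score one filter against one concept.

    activations:   (n_images, H, W) activation maps, resized to image size.
    ground_truths: (n_images, H, W) boolean concept masks.
    Returns the dataset-wide threshold and the mean IoU over images.
    """
    # Threshold chosen so that only `top_fraction` of all values exceed it.
    threshold = np.quantile(activations, 1.0 - top_fraction)
    scores = [iou(concept_mask(a, threshold), g)
              for a, g in zip(activations, ground_truths)]
    return threshold, float(np.mean(scores))

# Synthetic stand-in data (assumption: real maps would come from a trained model).
rng = np.random.default_rng(0)
acts = rng.random((4, 32, 32))
gts = acts > 0.99
thr, mean_iou = dissect_filter(acts, gts)
```

Computing the threshold once over the whole dataset, rather than per image, is what lets a filter produce an empty mask on images where its concept is absent.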
The results of this experiment, shown in Figures 2, 3, and 4, indicate that individual filters of brain-tumor segmentation networks learn explicit as well as implicit disentangled concepts. For example, Figure 2a shows a filter learning the whole tumor region, i.e. it specifically detects the whole tumor region for any image in the input distribution. The filter in Figure 2c seems to be learning the edema region, while Figure 2f shows a filter learning the white and grey matter region, an implicit concept which the network is not trained to learn. Similar behaviour is seen in all networks (Figures 2, 3, and 4). This means that we can make functional attributions to the network at a filter level - indicating a sort of functional specificity in the network, i.e. individual filters might be specialized to learn separate concepts.
Neural Networks are inspired by neuroscientific principles. What does this functional specificity mean in this context? Debates are ongoing on whether specific visual and cognitive functions in the brain are segregated and the degree to which they are independent. (american1998autonomy) discuss the presence of spatially distributed, parallel processing systems in the brain, each with its separate function. Neuroscientific studies have shown that the human brain has some regions that respond specifically to certain concepts, like the fusiform face area (kanwisher2006fusiform) - indicating a certain visual modularity. Studies based on transcranial magnetic stimulation of the brain also show that separate areas of the visual cortex play a role in detecting concepts like faces, bodies, and objects (pitcher2009triple).
The emergence of concept detectors in our study indicates that brain-tumor segmentation networks might show a similar modularity. This suggests that there is some organization in the model similar to the process a human being might take to recognize a tumor, which might have implications for the credibility of these models in the medical domain, in the sense that they might be taking human-like, or at least human-understandable, steps for inference.
Understanding how spatial attention of a network over an input image develops might provide clues about the overall strategy the network uses to localize and segment an object. Gradient weighted Class Activation Maps (Grad-CAM) (selvaraju2017grad) is one efficient technique that allows us to see the network’s attention over the input image. Grad-CAM provides the region of interest on an input image which has a maximum impact on predicting a specific class.
Segmentation is already a localization problem. However, our aim here is to see how attention changes over internal layers of the network, to determine how spatial information flows in the model. To understand the attentions of each layer on an input image, we convert segmentation to a multi-label classification problem by considering class wise global average pooling on the final layer. The gradient of the final global average pooled value is considered for attention estimation in Grad-CAM. To understand the layer-wise feature map importance, Grad-CAM was applied to see the attention of every internal layer.
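This mathematically amounts to finding neuron importance weights $\beta^{c}_{l,k}$ for each filter $k$ of a particular layer $l$ with respect to the global average pooled output segmentation for a particular channel $c$:
$$y^{c} = \frac{1}{P} \sum_{p=1}^{P} \Phi^{c}_{p}(x) \tag{3}$$
$$\beta^{c}_{l,k} = \frac{1}{N} \sum_{i,j} \frac{\partial y^{c}}{\partial A^{ij}_{l,k}(x)} \tag{4}$$
$$O^{c}_{\mathrm{GradCAM}} = \mathrm{ReLU}\left( \sum_{k} \beta^{c}_{l,k} A_{l,k}(x) \right) \tag{5}$$
where $P$ and $N$ are the number of pixels in the output segmentation map and the activation map of the relevant layer for channel $c$ respectively, $\Phi^{c}$ is the output segmentation map for class $c$ of network $\Phi$, $y^{c}$ describes the spatially pooled final segmentation map, $A_{l,k}(x)$ is the activation map for the $k$-th filter of the $l$-th layer, and $O^{c}_{\mathrm{GradCAM}}$ represents an output map which is the result of Grad-CAM for channel $c$.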
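As an illustration of the weighting step, the sketch below computes a Grad-CAM style attention map from a layer's activations and the gradients of the pooled class score with respect to them. It assumes the gradients have already been obtained from an automatic differentiation framework; the function names and the toy arrays are hypothetical and are not taken from the study's code.

```python
import numpy as np

def pooled_class_score(segmentation, class_index):
    """Global average pool of one class channel of a (C, H, W) output map."""
    return float(segmentation[class_index].mean())

def grad_cam(activations, gradients):
    """Attention map for one layer and one class.

    activations: (K, H, W) feature maps of the layer.
    gradients:   (K, H, W) gradients of the pooled class score w.r.t. the maps
                 (assumed precomputed by an autodiff framework).
    Returns an (H, W) non-negative attention map.
    """
    weights = gradients.mean(axis=(1, 2))             # one importance weight per filter
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over filters
    return np.maximum(cam, 0.0)                       # keep positive evidence only

# Toy example with two filters on a 3x3 grid.
acts = np.stack([np.ones((3, 3)), 2.0 * np.ones((3, 3))])
grads = np.stack([np.ones((3, 3)), 0.5 * np.ones((3, 3))])
cam = grad_cam(acts, grads)
```

Repeating this for every internal layer, with the map upsampled to the input size, gives the layer-wise view of attention discussed in this section.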
We posit that model complexity and residual connections might have an impact on how early a model can localize the tumor region. For example, the DenseUnet and ResUnet localize the tumor region in the first few layers, while the SimUnet localizes the tumor region only in the final few layers (Figure 5). This indicates that skip and residual connections help learn and propagate spatial information to the initial layers for faster localization. While previous literature indicates that skip connections allow upsampling layers to retain fine-grained information from downsampling layers (jegou2017one), (drozdzal2016importance), our results indicate that information might also be flowing in the other direction i.e. skip and residual connections help layers in the downsampling path to learn spatial information earlier.
(drozdzal2016importance) also discuss that layers closer to the center of the model might be more difficult to train due to the vanishing gradient problem and that short skip or residual connections might alleviate this problem. Our results support this as well - middle layers of the SimUnet, which does not have residual or skip connections, seem to learn almost no spatial information compared to the other two networks (Figure 5a).
Our results in Figure 5 also show that models take a largely top-down approach to localizing tumors - they first pay attention to the entire brain, then the general tumor region, and finally converge on the actual finer segmentation. For example, attention in all three models is initially in the background region. In the DenseUnet and ResUnet, attention quickly moves to the brain and whole tumor within the first few layers. Finer segmentations are done in the final few layers. The necrotic tumor and enhancing tumor are often separated only in the last few layers for all models, indicating that segregating these two regions might require fewer parameters.
This top-down nature is consistent with theories on visual perception in humans - the global-to-local nature of visual perception has been documented. (navon1977forest) showed through experiments that larger features take precedence over smaller features, called the Global Precedence Effect. While this effect has its caveats (beaucousin2013global), it is generally robust (kimchi2015perception). Brain tumor segmentation models seem to take a similar top-down approach, and we see in our experiments that such behavior becomes more explicit as model performance improves.
While the results from the last two sections are not unexpected, they are not trivial either - the models do not need to learn disentangled concepts, especially implicit ones like the whole brain, the white matter region and the whole tumor for which no explicit labels have been given, nor do they need to take a hierarchical approach to this problem. The fact that such human-understandable traces of inference can be extracted from brain tumor segmentation models is promising in terms of their acceptance in the medical domain.
Visualizing the internal features of a network often provides clues as to the network’s understanding of a particular output class. For example, visualizing features of networks trained on the ImageNet (imagenet_cvpr09) dataset shows filters maximally activated by textures, shapes, and objects (olah2018the). However, this technique has rarely been applied to segmentation models, especially in the medical domain. Extracting such internal features of a brain-tumor segmentation model might provide more information about the qualitative concepts that the network learns and how these concepts develop over layers.
We use the Activation Maximization (erhan2009visualizing) technique to iteratively find input images that highly activate a particular filter. These images are assumed to be good first-order representations of the filters. Mathematically, activation maximization can be seen as an optimization problem:
$$x^{*} = \arg\max_{x} \left( A_{l,k}(x) - R_{\theta}(x) - \lambda \lVert x \rVert_{2}^{2} \right) \tag{6}$$
where $x^{*}$ is the optimized pre-image, $A_{l,k}(x)$ is the activation of the $k$-th filter of the $l$-th layer, $R_{\theta}(x)$ is the set of regularizers, and $\lambda \lVert x \rVert_{2}^{2}$ is the L2 bound.
In the case of brain-tumor segmentation, the optimized image is a 4 channel tensor. However, activation maximization often gives images with extreme pixel values or random repeating patterns that highly activate the filter but are not visually meaningful. In order to prevent this, we regularize our optimization to encourage visually meaningful images.
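To make the optimization concrete, the following toy sketch runs gradient ascent on an activation with an L2 penalty. It is only an illustration under strong assumptions: the "filter" is a hypothetical linear response `sum(w * x)` rather than a unit of a trained network, so the gradient can be written analytically, and the penalty weight `lam`, step size, and step count are arbitrary choices.

```python
import numpy as np

def activation(x, w):
    """Toy 'filter': a linear response to the input (stand-in for a real network)."""
    return float(np.sum(w * x))

def maximize_activation(w, lam=0.5, lr=0.1, steps=500, seed=0):
    """Gradient ascent on activation(x, w) - lam * ||x||^2, starting from noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=0.01, size=w.shape)
    for _ in range(steps):
        grad = w - 2.0 * lam * x   # analytic gradient of the toy objective
        x = x + lr * grad          # ascent step
    return x

# A 4-channel input, mirroring the 4-channel pre-image mentioned above.
w = np.arange(16, dtype=float).reshape(4, 2, 2)
x_star = maximize_activation(w)
```

For this toy objective the optimum is known in closed form, `w / (2 * lam)`, which makes the role of the L2 term visible: without it the ascent would grow without bound, which is the extreme-pixel-value failure mode described above.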
A number of regularizers have been proposed in the literature to improve the outputs of activation maximization. We use three regularization techniques to give robust human-understandable feature visualizations, apart from an L2 bound which is included in equation 6:
In order to increase translational robustness of our visualizations, we implement Jitter (inceptionism). Mathematically, this involves padding the input image and optimizing a different image-sized window on each iteration. In practice, we also rotate the image slightly on each iteration. We find that this greatly helps in reducing high-frequency noise and helps in crisper visualizations.
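The pad-and-crop part of jitter can be sketched as below. The function name `jitter_window`, the reflect padding mode, and the (C, H, W) layout are assumptions made for illustration; the small rotation mentioned above is left out to keep the example dependency-free.

```python
import numpy as np

def jitter_window(image, pad, rng):
    """Pad a (C, H, W) image and return a randomly shifted (C, H, W) window."""
    c, h, w = image.shape
    padded = np.pad(image, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    # Random top-left corner of the window inside the padded image.
    dy, dx = rng.integers(0, 2 * pad + 1, size=2)
    return padded[:, dy:dy + h, dx:dx + w]

rng = np.random.default_rng(0)
img = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
shifted = jitter_window(img, pad=2, rng=rng)
```

In an optimization loop, a fresh window would be drawn on every iteration, so that no single pixel alignment can be exploited by the optimizer.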
Total Variation (TV) regularization penalizes variation between adjacent pixels in an image while still maintaining the sharpness of edges (strong2003edge). We implement this regularizer to smooth our optimized images while still maintaining the edges. The TV regularizer of an image I with (w, h, c) dimension is mathematically given as in equation 7:
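Several variants of TV exist, so the sketch below should be read as one common choice rather than the exact form used here: it assumes the anisotropic variant, which sums absolute differences between vertically and horizontally adjacent pixels in each channel, and it uses a (C, H, W) array layout for convenience.

```python
import numpy as np

def total_variation(image):
    """Anisotropic TV of a (C, H, W) image: sum of absolute neighbor differences."""
    dh = np.abs(np.diff(image, axis=1)).sum()  # differences between adjacent rows
    dw = np.abs(np.diff(image, axis=2)).sum()  # differences between adjacent columns
    return float(dh + dw)

flat = np.ones((1, 4, 4))
edge = np.zeros((1, 4, 4))
edge[:, :, 2:] = 1.0                           # a single vertical step edge
noise = np.random.default_rng(0).random((1, 4, 4))
```

A constant image has zero TV and a single clean edge costs little, while pixel-level noise is penalized heavily, which is why this term smooths the pre-image without erasing edges.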
In order to obtain visualizations which are similar in style to the set of possible input images, we implement a style regularizer inspired by the work of (li2017demystifying). We encourage our optimization to move closer to the style of the original distribution by adding a similarity loss with a template image, which is just an image randomly chosen from the input training data. In style transfer, the Gram matrix is usually used for this purpose. However, we implement a loss which minimizes the distance between the optimized and template image in a higher dimensional kernel space, as implemented in (li2017demystifying). This is computationally less intensive and allows us to apply the loss channel-wise so that each channel is encouraged to be similar to the style of its corresponding modality.
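Mathematically, equation 6 is modified to the following:
$$x^{*} = \arg\max_{x} \left( A_{l,k}(x) - \zeta R_{TV}(x) - \gamma L(x, s; \psi) - \lambda \lVert x \rVert_{2}^{2} \right) \tag{8}$$
where $L(x, s; \psi)$ is the style loss between the optimized pre-image and the template image $s$, $\psi$ is the Gaussian kernel, $A_{l,k}(x)$ is the activation of the filter to be maximized, $R_{TV}(x)$ is the Total Variation loss, and $\lVert x \rVert_{2}^{2}$ is an upper bound on the optimized pre-image $x^{*}$; $\zeta$, $\gamma$, and $\lambda$ weight the respective terms.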
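One way to measure a distance in a Gaussian kernel space is the squared maximum mean discrepancy (MMD), sketched below channel by channel. This is an assumption about how such a loss could look, not the loss used in this work: it compares raw pixel intensities rather than network features, and the function names and the bandwidth `sigma` are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel between 1-D sample vectors a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_style_loss(channel, template_channel, sigma=1.0):
    """Squared MMD (biased estimate) between the pixel values of two channels."""
    x, y = channel.ravel(), template_channel.ravel()
    return float(gaussian_kernel(x, x, sigma).mean()
                 + gaussian_kernel(y, y, sigma).mean()
                 - 2.0 * gaussian_kernel(x, y, sigma).mean())

def style_loss(image, template, sigma=1.0):
    """Channel-wise loss for (C, H, W) images, one term per modality."""
    return sum(kernel_style_loss(i, t, sigma) for i, t in zip(image, template))

rng = np.random.default_rng(0)
template = rng.random((4, 16, 16))         # stand-in for a 4-modality training image
candidate = 3.0 * rng.random((4, 16, 16))  # pre-image with a different intensity range
```

Because each channel contributes its own term, a pre-image channel is only compared with the matching modality of the template, which mirrors the channel-wise application described above.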
We find that style constraining the images and making them more robust to transformations does help in extracting better feature visualizations qualitatively - optimized pre-images do show certain texture patterns and shapes. Figure 6 shows the results of such an experiment. For better interpretations, we show visualizations of filters which learn disentangled concepts from Section 4.1. The effect of regularizers is clear - not regularizing the image leads to random, repeating patterns with high-frequency noise. Constrained optimization gives visualizations closer to the concepts learnt by the layer. It is still not clear that these are faithful reflections of what the filter is actually detecting - only that they are closer to human understandings of the disentangled concepts that the filter appears to learn.
We observe that while it is difficult to extract diagnostic meaning from the results of feature visualization, textures and patterns are visible on constraining the optimization to a more probable domain. However, collaboration with radiologists and medical professionals in this context is required and could provide a complete understanding of what a brain tumor segmentation model actually detects qualitatively.
Augmenting model predictions with uncertainty estimates is essential in the medical domain since unclear diagnostic cases are aplenty. In such a case, a machine learning model must provide medical professionals with information regarding what it is not sure about, so that more careful attention can be given to those regions. (begoli2019need) discuss the need for uncertainty in machine-assisted medical decision making and the challenges that we might face in this context.
Uncertainty Quantification for deep learning methods in the medical domain has been explored before. (leibig2017leveraging) show that uncertainties estimated using Bayesian dropout were more effective and more efficient for deep learning-based disease detection. (yang2017quicksilver) use a Bayesian approach to quantify uncertainties in a deep learning-based image registration task.
However, multiple kinds of uncertainties might exist in deep learning approaches - from data collection to model choice to parameter uncertainty, and not all of them are as useful or can be quantified as easily, as discussed below.
Epistemic uncertainty captures uncertainty in the model parameters, that is, the uncertainty which results from us not being able to identify which kind of model generated the given data distribution. Aleatoric uncertainty, on the other hand, captures noise inherent in the data generating process (kendall2017uncertainties). However, aleatoric uncertainty is not really useful in the context of this work - we are trying to explain and augment the decisions of the model itself, not the uncertainty in the distribution on which it is fit.
Epistemic uncertainty can, in theory, be determined using Bayesian Neural Networks. However, a more practical and computationally simple approach is to approximate this Bayesian inference by using dropout at test time. We use test time dropout (TTD) as introduced in (gal2016dropout) as an approximate variational inference. Then,
$$\hat{y} = \frac{1}{T} \sum_{t=1}^{T} y^{t}, \qquad \mathrm{var}(y) \approx \frac{1}{T} \sum_{t=1}^{T} \left( y^{t} - \hat{y} \right)^{2} \tag{9}$$
where $y^{t} = \Phi(x \mid w^{t})$ is the output of the neural network with weights $w^{t}$ on applying dropout on the $t$-th iteration, and $T$ is the number of iterations. The models are retrained with a dropout rate of 0.2 after each layer. At test time, a posterior distribution is generated by running the model 100 times with dropout active for each image. We take the mean of the posterior sampled distribution as our prediction and the channel mean of the variance from Equation 9 as the uncertainty (kendall2015bayesian). The results of this are shown in Figure 12.
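The sampling procedure can be sketched as follows. The per-pixel linear classifier `stochastic_forward` is a hypothetical stand-in for a segmentation network, with dropout applied to its inputs only; the dropout rate of 0.2 and the 100 passes follow the text, while everything else is an assumption made for illustration.

```python
import numpy as np

def stochastic_forward(x, weights, rate, rng):
    """Toy per-pixel classifier with dropout on the inputs, softmax over classes.

    x: (F, H, W) features, weights: (C, F). Returns (C, H, W) probabilities.
    """
    keep = rng.random(x.shape) >= rate
    dropped = x * keep / (1.0 - rate)             # inverted dropout scaling
    logits = np.tensordot(weights, dropped, axes=1)
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=0, keepdims=True)

def ttd_predict(x, weights, rate=0.2, passes=100, seed=0):
    """Mean prediction and per-pixel uncertainty from repeated dropout passes."""
    rng = np.random.default_rng(seed)
    samples = np.stack([stochastic_forward(x, weights, rate, rng)
                        for _ in range(passes)])    # (T, C, H, W)
    mean = samples.mean(axis=0)                     # prediction
    uncertainty = samples.var(axis=0).mean(axis=0)  # channel mean of the variance
    return mean, uncertainty

data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(4, 8, 8))     # synthetic 4-channel input
weights = data_rng.normal(size=(3, 4))  # synthetic weights for 3 classes
mean, uncertainty = ttd_predict(x, weights)
```

The resulting `uncertainty` array has one value per pixel, so it can be overlaid on the segmentation to highlight regions where the sampled predictions disagree.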
We find that regions which are misclassified are often associated with high uncertainty. For example, Figure 12a shows a region in the upper part of the tumor which is misclassified as necrotic tumor, but the model is also highly uncertain about this region. Similar behaviour is seen in Figure 12b. In some cases, the model misses the tumor region completely, but the uncertainty map still shows that the model has low confidence in this region (Figure 12d), while in some cases, boundary regions are misclassified with high uncertainty (Figure 12c). In a medical context, these are regions that radiologists should pay more attention to. This would encourage a sort of collaborative effort - tumors are initially segmented by deep learning models and the results are then fine-tuned by human experts who concentrate only on the low-confidence regions, as Figure 1 shows.
More sample images as well as uncertainty for other networks can be found in the Supplementary Material.
In this paper, we attempt to elucidate the process that neural networks take to segment brain tumors. We implement techniques for visual interpretability and concept extraction to make the functional organization of the model clearer and to extract human-understandable traces of inference.
From our introductory study, we make the following inferences:
Disentangled, human-understandable concepts are learnt by filters of brain tumor segmentation models, across architectures.
Models take a largely hierarchical approach to tumor localization. In fact, the model with the best test performance shows a clear convergence from larger structures to smaller structures.
Skip and residual connections may play a role in transferring spatial information to shallower layers.
Constrained optimization helps to extract feature visualizations closer to human-defined concepts of the brain and tumors. Correlating these with the disentangled concepts extracted from Network Dissection experiments might help us understand how exactly a model detects and generalizes such concepts on a filter level.
Misclassified tumor regions are often associated with high uncertainty, which indicates that an efficient pipeline which combines deep networks and fine-tuning by medical experts can be used to get accurate segmentations.
As we have discussed in the respective sections, each of these inferences might have an impact on our understanding of deep learning models in the context of brain tumor segmentation.
While more experiments on a broader range of models and architectures would be needed to determine if such behavior is consistently seen, the emergence of such human-understandable concepts and processes might aid in the integration of such methods in medical diagnosis - a model which seems to take human-like steps is easier to trust than one that takes completely abstract and incoherent ones. This is also encouraging from a neuroscience perspective - if model behaviour is consistent with visual neuroscience research on how the human brain processes information, as some of our results indicate, this could have implications in both machine learning and neuroscience.
Future work will be centered around gaining a better understanding of the segmentation process for a greater range of models (including 3D models) and better constrained optimization techniques for extracting human-understandable feature visualizations which would allow an explicit understanding of how models learn generalized concepts. For instance, it would be worthwhile to understand what set of regularizers generates the most medically relevant images. Textural information extracted from the optimized pre-images can also be analyzed to determine their correlation with histopathological features.
Further exploration regarding how these results are relevant from a neuroscience perspective can also be done, which might aid in understanding not just the machine learning model, but also how the brain processes information. The inferences from our explainability pipeline can also be used to integrate medical professionals into the learning process by providing them with information about the internals of the model in a form that they can understand.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
PN did the initial ideation. PN and AK developed the pipeline and performed the analysis and implementation. PN wrote the first draft, PN and AK revised the manuscript and generated the visualizations. GK edited the manuscript, supervised and funded the study.
Publicly available data sets were used for this study. The data sets can be found at the BRATS 2018 challenge (https://www.med.upenn.edu/sbia/brats2018/data.html).
Additional images for each section are presented below.
Final extracted disentangled concepts for different filters of a particular layer are shown. The figures clearly show that different filters are specialized to detect different concepts of the input image. All three networks show similar behaviour.
The figure below shows visualized features for a randomly selected filter of successive layers.