While deep learning using artificial neural networks has shown great performance in computer vision and other areas, we are still seeking and developing ways to better explain why and how they work. Mechanisms underlying neural networks are often explored by saliency methods [erhan2009visualizing, baehrens2010explain, Simonyan2014, Kim2019]. Generally, saliency methods calculate relevance or importance scores for input features (e.g., pixels in images) with the aim of interpreting deep neural networks. Two-dimensional visualizations of importance scores are often referred to as saliency maps, which help to highlight input features relevant for classification or other tasks.
Saliency methods start from the gradient of a class score with respect to the input features, such as pixels for images. The class scores are usually taken to be the activations of neurons in the prediction vector that encode the class of interest. The gradient tells us which input features need only a minimal change to have a large influence on the class score. Therefore, larger magnitudes of gradients suggest greater relevance.
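As a minimal sketch of this starting point, the following NumPy example computes the gradient of one class score with respect to the input features for a toy two-layer ReLU network. The network, its random weights, and the function names are illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def relu_net_forward(x, W1, W2):
    """Forward pass of a toy network: scores = W2 @ relu(W1 @ x)."""
    a1 = np.maximum(W1 @ x, 0.0)   # hidden ReLU activations
    return W2 @ a1, a1             # class scores and cached activations

def gradient_saliency(x, W1, W2, class_idx):
    """Gradient of one class score with respect to the input features."""
    _, a1 = relu_net_forward(x, W1, W2)
    grad_hidden = W2[class_idx] * (a1 > 0)   # chain rule through the ReLU
    return W1.T @ grad_hidden                # chain rule through the first layer

# Hypothetical weights and input, used only for illustration.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(3, 5))
x = rng.normal(size=4)

# Larger magnitudes suggest greater relevance of the corresponding input feature.
saliency = np.abs(gradient_saliency(x, W1, W2, class_idx=1))
```

In practice the gradient would be obtained by automatic differentiation in a deep learning framework; the manual chain rule is used here only to keep the sketch self-contained.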
Similarly, saliency methods can be applied to deep generative models such as the Variational Autoencoder (VAE) [Kingma2014]. To achieve applicability to VAEs and other generative models, which naturally lack known classes, concept vectors are used to compute concept scores, which can be understood as a replacement for class scores [Brocki_2019]. A concept vector is a latent representation of a high-level concept, which could be a known attribute [Larsen2016, White2016], a cluster membership [Huang2016], or others. The concept score is then obtained by measuring the similarity between the latent representation of an input image and the concept vector corresponding to that concept.
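The idea of a concept score can be illustrated with a small NumPy sketch. Two choices here are assumptions made for illustration only and are not confirmed by the text: the concept vector is built as a difference of mean latent codes, and similarity is measured by cosine similarity. The function names are hypothetical.

```python
import numpy as np

def concept_vector(latents, has_attribute):
    """Mean latent code of images with the attribute minus the mean without it.

    Assumption: this difference-of-means construction is one common choice,
    not necessarily the one used in the cited works.
    """
    latents = np.asarray(latents, dtype=float)
    mask = np.asarray(has_attribute, dtype=bool)
    return latents[mask].mean(axis=0) - latents[~mask].mean(axis=0)

def concept_score(z, v):
    """Cosine similarity between a latent code z and a concept vector v (assumed)."""
    return float(z @ v / (np.linalg.norm(z) * np.linalg.norm(v)))

# Toy two-dimensional latent codes; the first two images carry the attribute.
latents = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
v = concept_vector(latents, [True, True, False, False])

score_with = concept_score(np.array([1.0, 0.0]), v)      # positive
score_without = concept_score(np.array([0.0, 1.0]), v)   # negative
```

A saliency map for the concept would then be the gradient of such a score with respect to the input pixels, taken through the encoder.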
Because the calculation of gradients is central to saliency methods, it has been modified to produce visually sharper or de-noised visualizations. Based on calculating gradients in successive layers of a neural network, Guided Backpropagation sets any negative gradient values to zero in each layer during backpropagation [Springenberg2015]. Rectified Gradient generalizes this thresholding and has been shown to result in sharper saliency maps in some cases [Kim2019].
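The difference between the plain chain rule and the Guided Backpropagation rule at a single ReLU layer can be sketched as follows. The function names and the example values are illustrative assumptions.

```python
import numpy as np

def plain_relu_backward(activation, upstream_grad):
    """Chain rule through a ReLU: pass the gradient where the unit was active."""
    return (activation > 0) * upstream_grad

def guided_relu_backward(activation, upstream_grad):
    """Guided Backpropagation: additionally zero out negative gradients."""
    return (activation > 0) * np.maximum(upstream_grad, 0.0)

activation = np.array([1.0, 0.0, 2.0, 3.0])
upstream_grad = np.array([0.5, 0.7, -0.4, 0.2])

plain = plain_relu_backward(activation, upstream_grad)    # [0.5, 0.0, -0.4, 0.2]
guided = guided_relu_backward(activation, upstream_grad)  # [0.5, 0.0, 0.0, 0.2]
```

The third unit is active but carries a negative gradient, so it survives the plain rule and is removed by Guided Backpropagation.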
However, we notice a brightness bias in Rectified Gradient [Kim2019]. Dark spots in an image are not highlighted by Rectified Gradient, irrespective of their relevance for a chosen class or concept. We demonstrate that this problem is intrinsic to the definition of Rectified Gradient [Kim2019], which we improve via a simple modification. Comparisons are given using several neural networks, on both synthetic examples and real images. This straightforward study showcases how visual inspection may lead the explanation of deep learning models astray.
II Bias in Rectified Gradient
Gradients are generally noisy when applied to image classification or other computer vision tasks involving deep neural networks [erhan2009visualizing, baehrens2010explain, Simonyan2014, Kim2019]. Although humans can easily identify an object in an image, neural networks are neither required nor constrained to mimic this behavior. Rectified Gradient [Kim2019] modifies the conventional calculation of gradients in order to reduce noise in saliency maps. Essentially, Rectified Gradient introduces layer-wise thresholding, such that small values, selected by a threshold, are removed during backpropagation [Kim2019].
We have observed that Rectified Gradient introduces a brightness bias in the saliency maps. Due to this bias, dark areas of an image are not highlighted even if they are highly relevant for the classification. We have identified the source of this bias to be a final multiplication of the saliency map with the input features, which is a critical part of the definition of Rectified Gradient. The saliency maps are obtained by Rectified Gradient as follows

\begin{align}
R_i^{(l)} &= \mathbb{1}\left(a_i^{(l)} \cdot R_i^{(l+1)} > \tau\right) \cdot R_i^{(l+1)}, \tag{1} \\
\mathrm{RectGrad} &= x \odot R^{(1)}, \tag{2}
\end{align}

where $\mathbb{1}(\cdot)$ is the step function, $a^{(l)}$ is the activation in layer $l$, $R^{(l)}$ is the gradient backpropagated up to layer $l$, $\tau$ is the threshold, $x$ denotes the input features, and $l = 1$ is the input layer. The equation for the backpropagation of the gradients through the ReLU (1) is a modification of the rule obtained directly from the chain rule

\begin{equation}
R_i^{(l)} = \mathbb{1}\left(a_i^{(l)} > 0\right) \cdot R_i^{(l+1)}, \tag{3}
\end{equation}

which is the definition used for the original saliency map [erhan2009visualizing]. These modifications by [Kim2019] were presented as a way to make the saliency maps less noisy. However, the final multiplication has the undesirable effect of systematically neglecting dark features even when they are relevant. We therefore propose to simply remove the final multiplication with input features from Rectified Gradient, which maintains its desirable de-noising of the saliency maps while making it more generally applicable. This simplifies their method to $R^{(1)}$, the thresholded gradient at the input layer.
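The effect of the final multiplication can be seen in a minimal NumPy sketch with a one-unit toy model. The thresholding rule below keeps a gradient only when the product of activation and gradient exceeds a threshold, which is our reading of the layer-wise thresholding in [Kim2019]. The fixed threshold `tau`, the toy weights, and the function names are assumptions for illustration.

```python
import numpy as np

def rectgrad_relu_backward(activation, upstream_grad, tau=0.0):
    """Keep a gradient only if activation * gradient exceeds the threshold tau."""
    keep = (activation * upstream_grad) > tau
    return keep * upstream_grad

def saliency_maps(x, W, b, w_out, tau=0.0):
    """Toy model: score = w_out . relu(W @ x + b). Returns both saliency variants."""
    a = np.maximum(W @ x + b, 0.0)                    # hidden ReLU activations
    r_hidden = rectgrad_relu_backward(a, w_out, tau)  # thresholded gradient at the ReLU
    r_input = W.T @ r_hidden                          # gradient at the input layer
    return {
        "rectgrad": x * r_input,   # with the final multiplication by the input
        "proposed": r_input,       # final multiplication removed
    }

# Hypothetical detector whose single hidden unit responds when pixel 0 is dark.
W = np.array([[-1.0, 0.0, 0.0, 0.0]])
b = np.array([1.0])
w_out = np.array([1.0])
x = np.array([0.0, 0.8, 0.6, 0.9])   # pixel 0 is black and is the only relevant pixel

maps = saliency_maps(x, W, b, w_out)
rectgrad_importance = np.abs(maps["rectgrad"])   # all zeros: the black pixel is lost
proposed_importance = np.abs(maps["proposed"])   # [1, 0, 0, 0]: the black pixel is found
```

Because the relevant pixel has value zero, multiplying by the input erases its importance no matter how large its gradient is, whereas the gradient at the input layer alone still marks it.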
The consequence of multiplying the final gradients with input features is evident: a pixel whose value is zero receives zero importance regardless of its gradient, although RGB (red, green, blue) values in images do not intrinsically contain meaningful information about classes or concepts. We demonstrate the resulting bias in three examples.
First, a simple synthetic example is generated by placing black squares on images from the Imagenette dataset (https://github.com/fastai/imagenette), a subset of ImageNet [Russakovsky2015]. Out of 4000 images, 2000 (50%) were randomly chosen and a black square was placed on each of them (Figure 1(a)). As a result, the background images from the Imagenette dataset are not correlated with the existence of black squares. A binary classifier using three convolutional layers is built and trained on them. Training for 10 iterations achieved 99.8% accuracy on the test set, almost perfectly detecting the black square. Because black squares are artificially inserted into a random half of all images, only the black square in an image is relevant to this classifier. Thus, we expect the black square to be highlighted by high values in accurate saliency maps. However, we noticed that the Rectified Gradient values of those black squares are zero (Figure 1(c)). Our proposed method successfully removes this bias.
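A dataset of this kind could be constructed along the following lines. This is a sketch under assumptions: the square size, the random placement, the image layout (N, height, width, channels), and the function name are hypothetical and are not specified in the text.

```python
import numpy as np

def add_black_squares(images, square_size=8, seed=0):
    """Place a black square on a random half of the images.

    Returns the modified copies and binary labels (1 = square present).
    Square size and random placement are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    images = np.array(images, dtype=float, copy=True)
    n, height, width, _ = images.shape
    labels = np.zeros(n, dtype=int)
    chosen = rng.choice(n, size=n // 2, replace=False)   # random half of the images
    for i in chosen:
        top = rng.integers(0, height - square_size + 1)
        left = rng.integers(0, width - square_size + 1)
        images[i, top:top + square_size, left:left + square_size, :] = 0.0
        labels[i] = 1
    return images, labels

# Stand-in backgrounds with strictly positive pixel values instead of real photos.
backgrounds = np.random.default_rng(1).uniform(0.1, 1.0, size=(10, 32, 32, 3))
data, labels = add_black_squares(backgrounds)
```

Since the half that receives a square is chosen at random, the label is independent of the background content, which is what makes the square the only relevant feature.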
Second, we investigate this bias when applying Rectified Gradient to a variational autoencoder (VAE) trained on CelebA, a large-scale face database with attributes [liu2015faceattributes]. All details about the architecture of the VAE and the training can be found in [Brocki_2019]. In particular, we focused on the concept “eyeglasses”, which includes dark glasses and sunglasses. Figure 2(a) shows that our proposed method leads to better highlighting (i.e., larger importance scores) of the dark glasses. Rectified Gradient (“RectGrad” in Figure 2(a)) suppresses dark pixels and assigns zero importance whenever an input pixel is black. Instead of highlighting the sunglasses, the saliency maps obtained using Rectified Gradient highlight the areas surrounding them. To show how gradients are modified, scatterplots of the input features (averages of three color channels) against the saliency maps (RectGrad or Proposed) are shown for each image (Figure 2(b)). Clearly, compared to our “No Bias” version, Rectified Gradient shows artificially suppressed values when input pixel values are near 0.
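The relationship shown in such scatterplots can also be summarized numerically. The sketch below averages the three color channels to obtain a brightness value per pixel and compares the mean absolute importance of dark and non-dark pixels. The darkness cutoff, the toy image, and the function name are assumptions for illustration.

```python
import numpy as np

def dark_vs_bright_importance(image, saliency, dark_threshold=0.1):
    """Mean absolute importance of dark pixels and of all other pixels.

    image: (height, width, 3) array, saliency: (height, width) array.
    The cutoff dark_threshold is a hypothetical choice.
    """
    brightness = image.mean(axis=-1)          # average of the three color channels
    dark = brightness <= dark_threshold
    importance = np.abs(saliency)
    return float(importance[dark].mean()), float(importance[~dark].mean())

# Toy 2x2 image with one black pixel and a constant gradient of 1 everywhere.
image = np.array([[[0.0, 0.0, 0.0], [0.9, 0.9, 0.9]],
                  [[0.5, 0.5, 0.5], [0.6, 0.6, 0.6]]])
gradient = np.ones((2, 2))

with_input = image.mean(axis=-1) * gradient   # saliency multiplied by the input
without_input = gradient                      # multiplication removed

dark_biased, bright_biased = dark_vs_bright_importance(image, with_input)
dark_plain, bright_plain = dark_vs_bright_importance(image, without_input)
```

With the multiplication, the dark pixel has zero importance although its gradient equals that of every other pixel; without it, dark and bright pixels are treated alike.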
Third, we use ResNet50 [he2016deep], a deep residual convolutional neural network with 50 layers, to classify examples from ImageNet [Russakovsky2015]. This example is used to demonstrate that preprocessing and normalization do not remove the observed bias. In some applications of deep neural networks, pixel values are scaled from $[0, 255]$ to $[-1, 1]$ (or other ranges). This preprocessing only shifts the bias to an artificial point in the color spectrum. In particular, a range of $[-1, 1]$, which is typically used in ImageNet classifiers, would introduce this bias at a middle grey. Pixels around R=127.5, G=127.5, B=127.5 (which correspond to R=0, G=0, B=0 in the scaled range) would be strongly suppressed and would not appear to have highly relevant values in Rectified Gradient. This bias is particularly observed in ImageNet classes whose objects are naturally grey (Figure 3(a)). We see that great grey owls are better represented by the proposed method. The scatterplot of the scaled input features and saliency maps demonstrates this artificial bias around 0 in the y-axis (Figure 3(b)).
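The shift of the bias under rescaling follows from simple arithmetic, sketched below. The linear map from $[0, 255]$ to $[-1, 1]$ and the constant gradient value are assumptions chosen for illustration.

```python
import numpy as np

def scale_pixels(x):
    """Linearly map pixel values from [0, 255] to [-1, 1]."""
    return x / 127.5 - 1.0

raw = np.array([0.0, 127.5, 255.0])     # black, middle grey, white
scaled = scale_pixels(raw)              # [-1.0, 0.0, 1.0]

gradient = np.full(3, 2.0)              # same hypothetical gradient for all three pixels
importance = np.abs(scaled * gradient)  # [2.0, 0.0, 2.0]
```

After scaling, black pixels are no longer suppressed, but middle grey now maps to zero and loses all importance under the multiplication with the input.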
IV Discussion and Conclusion
Interpretability of deep learning methods is highly sought after. While the saliency methods based on gradients are showing great potential to identify and visualize relevant input features, it is critical to carefully consider how our qualitative evaluation (e.g., visual inspection of de-noising) may differ from the underlying mechanisms of classification or conceptualization.
We have compared the saliency maps obtained with Rectified Gradients and with our proposed “No Bias” modification, which removes the final multiplication with input pixels, for three different case studies. In the first synthetic example, the difference between the two methods is particularly apparent because the black box is completely removed from the saliency map by Rectified Gradient, whereas our proposed method correctly highlights it. The second example used the concept score method to obtain saliency maps for a VAE with respect to the concept “eyeglasses”. The dark eyeglasses are substantially better highlighted when using our proposed method. In the third example, it is demonstrated that a preprocessing step of scaling or normalizing the pixel values does not remove this bias. These artificial biases are consistently apparent when plotting the input pixel values against the importance scores in saliency maps.
Generally, we are interested in better understanding and explaining the predictions or underlying mechanisms of deep neural networks in a way that is independent of human biases [Adebayo2018]. One may argue that when the target object is bright, a multiplication with input pixels would lead to less noisy saliency maps. However, if we know the characteristics of the target object, such information should be incorporated into the model.
It cannot be excluded that the saliency maps should be noisy, because this might accurately reflect the behavior of the model. In an extreme situation, the model may use only a few pixels to reach its accurate prediction. Then, for the many irrelevant pixels, the importance scores are highly noisy, which will result in noisy saliency maps. If an artificially de-noised saliency map is desired, it is more transparent and interpretable to post-process the maps.
This work was supported by the Narodowe Centrum Nauki [2016/23/D/ST6/03613] and the NVIDIA Corporation’s GPU grant.