When applying AI models, especially in safety-critical areas such as medical applications, autonomous driving, or criminal justice, we need to understand their underlying behavior to decide whether they can be trusted. Here, the field of explainable AI (XAI) has established itself, where methods are being developed to illuminate so-called black box models. XAI can provide essential support for ethical, legal, and social questions and ultimately also contribute to increased acceptance by end users.
Recent efforts in the field of XAI have revealed undesirable behavior of AI models. This especially includes, but is not limited to, a behavior called Clever Hans. A Clever Hans is a class-specific artifact present in the training data and thus learned by the model as a relevant feature. Explanation methods, such as Layerwise Relevance Propagation (LRP), can highlight the input features that led the model to a particular decision. These methods can therefore reveal Clever Hans artifacts, which can then be visually identified by the user. For example, it was shown by the use of LRP that, for a specific dataset and model, none of the features of the horse were responsible for the prediction of the class “horse”. Rather, the features of the photographer’s watermark were shown as relevant. This can be seen as “cheating” by the model to achieve a lower loss and therefore a higher accuracy. Furthermore, problems such as adversarial or malicious attacks can arise, where an adversary applies perturbations to the data to force the model to deliver incorrect results. One of them is known as a backdoor attack, in which, in the presence of backdoor triggers, the black box models react extremely unexpectedly from an unaware user’s perspective but, as intended, maliciously from the attacker’s perspective. This emphasizes the importance of finding and suppressing these artifacts, either in the model’s learnt representations or in the data itself. XAI can be a very effective tool for discovering and suppressing these undesirable behaviors [28, 2].
While the treatment of artifacts has received limited attention, [28, 2] propose the use of spectral relevance analysis (SpRAy) on the LRP explanation maps of data to detect strategies and suppress artifacts learned by the models. The aforementioned LRP has received much attention and is a prominent example of the so called post-hoc XAI methods, which are methods developed separately from the black box models to explain their decisions.
However, recently, it has been suggested to develop self-explaining neural networks. These intrinsically explain the decision-making process to the user without a further need for post-hoc explanation methods. Towards this goal, a network (ProtoPNet) was proposed that provides a transparent prediction by introducing a prototype layer between the final convolutional layer and the output layer. This prototype layer consists of a fixed number of prototypes for each class, which can be thought of as representative instances of each class in the training data. During the classification process, for each image that is passed through the network, an activation map based on its similarity with respect to every prototype is computed, which is then used for the final classification. Afterwards, by upsampling the activation map to the input size, the most relevant pixels contributing to the classification are highlighted. Doing this for both the prototype (training) images and the test image, the regions of interest can be visualized, which serve as a direct comparison for the user to capture the relation between the test image and the prototype images from the training set. This accordingly helps in comprehending the decision of the network as “this relevant feature of the test image looks like that relevant feature from the class-specific prototype image”.
Recalling the artifacts issue, the solution now appears to be clear when using self-explaining neural networks, such as ProtoPNet: If the model learned a feature corresponding to the artifact, then it must be reflected by at least one of the prototypes of the class consisting of such artifacts. After identifying these artifact prototypes, they can be pruned from the model thereby making the model artifact-free.
However, in this work we show that this is not a feasible approach because of the coarse and spatially imprecise explanations provided by ProtoPNet due to its model-agnostic upsampling. Therefore, building on the principles of LRP, we propose a novel method, referred to as Prototypical Relevance Propagation (PRP), to attain more accurate model-aware explanations. Using PRP, we illustrate that artifact information is entangled within the ProtoPNet, such that most prototypes capture artifact-related features, making the above-mentioned pruning procedure unreasonable. Further, to preserve the strength of ProtoPNet and obtain “this-looks-like-that” explanations, while at the same time suppressing artifacts, we instead propose to filter out artifact-containing data via a clustering procedure that exploits the explanation information from multiple views (one view per class prototype). We show that utilizing multiple views through multi-view clustering is more efficient than the single-view LRP-based approach, SpRAy.
While the effectiveness of post-hoc explainability methods has been investigated extensively [45, 26] and their benefit has been questioned, there is a significant gap in the research when it comes to quantitatively analyzing the explanations provided by self-explaining approaches. As a representative of the self-explaining models, we focus on ProtoPNet, as it provides easily comprehensible case-based reasoning and is applicable to arbitrary CNN architectures by inserting a single prototype layer. Additionally, it not only provides information about the features that the model’s decision is based on, but also links this information to similar features in the training data, captured by the prototypes, thus imitating human decision making. Further, inspired by LRP, we backpropagate the relevances of the prototypes, thereby obtaining model-aware prototypical explanations. This in turn incorporates the advantages of LRP of being computationally efficient with reasonable performance, along with the capability of reducing the gradient shattering effect.
Our main contributions are as follows:
We identify and address key issues with imprecise explanations provided by the self-explaining model, ProtoPNet.
We propose a novel PRP method for enhancing ProtoPNet’s explanations by generating more efficient model-aware explanations.
We compare PRP with ProtoPNet’s explanation heatmaps, both qualitatively and quantitatively, and show that eradicating learned artifact features, such as Clever Hans and Backdoor artifacts, from ProtoPNet is infeasible.
We show the efficiency of PRP in utilizing multiple explanations from different prototypes to suppress artifacts from the data.
II Related work
II-A Explainability methods
Recently, there has been increased interest in both post-hoc explanation methods and self-explaining neural networks. Post-hoc explainability methods can be separated into two overarching categories: model-agnostic and model-aware approaches. Model-agnostic approaches, such as LIME and SHAP, consider the models as black boxes and are thus applicable to arbitrary model architectures. These can be used to compare models based on the explanations that they produce. Model-aware approaches [54, 42, 52, 25, 7, 37, 40, 22, 23, 44, 9, 33], on the other hand, take the internal structure of the model into account and therefore tend to yield more precise, model-based explanations. For example, LRP
, a model-aware post-hoc XAI approach, has been widely used to explain the decisions of various deep neural networks, such as convolutional neural networks, recurrent neural networks, and graph neural networks. Because of its importance in general, and to this paper in particular, we briefly describe the basic idea of LRP. LRP assigns relevance to the input by backpropagating the output relevance successively layer by layer until it is distributed over the input features. The distribution of relevance is based on how much a particular node contributed to the output. Suppose node $j$ at layer $l$ connects to node $k$ at layer $l+1$, and the relevance at layer $l+1$ for node $k$ is $R_k$. The relevance is then backpropagated to node $j$ according to the LRP rule: $R_{j \leftarrow k} = \frac{z_{jk}}{z_k} R_k$, where $z_{jk}$ is the contribution of the output from node $j$ to node $k$ and $z_k = \sum_{j'} z_{j'k}$ is the total output at node $k$.
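As an illustration, the basic rule above can be sketched for a single fully-connected layer (a minimal numpy sketch; the function name and the division stabilizer are our own additions, not taken from any LRP library):

```python
import numpy as np

def lrp_linear(a, w, relevance_out, eps=1e-9):
    """Basic LRP z-rule for one fully-connected layer.

    a:             (n_in,)        input activations
    w:             (n_in, n_out)  weights
    relevance_out: (n_out,)       relevance arriving at the layer's output
    Returns the relevance redistributed to the inputs, shape (n_in,).
    """
    z = a[:, None] * w                                  # z_jk: contribution of input j to output k
    z_k = z.sum(axis=0)                                 # z_k: total pre-activation of output k
    z_k = z_k + eps * np.where(z_k >= 0, 1.0, -1.0)     # stabilize the division (our addition)
    return (z / z_k) @ relevance_out                    # R_j = sum_k (z_jk / z_k) * R_k
```

A useful sanity check is the conservation property: up to the stabilizer, the total relevance entering the layer equals the total relevance leaving it.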
Another, less explored, category of explanation methods are self-explaining networks, which inherently explain the decisions they make, thereby making the models transparent by design. These include networks that align the latent space to known visual concepts in order to increase the transparency of the decisions [12, 30, 31, 51], as well as models that utilize attention mechanisms [34, 5, 43, 15] and thus provide some form of self-explainability. Furthermore, other works consider self-explainability in terms of concept learning [8, 56, 57, 13]. ProtoPNet proposes to learn a specific number of class-based prototypes as a part of the architecture. These are then used for visualizing lower spatial dimensional concepts from the training images, thus providing explanations during the decision process itself. SENN
is a type of general self-explaining model that is fully transparent and designed by progressively generalizing linear classifiers to complex models. It consists of a concept encoder to get self-explainable features, an input-dependent parametrizer for generating relevance scores, and an aggregation function to get the final predictions. Although the self-explainable concepts in SENN are represented using prototypes similar to ProtoPNet, the former only shows which training images are important for a decision. ProtoPNet, on the other hand, additionally shows what part of the test image looks like which part of the training images, thus providing more comprehensible explanations. The Classification-By-Components (CBC) network
is designed based on Biederman’s theory in psychology, which assigns positive, negative, and indefinite reasoning to different components for classification. The framework consists of creating probabilistic trees for classes by finding and modelling class decompositions of patches, followed by computing the class hypothesis probability based on reasoning over the components. Unlike CBC, ProtoPNet is more flexible in terms of learning components (prototypes) of varying sizes in the input domain, and has the capability of being incorporated into any network architecture.
Inspired by ProtoPNet, XProtoNet was recently introduced for automated diagnosis in chest radiography. It addresses the issue that ProtoPNet looks at fixed patch sizes in the feature map while computing its similarity with the prototypes. As a remedy, it adds an occurrence module to the network for learning features of dynamic size for the prototypes. However, the issues that we address in this work remain in XProtoNet, making it prone to misleading explanations due to the model-agnostic upsampling used for prototype visualizations.
II-B Artifacts
Real-world data used for training deep neural networks are prone to contain spurious, incomplete, or wrongly labeled samples. This, and the fact that almost all neural networks that find applications in practical scenarios are black boxes, has led to a need to acknowledge and discuss the possible problems and solutions for all kinds of artifacts present in the data and learned by the black box models [28, 11, 18]. More importantly, these black box models are rapidly transitioning into delicate areas, where these issues might lead to large and potentially fatal consequences.
In this section, we acknowledge this inherent problem and give a brief introduction of two of the most common artifacts, Clever Hans and Backdoor artifacts, whose suppression is the focus of this work. Clever Hans refers to unintentional artifacts in the data that are learned by the model to cheat and achieve better accuracy, while Backdoors are artifacts maliciously added to the data by an adversary to provoke erroneous model predictions.
II-B1 Clever Hans
Clever Hans artifacts refer to spurious correlations present in the training data, which a model might use to base or strengthen its decisions on. This is termed the “Clever-Hans” effect, coined after the seemingly “intellectual” horse Hans, implying that the network does not learn any meaningful features and is consequently likely to fail in a real-world scenario where the artifact is absent. An example of this can be seen in Figure 1
, where the network learns the watermark present in the horse images of the PASCAL VOC dataset rather than actually learning horse-related features. Another example of Clever Hans artifacts has been observed in a radiological model, where activation heatmaps uncovered spurious strategies. This undesirable setting has also been explored recently in [28, 2], which propose a semi-automated method, SpRAy, to discover hidden decision strategies learned by the network, followed by a cleansing of the data to remove such spurious correlations. SpRAy works by computing LRP maps of a given data set, downsizing them, and then using spectral cluster analysis to find relevant structures through an eigengap analysis.
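A toy version of this pipeline can be sketched as follows (a simplified numpy sketch under our own assumptions; real SpRAy implementations operate on full-size LRP maps and use more elaborate affinity and embedding steps):

```python
import numpy as np

def downsize(hm, f=4):
    """Block-mean pooling of a 2-D heatmap by factor f (assumes divisible sizes)."""
    h, w = hm.shape
    return hm.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def spray(heatmaps, sigma=1.0):
    """Toy SpRAy: downsize explanation maps, build a Gaussian affinity graph,
    and return the sorted eigenvalues of its normalized Laplacian.
    A large eigengap after the k-th eigenvalue suggests k clusters of
    explanation strategies."""
    X = np.stack([downsize(h).ravel() for h in heatmaps])
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    A = np.exp(-d2 / (2 * sigma ** 2))                    # Gaussian affinity
    np.fill_diagonal(A, 0.0)
    deg = A.sum(1)
    L = np.eye(len(A)) - A / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]
    return np.sort(np.linalg.eigvalsh(L))
```

With two well-separated groups of heatmaps, the two smallest eigenvalues are near zero and a clear gap follows, which is the eigengap signal mentioned above.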
II-B2 Backdoor attacks
While the Clever Hans effect is learned by the network based on the unintentional availability of unproductive information in the training data, in other scenarios the network might be forced to learn undesirable features based on the malicious addition of hidden associations in the data, with the goal of producing incorrect inference results. These types of attacks can lead to disastrous consequences, especially in safety-critical applications. For example, in the case of self-driving cars, a post-it note can be attached to stop signs, which are then labeled as speed signs in the training dataset (Figure 5). The car is then very likely to behave erroneously in a real-world scenario, where it may consider a stop sign with a real post-it as a speed sign, with potentially disastrous consequences. These kinds of attacks are addressed in detail in the literature and their different case scenarios have been tackled in an increasing number of recent works [17, 11]. Note that here, unlike in the Clever Hans scenario, both the data and the label are intentionally modified.
II-C Multi-view Clustering
In the self-explaining model ProtoPNet, each class is associated with a fixed number of class prototypes. These can be regarded as capturing, and thus searching for, different features in each input image. Consequently, if artifacts are present in a class during training, the PRP explanation maps for this class’ prototypes will reflect the contrast between artifact and non-artifact features learnt by the model. Therefore, by considering our PRP explanation maps for the individual class prototypes as multiple views of a single image, we show that it is possible to efficiently suppress the artifacts from the data using multi-view clustering. Traditional multi-view clustering methods include learning a common representation from multiple views of data followed by clustering, or learning adaptive representations based on clustering [14, 58]. Alternative methodologies represent data with combined affinity matrices and subsequently learn the cluster assignments from them [47, 55]. Further, several multi-view clustering algorithms have been proposed that build on spectral clustering and consider a consensus Laplacian matrix among all the views [27, 61, 55]. Instead, deep-learning based multi-view clustering methodologies learn a common encoding with the help of deep neural networks, which can then be leveraged by the clustering module [50, 3]. The clustering module can, among others, be based on deep graph clustering, subspace clustering, adversarial clustering methods, or contrastive learning.
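As a minimal illustration of the consensus idea shared by several of the spectral methods above, one can average the views' affinity matrices into one consensus graph and bipartition the samples via the Fiedler vector (a simplified sketch for two clusters, not any specific published algorithm):

```python
import numpy as np

def consensus_spectral_bipartition(affinities):
    """Minimal consensus multi-view spectral clustering sketch (k = 2):
    average the per-view affinity matrices into one consensus graph and
    split the samples by the sign of the Fiedler vector (the eigenvector
    of the second-smallest Laplacian eigenvalue)."""
    A = np.mean(affinities, axis=0)          # consensus affinity over all views
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(1)) - A                # unnormalized graph Laplacian
    evals, evecs = np.linalg.eigh(L)
    fiedler = evecs[:, 1]
    return (fiedler > 0).astype(int)         # binary cluster labels
```

Published consensus methods differ in how the views are combined (co-regularization, canonical-angle weighting, rank constraints), but the averaged-graph sketch captures the shared principle.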
III An Evaluation of ProtoPNet
III-A ProtoPNet
ProtoPNet introduces self-explanation in a deep learning network by incorporating a prototype layer between the last convolutional layer and the output layer. Thereby, each class is associated with a fixed number of prototypes. The output of the prototype layer is connected linearly to the output layer to generate class logits. The network is optimized by iterating the following three steps: 1) The whole network, except the last layer, is trained using stochastic gradient descent. For each prototype, a similarity based on the squared $L^2$ distance between the patches of the convolutional output from the backbone and the prototype is calculated, thus generating an activation map. Global max pooling is applied to the activation map to generate a single similarity score corresponding to a single prototype. The loss used is a combination of the cross entropy loss, a cluster loss, and a separation loss. The cluster loss encourages the training images to have a patch close to at least one of their own class prototypes. The separation loss, on the other hand, encourages the training image patches to be far from the prototypes of other classes. For completeness, the losses are provided in the Appendix VII-A. 2) All prototypes are then projected onto the nearest training patch from the same class as the prototype, thus maintaining inherent interpretability. These can be visualised in the input space, thus creating a concept of “this looks like that” while making the decisions. 3) Finally, a convex optimization of the last layer is performed to further improve accuracy, while keeping the learned prototypes fixed. The prototype activations are visualized by upsampling the similarity between the prototypes and the embedded input image to the input image size, thus highlighting the parts of the image which strongly activate the respective prototype.
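The cluster and separation costs described in step 1 can be sketched as follows (a simplified numpy sketch; the tensor shapes, function name, and variable names are our own assumptions, and the actual implementation operates on GPU tensors inside the training loop):

```python
import numpy as np

def cluster_and_separation_cost(patches, prototypes, proto_class, labels):
    """Sketch of ProtoPNet-style cluster/separation costs.

    patches:     (n, P, D) latent patches per image
    prototypes:  (m, D)    prototype vectors
    proto_class: (m,)      class id of each prototype
    labels:      (n,)      class id of each image
    """
    # squared distances from every patch to every prototype: shape (n, P, m)
    d = ((patches[:, :, None, :] - prototypes[None, None, :, :]) ** 2).sum(-1)
    d_min = d.min(axis=1)                                    # closest patch per prototype: (n, m)
    own = proto_class[None, :] == labels[:, None]            # mask of same-class prototypes
    clst = np.where(own, d_min, np.inf).min(axis=1).mean()   # pull images to own prototypes
    sep = -np.where(~own, d_min, np.inf).min(axis=1).mean()  # push away from other classes
    return clst, sep
```

Minimizing the cluster cost and the (negated) separation cost together yields the behavior described above: each image has some patch near an own-class prototype and all patches stay far from other classes' prototypes.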
III-B Evaluation of ProtoPNet’s explanations
Although self-explaining models such as ProtoPNet appear promising as more transparent alternatives to typical black-box neural networks, we demonstrate that they still lack precision, as shown in Figure 1. Moreover, restricting each class representation to a limited number of prototypes leads to a trade-off between the accuracy of the model and the quality of the explanations generated by the model.
In the case of ProtoPNet, even when it visualizes the most important area of the input image for a specific class, it does not concisely depict the relevant features of a prototype, as shown by the example in Figure 1. The original image (a) in Figure 1 shows a horse galloping on green grass and contains a watermark in the lower left corner. As an example, the explanation for one of the 10 prototypes learned for the class horse is shown in Figure 1(b). From this prototype explanation, we can observe that the lower left corner was important for the model to predict the image as a horse. However, the exact pixels that significantly contributed to the prediction remain unknown, i.e., we do not know whether the grass or the text drove the prediction of the model. Now, using the model-aware PRP method, we backpropagate the prototype information from the prototype layer through the network to the input image, which allows us to reveal and visualize the model-aware, faithfully distributed relevance scores on the input image, as shown in Figure 1(c). From the PRP explanation, we observe that high relevance (dark red pixels) was allocated to parts of the text. Thus, the PRP explanation leads to an increased understanding of the underlying behavior of the model by providing the user with a more fine-grained prototype explanation.
For a randomly chosen test image, shown in Figure 1(d), the activations for the learned prototype 1(b), as visualized by ProtoPNet and PRP, are given in Figure 1(c) and 1(d), respectively. The PRP explanation identifies the watermark as a relevant feature for predicting the class horse, in contrast to the ProtoPNet explanation, which is too crude to identify important features and is therefore widely spread across the entire image. Now, with these additional insights provided by PRP explanations, we are able to identify parts of the text as Clever Hans features, considering that they are relevant for the prediction of the class horse.
Accordingly, we detect and address the following drawbacks of ProtoPNet:
The activation maps used for the prototype visualizations in ProtoPNet have a very low resolution due to downsampling and feature aggregation functions in the network. From this low-resolution activation map, ProtoPNet performs model-agnostic upsampling using bilinear interpolation to the size of the input image, thus leading to very coarse explanations.
The effective receptive field of a position in the activation map tends to cover large parts of the image, which is not captured by the naive upsampling. Consequently, there is no truthful spatial localization of the relevance to the correct input area, leading to spatially imprecise explanations.
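The coarseness can be made concrete: bilinearly upsampling a single activated cell of a 7×7 activation map to a 224×224 image smears its relevance over thousands of pixels (a self-contained sketch; the interpolation mirrors, but is not guaranteed to be identical to, framework implementations of bilinear resizing):

```python
import numpy as np

def bilinear_upsample(a, out_h, out_w):
    """Plain bilinear interpolation of a 2-D map (half-pixel-centers convention)."""
    in_h, in_w = a.shape
    ys = (np.arange(out_h) + 0.5) * in_h / out_h - 0.5   # source coordinates per output row
    xs = (np.arange(out_w) + 0.5) * in_w / out_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, in_h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, in_w - 1)
    y1 = np.clip(y0 + 1, 0, in_h - 1)
    x1 = np.clip(x0 + 1, 0, in_w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]             # interpolation weights
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    return ((1 - wy) * (1 - wx) * a[np.ix_(y0, x0)]
            + (1 - wy) * wx * a[np.ix_(y0, x1)]
            + wy * (1 - wx) * a[np.ix_(y1, x0)]
            + wy * wx * a[np.ix_(y1, x1)])
```

A single nonzero unit in the 7×7 map ends up covering a region of roughly 64×64 output pixels, regardless of which input pixels actually drove that activation, which is exactly the imprecision described above.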
In the next subsection, we will discuss in detail these drawbacks of ProtoPNet’s explanations using the Clever Hans artifact as an example.
III-C Case Study: Clever Hans artifact detection with ProtoPNet
Ideally, ProtoPNet should capture any significant artifact in the data as an “artifact prototype”. However, due to its coarse and spatially imprecise explanations, the heatmaps of ProtoPNet exhibit misleading behavior. This hinders the detection of artifact prototypes in ProtoPNet, as shown in detail in this section.
With the help of the following experiment, we investigate the behavior of ProtoPNet in the presence of Clever Hans artifacts in the data. We aim to detect the aforementioned artifact prototypes using ProtoPNet’s explanations, combined with the difference in classification results in the presence and absence of artifacts in the test data. Following this, we prune the detected artifact prototypes, thus hypothetically suppressing the artifacts learnt by the model. However, we demonstrate experimentally that, due to its misleading explanations, ProtoPNet’s heatmaps are deficient in capturing and identifying the artifact learned by the model, thus rendering the pruning of artifact prototypes futile for making the model artifact-free.
In order to create a controlled environment, we use the 5-class version of the LISA traffic sign data set and place a Clever Hans artifact, a yellow square (see Figure 5), in 100% of the training data of the stop sign class (dataset details are provided in Section V-A), which we refer to as CH-100. We train the ProtoPNet with ResNet34 as backbone, fixing the number of prototypes to 10 for each class, with all training parameters set according to the original ProtoPNet setup. The network is trained for 1000 epochs, where a projection (push) of the prototypes is done every 10 epochs. After each push, the last layer is trained for 20 epochs. The learning rate is reduced by a factor of 0.1 every 5 epochs, and the training is stopped when the training accuracy converges and the cluster cost becomes smaller than the separation cost on the training set. The described experimental setting was used for all experiments in this work.
To evaluate the impact of an artifact on the model, we evaluate the performance on two test sets: an artifact test data set, where Clever Hans artifacts are inserted into 100% of the images of the stop sign class; and a clean test data set, which contains only clean images, where no artifact has been added. The accuracy results for both test datasets are shown in Table I. We observe that the model, trained on the CH-100 dataset, has 100% classification accuracy on the artifact test data and only 6.3% on the clean test data. This large drop in accuracy indicates that the model has learned the inserted artifact.
In order to detect the prototypes responsible for this behavior, we visualize the 10 prototypes learned by the network for the stop sign class in Figure 2, where the upsampled activation heatmap is overlaid such that the relevant areas of each prototype can be identified visually. Although no prototype is clearly focusing on the artifact, it appears that prototypes 6 and 8 might be learning a part of it. We further confirm this by measuring the drop in accuracy when removing individual prototypes as well as combinations of prototypes for the stop sign class. In Figure 3, the base accuracy is the original accuracy on the artifact test data when no prototype is removed, the diagonal represents the drop from the base accuracy when pruning single prototypes (1 to 10) of the stop sign class, and the off-diagonal entries represent the drop in accuracy when a combination of prototypes is removed. Note that the accuracy for the artifact test data only drops when prototypes 6 or 8 are removed, with the biggest drop of 78.39% when both are removed together. Also note that no retraining is done yet after pruning the prototypes.
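On the level of the final linear layer, the pruning experiment can be mimicked with a few lines: removing a prototype amounts to silencing its similarity score before the logits are computed (an illustrative sketch with synthetic scores and our own function name, not the actual experiment code):

```python
import numpy as np

def accuracy_after_pruning(S, W, labels, pruned=()):
    """Prune prototypes by zeroing their similarity scores before the final
    linear layer, then recompute the classification accuracy.

    S:      (n, m) similarity scores per image and prototype
    W:      (m, C) last-layer weights mapping prototype scores to class logits
    labels: (n,)   ground-truth class ids
    """
    S = S.copy()
    S[:, list(pruned)] = 0.0            # pruning = silencing those prototypes
    preds = (S @ W).argmax(axis=1)      # linear combination of similarity scores
    return (preds == labels).mean()
```

In this toy setting, pruning the one prototype that carries a class's evidence collapses that class's accuracy, which is the kind of drop measured in Figure 3.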
Therefore, trusting the explanations provided, we might assume that removing these artifact prototypes would eliminate the artifact effect. However, as shown in Table I, this is not the case, as seen after retraining the last layer, i.e., reweighting the connection of the prototypes to the final classification layer. While the accuracy for the artifact stop sign class drops considerably when removing prototypes 6 and 8, it increases again to 88.8% once the last layer weights are retrained. Moreover, for clean test data, the accuracy remains the same, i.e., 38.2% before and after retraining the last layer, thus refuting the potential learning of meaningful features for the stop sign class after retraining. These results indicate that, without learning new prototypes, the remaining prototypes also include artifact information, highlighting the lack of accurate explanations by ProtoPNet.
Thus, as shown in the above experiment, the explanations provided by the upsampling strategy of ProtoPNet are insufficient to reveal the model’s behavior and detect the artifacts faithfully. Targeting more fine-grained explanations to overcome this limitation, we present a model-aware prototypical explanation method, which we refer to as PRP.
IV Prototypical Relevance Propagation and Enhanced Suppression of Artifacts
To address the two main drawbacks of ProtoPNet’s visualizations, i.e., low resolution activation maps and spatially imprecise prototype explanations (as investigated in the section above), we propose a novel method called Prototypical Relevance Propagation. With PRP, we are able to maintain the advantage of the self-explanatory architecture through prototypes and at the same time improve the quality of prototypical explanations by adding the model-aware explanatory potential of PRP.
IV-A Prototypical Relevance Propagation (PRP)
The original prototype visualization step in ProtoPNet is achieved through upsampling and is therefore decoupled from the other steps in its end-to-end training. Instead of upsampling, we aim to use the knowledge of the inner workings of the network and backpropagate the similarity values of a prototype to the input. By doing this for each prototype, we obtain a model-aware explanation for each prototype. We refer to our method as PRP and the generated explanation maps as PRP maps.
For the following considerations, let the input images be represented as $x$ and the convolutional output from the backbone CNN as $z$. Let $P = \{p_j\}_{j=1}^{m}$ be the prototypes learned by the network, each with a shape of $H_1 \times W_1 \times D$. Following the original ProtoPNet setting, we set $H_1 = W_1 = 1$. Moreover, let $s_j$ be the similarity score and $A_j$ the activation map for each prototype $p_j$. The forward computations in ProtoPNet, illustrated in Figure 4, are defined as follows:
From input to convolutional output:
$$z = f(x),$$
where the function $f$ represents the trained backbone CNN.
The activation maps are computed as similarities, based on the squared $L^2$ distances, between the last convolutional output layer and the prototypes in the prototype layer:
$$A_j(h, w) = \log\left(\frac{\lVert \tilde{z}_{h,w} - p_j \rVert_2^2 + 1}{\lVert \tilde{z}_{h,w} - p_j \rVert_2^2 + \epsilon}\right),$$
where $\tilde{z}_{h,w}$ are patches of $z$ of the same size as the prototypes and $\epsilon$ is a small constant introduced for numerical stability.
From activation maps to similarity score:
$$s_j = \max_{h,w} A_j(h, w).$$
The similarity scores are the input of the final fully connected layer, which produces the logits for all output classes. Hence, the final classification is based on a linear combination of the similarity scores of the different prototypes.
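Assuming $1 \times 1$ prototypes, the forward computations of the prototype layer reduce to a few lines (a numpy sketch; variable names and the value of $\epsilon$ are our own choices):

```python
import numpy as np

def prototype_forward(z, prototypes, eps=1e-4):
    """Forward pass of the prototype layer as sketched from the equations above.

    z:          (Hz, Wz, D) convolutional output of the backbone
    prototypes: (m, D)      1x1xD prototypes
    Returns activation maps A of shape (m, Hz, Wz) and similarity scores s of shape (m,).
    """
    # squared L2 distance from every spatial patch to every prototype
    d2 = ((z[None, :, :, :] - prototypes[:, None, None, :]) ** 2).sum(-1)
    A = np.log((d2 + 1.0) / (d2 + eps))   # large where a patch is close to the prototype
    s = A.max(axis=(1, 2))                # global max pooling: one score per prototype
    return A, s
```

A patch identical to a prototype yields the maximal activation $\log(1/\epsilon)$, so the max-pooled score directly reports the best match in the image.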
Now, to obtain more precise prototype visualizations through our approach, a PRP map is calculated for a certain prototype by propagating the relevance of this prototype back to the input features. Note that the relevance of a specific prototype is exactly its similarity score. Therefore, the first backpropagation step considers the redistribution of the similarity scores towards the activation map with respect to the max pooling layer:
From similarity scores to activation map:
LRP for the max pooling layer is performed as follows:
$$R_A(h, w) = \begin{cases} R_S, & \text{if } (h, w) = \operatorname{arg\,max}_{(h', w')} A_j(h', w'), \\ 0, & \text{otherwise,} \end{cases}$$
where $S$ refers to the similarity score layer, $A$ to the activation map layer, and $(h, w)$, $(h', w')$ specify the spatial locations in the respective layers. We define the relevance at layer $S$ as $R_S = s_j$.
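This winner-take-all step can be sketched directly (assuming a single global maximum; ties would need an additional convention):

```python
import numpy as np

def relevance_through_maxpool(A, R_s):
    """Winner-take-all LRP step for the global max pooling layer: the whole
    prototype relevance R_s (its similarity score) is assigned to the argmax
    location of the activation map A; every other position receives zero."""
    R_a = np.zeros_like(A)
    h, w = np.unravel_index(np.argmax(A), A.shape)
    R_a[h, w] = R_s
    return R_a
```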
From activation map relevance to convolutional output: The forward computation shown in Equation 2 computes the similarity between each prototype and each output patch of the convolutional layer ($\tilde{z}_{h,w}$), with both having $D$ channels, thus compressing the channel dimension to 1 in the activation map. In this step, we redistribute the relevance from the one-channel activation map back to the $D$ channels of the convolutional output, weighted by the corresponding channel-wise similarities computed during the forward pass. We define the channel-wise similarities between each CNN patch and the prototype as:
$$\hat{s}_{h,w,c} = \log\left(\frac{d_{h,w,c} + 1}{d_{h,w,c} + \epsilon}\right),$$
where, for each channel $c$,
$$d_{h,w,c} = \left(\tilde{z}_{h,w,c} - p_{j,c}\right)^2.$$
We then use the $z$-rule to distribute relevances to the convolutional output according to:
$$R_z(h, w, c) = \frac{\hat{s}_{h,w,c}}{\sum_{c'} \hat{s}_{h,w,c'}}\, R_A(h, w).$$
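A sketch of this channel-wise redistribution for a single spatial position (function and variable names are our own; shapes assume $1 \times 1$ prototypes):

```python
import numpy as np

def relevance_to_conv_output(z_patch, prototype, R_a, eps=1e-4):
    """Redistribute the relevance R_a of one activation-map position over the
    D channels of the corresponding convolutional patch, proportionally to the
    channel-wise similarities (the z-rule step sketched above).

    z_patch, prototype: (D,) vectors; R_a: scalar relevance at this position.
    """
    d2 = (z_patch - prototype) ** 2              # per-channel squared distance
    s_hat = np.log((d2 + 1.0) / (d2 + eps))      # channel-wise similarities
    return s_hat / s_hat.sum() * R_a             # relevance share per channel
```

By construction the channel relevances sum back to $R_A(h, w)$, so relevance is conserved, and channels that match the prototype closely receive the largest share.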
From convolutional output relevance to input relevance:
The rest of the network follows the LRP CoMPosite (LRP-CMP) rule to backpropagate the relevance to the input. In this strategy, the $\alpha\beta$-rule is applied to the convolutional layers and the $z^B$-rule is applied to the input layer. The $\alpha\beta$-rule treats positive and negative activations separately as follows:
$$R_j = \sum_k \left( \alpha \, \frac{(a_j w_{jk})^+}{\sum_{j'} (a_{j'} w_{j'k})^+} - \beta \, \frac{(a_j w_{jk})^-}{\sum_{j'} (a_{j'} w_{j'k})^-} \right) R_k,$$
where $a_j w_{jk}$ is the mapping of the input $a_j$ from neuron $j$ with weight $w_{jk}$, $(\cdot)^+ = \max(0, \cdot)$, $(\cdot)^- = \min(0, \cdot)$, and $\alpha - \beta = 1$. We use $\alpha = 1$ and $\beta = 0$. (Note that, for notational simplicity, we follow previous works [6, 26] and consider the convolutional layers as fully-connected layers with shared weights.) The $z^B$-rule spreads the relevance to the input features as follows:
$$R_i = \sum_j \frac{x_i w_{ij} - l_i w_{ij}^+ - h_i w_{ij}^-}{\sum_{i'} \left( x_{i'} w_{i'j} - l_{i'} w_{i'j}^+ - h_{i'} w_{i'j}^- \right)}\, R_j,$$
where $l_i$ and $h_i$ are the smallest and largest pixel values.
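Both rules can be sketched for a dense layer, treating convolutions as dense layers with shared weights as noted above (a numpy sketch; the stabilizing epsilons are our own addition):

```python
import numpy as np

def lrp_alpha_beta(a, w, R_out, alpha=1.0, beta=0.0, eps=1e-9):
    """Alpha-beta LRP rule for a dense layer (alpha = 1, beta = 0 as in the text):
    positive and negative contributions z_jk = a_j * w_jk are normalized separately."""
    z = a[:, None] * w
    zp, zn = np.maximum(z, 0.0), np.minimum(z, 0.0)
    pos = zp / (zp.sum(0) + eps)                 # positive shares per output neuron
    neg = zn / (zn.sum(0) - eps)                 # negative shares per output neuron
    return (alpha * pos - beta * neg) @ R_out

def lrp_zB(x, w, R_out, low, high, eps=1e-9):
    """z^B rule for the pixel input layer; low/high hold the smallest and
    largest admissible pixel values per input dimension."""
    z = (x[:, None] * w
         - low[:, None] * np.maximum(w, 0.0)
         - high[:, None] * np.minimum(w, 0.0))
    return (z / (z.sum(0) + eps)) @ R_out
```

With $\alpha = 1$ and $\beta = 0$, only positive contributions propagate relevance, and the total relevance is conserved across the layer up to the stabilizer.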
To identify global discriminative features across multiple explanation maps, the SpRAy method was proposed, which clusters LRP explanations according to their key features. Similar to SpRAy, we want to make use of the PRP maps to identify class-specific discriminative features. However, unlike SpRAy, which uses a single LRP explanation per image, we have multiple explanations for each image, i.e., the prototype explanations, which can be thought of as multiple views of an image explanation that our proposed method exploits.
IV-B Multi-view Clustering
Interpreting the different prototype activations as various views of the same image allows us to cluster the prototype activations with multi-view clustering algorithms in order to detect global class-discriminative features in the data. Hence, in the following, we apply multi-view clustering algorithms to the PRP maps to cluster the data into artifact and non-artifact images. Since clustering results can vary across methodologies, we demonstrate the performance with several multi-view clustering algorithms. We include a few representative spectral multi-view clustering algorithms, consisting of a two-step unweighted method and a weighted method, both of which compute the Laplacian matrix and the cluster assignments in two separate steps. Further, we compare results with a one-step method based on a rank constraint, which computes the similarities as well as the cluster labels in one step, and with two recent deep learning based clustering methods.
The spectral multi-view clustering methods work on the general principle of computing a consensus Laplacian matrix among all views. Co-regularized multi-view spectral clustering (Co-Reg) works by co-regularizing the clustering hypotheses. It obtains the combined Laplacian matrix by regularizing the eigenvectors of the individual Laplacians through two schemes: 1) pairwise co-regularization, which encourages the pairwise similarities across all views to be high, and 2) centroid-based co-regularization, which encourages each view to be close to a common centroid. Weighted multi-view spectral clustering (WMSC), on the other hand, proposes a weighting scheme based on minimizing the largest canonical angle between the subspaces spanned by each view's and the consensus's eigenvectors, followed by using clustering-ability smoothness to assign similar weights to views with similar clustering results. Multiview Consensus Graph Clustering (MCGC) learns the consensus graph using a cost function based on the disagreement between individual and global views, accompanied by a rank constraint on the Laplacian matrix to obtain the clustering results directly, without a subsequent step such as k-means.
The deep multi-view clustering approach first transforms each input $x_i^{(v)}$, i.e., image $i$ with view $v$, into its representation $z_i^{(v)}$ using view-specific encoders $f^{(v)}$, as $z_i^{(v)} = f^{(v)}(x_i^{(v)})$. The fused representation for all views is then computed using the fusion weights $w_v$, which are also learned during the end-to-end training, as $z_i = \sum_{v=1}^{V} w_v z_i^{(v)}$, where $V$ is the total number of views. This representation is then passed through a fully connected network to obtain the final cluster assignments. Deep divergence-based clustering (DDC) losses are incorporated to optimize the model. This approach is termed Simple Multi-View Clustering (SiMVC) by the authors, which, as the name suggests, is simple and efficient and works without explicit representation alignment. Further, the authors also introduce an auxiliary method which incorporates selective contrastive alignment of representations, called Contrastive Multi-View Clustering (CoMVC).
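The weighted fusion step can be sketched as follows; this is a simplified numpy illustration with softmax-normalized stand-in weights rather than weights learned end-to-end, not the authors' implementation:

```python
import numpy as np

def fuse_views(view_reprs, weight_logits):
    """Fuse V view representations (each n x d) into a single n x d
    representation via a softmax-weighted sum, SiMVC-style.

    `weight_logits` (length V) stand in for the fusion weights that
    would normally be learned during end-to-end training.
    """
    w = np.exp(weight_logits - np.max(weight_logits))
    w = w / w.sum()                                  # softmax over views
    return sum(wv * zv for wv, zv in zip(w, view_reprs))

# Two toy views of 4 samples in a 3-d representation space:
# equal logits give equal weights, so the fusion is the plain average.
z1 = np.ones((4, 3))
z2 = np.zeros((4, 3))
fused = fuse_views([z1, z2], np.array([0.0, 0.0]))
```

In the full method, the fused representation would then be fed to a clustering head optimized with the DDC losses.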
V Experiments & Results
In this section, we explore both the Clever Hans and the backdoor artifact settings using the LISA traffic sign dataset, which consists of video frames captured from a driving car. We follow the strategy of , extracting the frames and resizing them to 224x224 to be compatible with the original ProtoPNet architecture. The 47 classes in the dataset are then partitioned into 5 high-level classes, as proposed by , consisting of restriction, speed limit, stop, warning, and yield signs (details provided in Appendix VII-C). In addition, we use the PASCAL VOC 2007 dataset for evaluation, as it naturally contains a Clever Hans artifact. (Note that since in PASCAL VOC 2007 one image can belong to several classes, we deliberately remove the person class to decrease ambiguity: the person images overlap to a large extent with the images of the other classes, leading to many duplicate images across classes.)
V-A1 Clever Hans
We place the artifact resembling a yellow post-it note, as shown in Figure 5, in 100%, 50% and 20% of the stop sign images in the training data of the LISA traffic sign dataset to create the CH-100, CH-50 and CH-20 Clever Hans training datasets, respectively. We do not add Clever Hans artifacts to the PASCAL VOC 2007 dataset since it inherently includes a watermark tag of the photographer in about 15-20% of the images in the horse class .
V-A2 Backdoor
According to the data manipulation scheme for backdoor attacks from , we insert the artifact, i.e., the yellow post-it shown in Figure 5, into 15% of the stop sign images and assign them to the speed limit class. We refer to this corrupted training dataset as BD-15.
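Both manipulations amount to pasting a small yellow patch into a fraction of the training images and, in the backdoor case, additionally relabeling them. A minimal sketch, where patch size, position, and the helper names are illustrative assumptions rather than the exact values used in our experiments:

```python
import numpy as np

def add_postit(img, size=16, top=4, left=4, color=(255, 255, 0)):
    """Paste a yellow square (post-it-like patch) into a copy of an
    HxWx3 uint8 image. Position and size are illustrative only."""
    out = img.copy()
    out[top:top + size, left:left + size] = color
    return out

def poison(images, labels, fraction, target_label=None, seed=0):
    """Add the patch to a random `fraction` of the images.

    If `target_label` is given (backdoor setting), the patched images
    are also relabeled; otherwise only the Clever Hans patch is added.
    Returns the manipulated copies and the manipulated indices.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=int(fraction * len(images)),
                     replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = add_postit(images[i])
        if target_label is not None:
            labels[i] = target_label
    return images, labels, idx

# Toy backdoor example: patch 50% of 10 black images, relabel to class 1
imgs = np.zeros((10, 224, 224, 3), dtype=np.uint8)
labels = np.zeros(10, dtype=int)
p_imgs, p_labels, idx = poison(imgs, labels, fraction=0.5, target_label=1)
```

Leaving `target_label=None` corresponds to the Clever Hans setting, where only the patch is inserted and the labels stay untouched.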
To create both an artifact and a clean (non-artifact) test set for the LISA traffic sign dataset, we insert the artifact into either 100% or 0% of the stop sign images, referred to as the Artifact Test and Clean Test data, respectively. These test datasets are used to evaluate our experiments in both the Clever Hans (CH-100, CH-50 and CH-20) and the Backdoor (BD-15) scenarios.
V-B PRP maps vs ProtoPNet heatmaps
In the following, we conduct an experiment in which we add a Clever Hans feature to the training dataset to investigate the difference between the heatmaps of ProtoPNet and those generated by PRP. To this end, we add the Clever Hans artifact to 50% of the stop sign images in the training data (CH-50). The 10 prototypes for the stop sign class, learned by the ProtoPNet trained on the manipulated dataset, are shown in the first row of Figure 6. Given a test image, shown at the very left of Figure 6, the ProtoPNet heatmaps and the PRP heatmaps for the image are shown in the middle and bottom rows of Figure 6, respectively. We can observe that the ProtoPNet heatmaps are coarse, highlighting wider areas in the test image, and that neighboring regions of the artifact are focused upon rather than the precise location of the artifact. In contrast, from the PRP maps we can clearly observe that all prototypes focus precisely on the Clever Hans feature, some more (prototypes 2, 3, 4, 5, 7, 9, 10) and some less (prototypes 1, 6, 8). As shown later, prototypes 6 and 8 are in fact not learning any significant features and even react strongly to random noise. With the new insight into the model behavior gained through the PRP maps, we can shed new light on the hypothesis from Section III-C. The idea was to remove the prototypes that had learned the Clever Hans, retrain the last layer, and thus eliminate the Clever Hans effect. Given the original prototype explanations, this seemed sensible, as only 2 of the 10 prototypes appeared to have learned the Clever Hans feature. With the PRP maps, however, we gain new knowledge and can see that all prototypes (some more, some less) take the Clever Hans feature, the yellow square, into account.
We also note here that ProtoPNet heatmaps highlight all pixels in the image activated by the different prototypes (before max pooling). If they highlighted only the maximally activated region (after max pooling), they could only depict connected regions in the image space, given the naive upsampling that relies heavily on the spatial correspondence between the activation map and the input image. PRP maps, on the other hand, represent the maximally activated pixels and are still able to highlight disjoint areas in the image, as can be seen in the PRP map for prototype 5 in Figure 6, where both the artifact and the "ST" of the stop sign are shown as relevant.
Figure 7 illustrates the difference between PRP maps and ProtoPNet heatmaps for a stop sign image without an artifact. The PRP maps, shown in the bottom row, are of higher resolution and, in this case, tend to show more accurate information than the naively upsampled heatmaps from ProtoPNet. The PRP maps also exhibit higher variability, as shown by the explanations for prototypes 2 and 4 in Figure 7, and therefore convey more information from the original prototypes to explain the test pattern.
We now quantitatively evaluate the faithfulness of the PRP maps and ProtoPNet heatmaps regarding their ability to capture the most discriminative class-wise information. For this, we follow the strategy presented in , referred to as the Relevance ordering test, where we start from a random image and monitor both the similarity and class scores as we gradually add the most relevant pixels to the image.
Primarily, we are interested in investigating whether the most relevant pixels, according to the ProtoPNet heatmap and the PRP map, activate the prototype the most. Hence, we measure the similarity score between the activation map and the prototype instead of the prediction value. For an input image, the PRP map and the ProtoPNet heatmaps are computed first, followed by sorting the pixels in descending order of the relevance assigned by the PRP and ProtoPNet explanations, respectively. We then compute the similarity scores for the different prototypes of the stop sign class while gradually adding the sorted pixels to a random image. We do this for 50 randomly chosen clean images from the stop sign class and average across all images, followed by an average over all prototypes. The same experiment is repeated with the same images, this time with the Clever Hans artifact added. The average results for all prototypes of the stop sign class are shown in Figure 8. The x-axis represents the percentage of pixels that are replaced by the relevant pixels of the test image and the y-axis the corresponding similarity scores. As a baseline, we start from a random image and gradually replace a percentage of randomly chosen pixels by their test image pixel values, referred to as the Random approach. From Figure 8 we can observe that for both test scenarios, i.e., stop sign images with and without the artifact, PRP captures more relevant information. These quantitative results also uncover ineffective prototypes which are not learning anything specific from the training images and react strongly even to random noise, as shown in Figure 9. This behavior is observed for both the clean and the artifact test data; Figure 9 depicts the results on artifact test images for prototypes 6 and 8.
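The relevance ordering test can be sketched as follows; `score_fn` is a placeholder for the model's prototype similarity (or class score), and the step granularity is an illustrative assumption:

```python
import numpy as np

def relevance_ordering_curve(test_img, relevance, score_fn, steps=10, seed=0):
    """Start from a random image and progressively replace pixels with
    their test-image values, most relevant first; record score_fn after
    each step.

    relevance: HxW relevance map assigned by an explanation method.
    score_fn:  placeholder for the model's prototype similarity or
               class score on an image.
    """
    rng = np.random.default_rng(seed)
    h, w = relevance.shape
    order = np.argsort(relevance.ravel())[::-1]     # descending relevance
    img = rng.random(test_img.shape)                # random starting image
    scores = []
    for chunk in np.array_split(order, steps):      # reveal pixels in steps
        r, c = np.unravel_index(chunk, (h, w))
        img[r, c] = test_img[r, c]
        scores.append(score_fn(img))
    return np.array(scores)

# Toy check: revealing an all-ones test image can only raise its pixel sum
test_img = np.ones((8, 8))
relevance = np.arange(64, dtype=float).reshape(8, 8)
scores = relevance_ordering_curve(test_img, relevance,
                                  score_fn=lambda im: im.sum())
```

A faithful explanation yields a curve that rises steeply at the beginning, since the first revealed pixels are the ones the model actually relies on.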
Until now, we quantitatively examined the faithfulness of the PRP maps with respect to the different prototypes in terms of the similarity score. Additionally, we are interested in the faithfulness of the explanations when using the classification scores instead of the similarity scores. Hence, for the same PRP and ProtoPNet explanation maps, we sort the pixels in descending order of relevance and gradually add them to the random images while monitoring the effect on the class score for the stop sign class, as shown in Figure 10. We can observe from Figure 10 that adding the most relevant pixels based on the PRP explanations results in a significantly steeper slope (orange line) than using the ProtoPNet heatmaps, for both the artifact (shown on the right) and the clean dataset (shown on the left). We can therefore conclude that PRP assigns relevance to the important discriminative features more accurately than the ProtoPNet explanations.
V-C Assessing the network behavior with PRP maps
So far, we have established the drawbacks of ProtoPNet, namely the lack of high-resolution and spatially precise explanations, which hinders the user in identifying the most relevant discriminative features, and we have proposed a method to overcome this limitation. Our proposed PRP maps provide more fine-grained explanations while keeping the "this-looks-like-that" behavior of the ProtoPNet, as shown in Figure 11 for both the LISA (CH-50) and PASCAL VOC 2007 datasets. Therefore, we retain inherent interpretability, where each class is represented by a fixed number of prototypes. This greatly reduces the laborious manual task of analysing individual ad-hoc explainability heatmaps for assessing deep neural networks. Additionally, it reduces the need for semi-automated methodologies like SpRAy to find patterns in a huge number of a model's explanation maps.
We can now directly visually identify the strategies learned by the network by looking at only a few representative prototypes per class. For instance, we manually cluster the PRP maps of the stop sign class for the LISA dataset, as shown in Figure 12. We can observe that, aside from learning the artifact, the network also relies on the textual part of the stop signs as well as on the corner features. Note that we have excluded prototypes 6 and 8 from the assessment since they did not capture any useful information (see Figure 9).
Following this, we investigate the performance of PRP and ProtoPNet explanations on the PASCAL VOC 2007 dataset in order to uncover relevant features learned by the network for predicting the class horse. First, we show a few prototypes (the top 4 activated) that were learned by the model for the horse class, along with their ProtoPNet heatmaps and PRP maps, in Figure 13. We can observe that the PRP explanations capture the relevant features in a more fine-grained manner and identify a Clever Hans strategy used by the model: for prototype 3, it tends to focus on the text of the watermark rather than on the horse. In contrast, the information in ProtoPNet's heatmaps in the second row of Figure 13 is ambiguous, since prototype 3 allocates relevance to a broader background area. The strategies learned by the network for recognizing a horse are grouped manually and visualized in Figure 14. The four effective groups, disregarding the insignificant gray cluster which focuses on background features, represent the horse class in terms of a horse's face, legs, the presence of a rider, and finally the Clever Hans watermark.
V-D Multi-View Clustering for suppressing artifacts
Artifacts in the data can be learned by the model, which subsequently might lead to undesirable behavior, as shown in recent works and demonstrated above in the case of the self-explaining network ProtoPNet. Thus, it is essential to either remove the artifacts from the data or to ensure that the model does not base its decisions on these spurious attributes. We attempted the latter in the experiments above by identifying and removing the artifact prototypes. However, as we observed, this is not feasible, since the artifact is not always perceivable in the ProtoPNet heatmaps even when it was learned by a particular prototype. Using our suggested method, we are now able to find the prototypes that are activated by the artifact. Moreover, as PRP revealed in the previous sections, almost all prototypes incorporate the artifact features, suggesting that the artifact information is entangled within the whole network. Therefore, instead of pruning the artifact prototypes, we propose to detect the samples in the training dataset that activate the artifact prototypes, which can subsequently be removed from the training dataset before retraining the ProtoPNet on the cleansed data.
Using PRP, we obtain $K$ PRP maps corresponding to the artifact-containing class for each image, where $K$ corresponds to the number of learned prototypes for that class. We can consider these $K$ PRP maps as different views of the same image and can thus build on existing multi-view clustering methodologies to automatically cluster the training images and thereby discover clusters corresponding to artifact-containing images. In this work, we cluster the images into 2 clusters: an artifact and a clean data cluster.
To demonstrate the efficiency of PRP in detecting artifacts in the data, we test different multi-view clustering methodologies on the LISA dataset with 50% and 20% Clever Hans features added to the stop sign images. We further use the same methodologies for backdoor detection thereby demonstrating PRP’s efficiency in multiple artifact scenarios. We also compare our clustering approach with SpRAy, which performs spectral clustering analysis on single view LRP maps, and demonstrate that our approach is able to capture better information in PRP maps, especially in the setting with multiple views.
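Since cluster labels are only defined up to permutation, the reported accuracy and artifact-cluster F1-scores must be computed under the best cluster-to-label assignment. A small sketch for the binary case (the function name is ours, not from any library):

```python
import numpy as np

def binary_cluster_scores(y_true, y_pred):
    """Accuracy and F1 for the artifact cluster (label 1), taking the
    better of the two possible cluster-to-label assignments.

    y_true: ground-truth artifact labels (0 = clean, 1 = artifact).
    y_pred: cluster labels from any binary clustering algorithm.
    """
    def f1(t, p):
        tp = np.sum((t == 1) & (p == 1))
        fp = np.sum((t == 0) & (p == 1))
        fn = np.sum((t == 1) & (p == 0))
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    candidates = [y_pred, 1 - y_pred]            # the two label assignments
    accs = [np.mean(y_true == p) for p in candidates]
    best = candidates[int(np.argmax(accs))]
    return max(accs), f1(y_true, best)

# A clustering that found the right partition but with flipped labels
y_true = np.array([1, 1, 0, 0, 0])
y_pred = np.array([0, 0, 1, 1, 1])
acc, f1_artifact = binary_cluster_scores(y_true, y_pred)
```

The flipped toy prediction above still scores perfectly, which is the desired behavior for evaluating unsupervised cluster assignments.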
V-D1 Clever Hans type artifacts in 50% training data
The accuracies for the model trained on CH-50, with the artifact present in 100% (artifact test) and 0% (clean test) of the stop sign test images, are shown in Table I. As can be seen, the accuracy for the stop sign class drops from 100% to 94.6% when there is no artifact in the test data. According to the ProtoPNet heatmaps in Figure 6, prototypes 4 and 9 can be considered "artifact" prototypes. But as can be seen in Table I, removing these prototypes has no effect on the artifact test accuracy. The same holds when removing the prototypes and subsequently retraining the model. On the contrary, a decrease in the accuracy on the clean test data is observed. This again supports our assertion that ProtoPNet's heatmaps provide misleading information.
In order to obtain a clean dataset, we first aim to identify the samples that contain an artifact so that they can be removed from the training set. Assuming that the information on whether an artifact is present in a data point is recognizable in the PRP maps, we cluster the PRP maps into two clusters. For comparison, we use a set of representative algorithms to cluster the data, including SpRAy, SiMVC, CoMVC, Co-Reg, WMSC and MCGC. We downsample the heatmaps to a size of 80x80, as this had negligible impact on the results and reduced the computation time.
The overall clustering accuracy and the F1-score for the artifact cluster are given in Table II. We follow the experiments in  and train the SiMVC and CoMVC models for 100 epochs in 20 runs, reporting the results from the run with the lowest unsupervised cost-function value.
As observed from Table II, both SiMVC and CoMVC separate the artifact images from the clean images very effectively. We also report the results for the multi-view spectral clustering algorithms, i.e., Co-Reg, WMSC and MCGC, in Table II. Although more computationally expensive, these algorithms are able to cluster the data effectively. Co-Reg and WMSC always obtain an accuracy above 94% in separating the artifact data and thus prove highly successful in detecting the artifacts. SiMVC and CoMVC, on the other hand, perform with almost 100% accuracy when the artifact and non-artifact classes are balanced, i.e., in the current setting of CH-50.
To compare against the multi-view clustering approaches, we apply SpRAy to the LRP maps for the true class (SpRAy-LRP) as well as to the PRP maps for the prototypes of the true class (SpRAy-PRP). For SpRAy-LRP, we compute LRP maps using the rules in Section IV-A, extended to the last layer, and combine the relevance of all prototypes; more details are provided in Appendix VII-B. Accordingly, we obtain one LRP map per image, which is scaled down to 80x80 and flattened before applying SpRAy. For SpRAy-PRP, we combine the PRP maps by summing them across the color channels and concatenating all 10 PRP maps of each image to obtain a 10x80x80 map, which is then flattened before applying SpRAy.
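The SpRAy-PRP preprocessing can be sketched as follows; the nearest-neighbour resizing is an illustrative stand-in for the actual downsampling used:

```python
import numpy as np

def resize_nn(m, size=80):
    """Nearest-neighbour downsampling of a 2-d map (a simple stand-in
    for the resizing applied before clustering)."""
    h, w = m.shape
    ri = np.linspace(0, h - 1, size).astype(int)
    ci = np.linspace(0, w - 1, size).astype(int)
    return m[np.ix_(ri, ci)]

def spray_prp_features(prp_maps):
    """Turn the K PRP maps of one image (K x H x W x C) into a single
    flat feature vector: sum over color channels, downsample each map
    to 80x80, then concatenate and flatten."""
    maps = prp_maps.sum(axis=-1)                     # K x H x W
    small = np.stack([resize_nn(m) for m in maps])   # K x 80 x 80
    return small.reshape(-1)                         # length K*80*80

# 10 prototype maps of a 224x224 RGB explanation -> one feature vector
feat = spray_prp_features(np.zeros((10, 224, 224, 3)))
```

One such vector per image is then fed to SpRAy's spectral clustering; the multi-view methods instead keep the K maps separate as individual views.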
The results for both are shown in Table II. As observed, SpRAy fails to cluster the artifacts in the CH-50 data using both the LRP maps and the concatenation of the PRP maps. This behavior is expected, since both SpRAy-LRP and SpRAy-PRP do not capture dependencies among multiple views of the same object, as opposed to the multi-view clustering methodologies. We also apply SpRAy to the individual prototypes' PRP maps and report the best single-prototype performance in Table III, which is achieved by prototype 7. The obtained accuracy of 53.80% and F1-score of 0.68, compared to the combined SpRAy-PRP accuracy of 59.12% and F1-score of 0.68, demonstrates that no individual prototype allows SpRAy to separate the artifact from the clean data.
V-D2 Clever Hans type artifact in 20% training data
We also want to capture scenarios in which fewer Clever Hans artifacts are included in the training data. Therefore, we evaluate the efficiency of the multi-view clustering methodologies on the unbalanced dataset CH-20. The stop sign class accuracy on the artifact and clean test data is 99.7% and 95.8%, respectively, showing that the stop sign class is still affected by the Clever Hans effect.
Applying the multi-view clustering methodologies to this scenario, we report the accuracy and F1-score in Table II. The results show that SiMVC performs best with 97.99% accuracy, with comparable performance by almost all the other multi-view clustering methods. SpRAy fails again with very low F1-scores of 0.04 and 0.08 on the LRP and PRP maps, respectively, suggesting that almost all images are clustered into one cluster.
V-D3 Backdoor type artifact in 15% training data
Similar to the experiments above, we examine the backdoor setting using the generated BD-15 dataset. The prototypes and their corresponding heatmaps for the speed limit class are shown in Figure 15. The test accuracy when the artifact is present in 100% of the stop sign test images is given in Table IV. Most of the stop sign images are now classified as speed limits, and only 1% of the stop sign images are classified correctly.
The prototypes of the speed limit class, as learned by ProtoPNet, show that only one prototype has learned the backdoor artifact, while the remaining 9 prototypes correspond to the speed limit class, as shown in Figure 15. According to ProtoPNet's explanations, removing prototype 4 of the speed limit class should thus solve the problem of the backdoor attack. We remove the prototype, retrain the last layer, and report the accuracies in Table IV.
We can observe that removing the backdoor prototype has only a minor effect on the accuracy of the stop sign class, which increases from 1.0% to 6.5%. However, after retraining the last layer, it drops again to only 2.5%. This behavior emphasizes that the network has inherently learned the backdoor artifact, and that this learning is not limited to a specific backdoor prototype, as incorrectly suggested by the ProtoPNet visualizations. Here, too, the PRP explanations decode the behavior of the model: they show that almost all prototypes are activated by the artifact, even if these prototypes refer to the speed limit signs.
We therefore use multi-view clustering to cleanse the data of the backdoor feature and report the results in Table II. SiMVC and CoMVC still perform better than SpRAy-PRP, with F1-scores of 0.60 and 0.57, respectively, as opposed to an F1-score of 0.02 for SpRAy-PRP. However, SpRAy-LRP performs well in this setting with an F1-score of 0.91. This is because the LRP maps contain negative relevances from the stop sign class in addition to the positive relevances from the speed limit class, which helps accentuate the difference between speed limit and backdoored stop sign images. Furthermore, all the multi-view spectral clustering-based algorithms separate these clusters efficiently, the best being Co-Reg with an accuracy of 99.42% and an F1-score of 0.98.
VI Conclusion
Considering the success of machine learning algorithms in diverse safety-critical applications, it is crucial to verify the behavior of these models. In this work, we assess the faithfulness of the explanations provided by a well-known self-explaining network, ProtoPNet, and provide an in-depth assessment of its behavior in the presence of a range of artifacts. Our results indicate that, despite the attractiveness of self-explaining models, they are still far from achieving the required quality of explanations. Considering this, we propose a model-aware method, PRP, to generate more precise and higher-resolution prototypical explanations. These enhanced explanations help uncover more credible decision strategies while keeping the self-explainability intact. We further show that these explanations are able to uncover the spurious artifact features learned by the model, which are then efficiently identified and removed via our proposed multi-view clustering strategy. The insights obtained in this work highlight the importance of evaluating the quality of self-explaining machine learning approaches and will pave the way towards the development of more robust and precise models, thereby increasing their trustworthiness.
- (2018) Towards robust interpretability with self-explaining neural networks. In Advances in Neural Information Processing Systems, Vol. 31.
- (2013) Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning, ICML'13, pp. III-1247–III-1255.
- (2020) Self-organizing subspace clustering for high-dimensional and multi-view data. Neural Networks 130, pp. 253–268.
- (2015) Multiple object recognition with visual attention.
- (2015) On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLOS ONE 10 (7), pp. 1–46.
- (2010) How to explain individual classification decisions. Journal of Machine Learning Research 11 (61), pp. 1803–1831.
- (2019) EDUCE: explaining model decisions through unsupervised concepts extraction. CoRR abs/1905.11852.
- (2020) How much can I trust you? – quantifying uncertainties in explaining neural networks.
- (2009) Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML'09, pp. 129–136.
- (2018) Detecting backdoor attacks on deep neural networks by activation clustering. CoRR abs/1811.03728.
- (2019) This looks like that: deep learning for interpretable image recognition. In Proceedings of Neural Information Processing Systems (NeurIPS).
- (2020) Concept whitening for interpretable image recognition. Nature Machine Intelligence 2 (12), pp. 772–782.
- (2019) Tensor-based low-dimensional representation learning for multi-view clustering. IEEE Transactions on Image Processing 28 (5), pp. 2399–2414.
- Saccader: improving accuracy of hard attention models for vision. In Advances in Neural Information Processing Systems, Vol. 32.
- The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. Note: http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html
- (2017) Robust physical-world attacks on machine learning models. CoRR abs/1707.08945.
- (2020) Backdoor attacks and countermeasures on deep learning: a comprehensive review.
- (2016) Deep residual learning for image recognition. pp. 770–778.
- (2020) Auto-weighted multi-view clustering via deep matrix decomposition. Pattern Recognition 97, 107015.
- (2019) Deep divergence-based approach to clustering. Neural Networks 113, pp. 91–101.
- (2019) From clustering to cluster explanations via neural networks. arXiv:1906.07633.
- (2020) Towards explaining anomalies: a deep Taylor decomposition of one-class models. Pattern Recognition 101, 107198.
- (2021) XProtoNet: diagnosis in chest radiography with global and local explanations.
- (2017) Learning how to explain neural networks: PatternNet and PatternAttribution.
- (2020) Towards best practice in explaining neural network decisions with LRP. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–7.
- (2011) Co-regularized multi-view spectral clustering. In Advances in Neural Information Processing Systems, Vol. 24.
- Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications 10 (1).
- (2019) Unmasking Clever Hans predictors and assessing what machines really learn. CoRR abs/1902.10178.
- Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. Proceedings of the AAAI Conference on Artificial Intelligence 32 (1).
- (2019) AOGNets: compositional grammatical architectures for deep learning. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6213–6223.
- (2017) A unified approach to interpreting model predictions. CoRR abs/1705.07874.
- (2019) A rate-distortion framework for explaining neural network decisions. CoRR abs/1905.11092.
- (2014) Recurrent models of visual attention. CoRR abs/1406.6247.
- (2012) Vision-based traffic sign detection and analysis for intelligent driver assistance systems: perspectives and survey. IEEE Transactions on Intelligent Transportation Systems 13 (4), pp. 1484–1497.
- (2017) Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition 65, pp. 211–222.
- (2018) Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73, pp. 1–15.
- (2016) "Why should I trust you?": explaining the predictions of any classifier. CoRR abs/1602.04938.
- (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1 (5), pp. 206–215.
- (2020) Explaining deep neural networks and beyond: a review of methods and applications. arXiv:2003.07631.
- (2019) Classification-by-components: probabilistic modeling of reasoning over a set of components. In Advances in Neural Information Processing Systems, Vol. 32.
- (2017) Grad-CAM: visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV), pp. 618–626.
- (2015) Attention for fine-grained categorization.
- (2014) Deep inside convolutional networks: visualising image classification models and saliency maps.
- (2019) When explanations lie: why modified BP attribution fails. CoRR abs/1912.09818.
- (2020) Explanation-guided training for cross-domain few-shot classification.
- (2020) Marginalized multiview ensemble clustering. IEEE Transactions on Neural Networks and Learning Systems 31 (2), pp. 600–611.
- (2021) Reconsidering representation alignment for multi-view clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1255–1265.
- (2016) Feature importance measure for non-linear learning algorithms.
- (2015) On deep multi-view representation learning. In Proceedings of the 32nd International Conference on Machine Learning, ICML'15, pp. 1083–1092.
- (2019) Towards interpretable object detection by unfolding latent structures. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6032–6042.
- (2015) Understanding neural networks through deep visualization. CoRR abs/1506.06579.
- (2018) Confounding variables can degrade generalization performance of radiological deep learning models. CoRR abs/1807.00431.
- (2013) Visualizing and understanding convolutional networks. CoRR abs/1311.2901.
- (2019) Multiview consensus graph clustering. IEEE Transactions on Image Processing 28 (3), pp. 1261–1270.
- (2018) Interpretable convolutional neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8827–8836.
- (2019) Unsupervised learning of neural networks to explain neural networks (extended abstract). CoRR abs/1901.07538.
- (2017) Multi-view clustering via deep matrix factorization. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI'17, pp. 2921–2927.
- (2019) Analyzing the interpretability robustness of self-explaining models. CoRR abs/1905.12429.
- (2020) End-to-end adversarial-attention network for multi-modal clustering. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14607–14616.
- (2018) Weighted multi-view spectral clustering based on spectral perturbation. In AAAI.
VII-A ProtoPNet: Cost function
The overall cost function for ProtoPNet is:
\[ \min_{\boldsymbol{\theta},\,\mathbf{P}} \; \frac{1}{n} \sum_{i=1}^{n} \mathrm{CrsEnt}\!\left(h \circ g_{\mathbf{P}} \circ f(\mathbf{x}_i),\, y_i\right) + \lambda_1\,\mathrm{Clst} + \lambda_2\,\mathrm{Sep}, \]
where \(\mathrm{CrsEnt}\) is the cross-entropy loss, \(\mathrm{Clst}\) is the cluster loss, and \(\mathrm{Sep}\) is the separation loss, defined as:
\[ \mathrm{Clst} = \frac{1}{n} \sum_{i=1}^{n} \; \min_{j:\, \mathbf{p}_j \in \mathbf{P}_{y_i}} \; \min_{\mathbf{z} \in \mathrm{patches}(f(\mathbf{x}_i))} \left\lVert \mathbf{z} - \mathbf{p}_j \right\rVert_2^2, \]
\[ \mathrm{Sep} = -\frac{1}{n} \sum_{i=1}^{n} \; \min_{j:\, \mathbf{p}_j \notin \mathbf{P}_{y_i}} \; \min_{\mathbf{z} \in \mathrm{patches}(f(\mathbf{x}_i))} \left\lVert \mathbf{z} - \mathbf{p}_j \right\rVert_2^2, \]
where \(n\) is the total number of training images, \(y_i\) is the true label for image \(\mathbf{x}_i\), \(h \circ g_{\mathbf{P}} \circ f(\mathbf{x}_i)\) is the predicted label, \(\boldsymbol{\theta}\) represents the learnable parameters of the whole network, \(\mathbf{P}_{y_i}\) are all the prototypes belonging to class \(y_i\), and \(\mathrm{patches}(f(\mathbf{x}_i))\) are the patches of the convolutional output that are of the same size as the prototypes.
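The cluster and separation terms above can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function and array names are our own, distances are squared Euclidean over flattened patch vectors, and each image is assumed to have at least one same-class and one other-class prototype:

```python
import numpy as np

def cluster_and_separation_losses(patches, prototypes, proto_class, labels):
    """Illustrative ProtoPNet-style cluster/separation losses.

    patches:     (n, m, d) array -- m convolutional patches of dim d per image
    prototypes:  (p, d) array    -- all learned prototypes
    proto_class: (p,) array      -- class each prototype belongs to
    labels:      (n,) array      -- true class of each image
    """
    n = patches.shape[0]
    clst, sep = 0.0, 0.0
    for i in range(n):
        # squared L2 distance of every patch to every prototype: shape (m, p)
        d2 = ((patches[i][:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        own = proto_class == labels[i]  # prototypes of the image's own class
        # cluster loss: some patch should lie close to a same-class prototype
        clst += d2[:, own].min()
        # separation loss: patches should lie far from other-class prototypes
        sep -= d2[:, ~own].min()
    return clst / n, sep / n
```

Minimizing the cluster term pulls each training image toward a prototype of its own class, while minimizing the (negative) separation term pushes it away from prototypes of all other classes.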
For SpRAy based on LRP maps, we first backpropagate the output relevances, i.e., the class scores, to the similarity-score layer, following the corresponding LRP propagation rule:
For the rest of the network, the PRP rules are used. Since we now compute a relevance map corresponding to each prototype, we combine these maps to obtain the relevance at that layer as:
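The two steps above can be sketched generically in numpy. This is an illustration under stated assumptions, not the paper's exact rules: we use the common LRP epsilon-rule for a single linear layer as a stand-in for the layer-specific rules, and we assume the per-prototype relevance maps are combined by summation:

```python
import numpy as np

def lrp_epsilon_linear(a, w, R_out, eps=1e-6):
    """One LRP step through a linear layer z = a @ w using the common
    epsilon-rule (illustrative choice; the paper's per-layer rules differ).
    a: (d_in,) input activations, w: (d_in, d_out), R_out: (d_out,) relevance."""
    z = a @ w
    z = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilized pre-activations
    s = R_out / z                              # relevance per unit of output
    return a * (w @ s)                         # redistribute onto the inputs

def combine_prototype_relevance(R_per_proto):
    """Combine per-prototype relevance maps (p, H, W) into one map for the
    shared layer; summation is our assumed aggregation."""
    return np.sum(R_per_proto, axis=0)
```

The epsilon-rule approximately conserves total relevance across the layer (the small `eps` only stabilizes near-zero pre-activations), which is the property that lets the combined map be read as a decomposition of the class score.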
VII-C LISA 5-class dataset