Concept-level Debugging of Part-Prototype Networks

by Andrea Bontempelli et al.
Università di Trento

Part-prototype Networks (ProtoPNets) are concept-based classifiers designed to achieve the same performance as black-box models without compromising transparency. ProtoPNets compute predictions based on similarity to class-specific part-prototypes learned to recognize parts of training examples, making it easy to faithfully determine what examples are responsible for any target prediction and why. However, like other models, they are prone to picking up confounds and shortcuts from the data, thus suffering from compromised prediction accuracy and limited generalization. We propose ProtoPDebug, an effective concept-level debugger for ProtoPNets in which a human supervisor, guided by the model's explanations, supplies feedback in the form of what part-prototypes must be forgotten or kept, and the model is fine-tuned to align with this supervision. An extensive empirical evaluation on synthetic and real-world data shows that ProtoPDebug outperforms state-of-the-art debuggers for a fraction of the annotation cost.





1 Introduction

Part-Prototype Networks, aka ProtoPNets, are “gray-box” image classifiers that combine the transparency of case-based reasoning with the flexibility of black-box neural networks (Chen et al., 2019). They compute predictions by first matching the input image with a set of learned part-prototypes – that is, prototypes capturing task-salient elements of the training images, like objects or parts thereof – and then making a decision based on the part-prototype activations only. What makes ProtoPNets appealing is that, despite performing comparably to more opaque predictors, they explain their own predictions in terms of relevant part-prototypes and of the training examples that these are sourced from. Moreover, these explanations are – by design – more faithful than those extracted by post-hoc approaches (Dombrowski et al., 2019; Teso, 2019; Lakkaraju and Bastani, 2020; Sixt et al., 2020) and were shown to effectively help stakeholders to simulate and anticipate the model’s reasoning (Hase and Bansal, 2020).

Despite all these advantages, ProtoPNets are prone – like regular neural networks – to picking up confounds from the training data (e.g., class-correlated watermarks), thus suffering from compromised generalization and out-of-distribution performance (Lapuschkin et al., 2019; Geirhos et al., 2020). This occurs even with well-known data sets, as we will show, and it is especially alarming as it can impact high-stakes applications like COVID-19 diagnosis (DeGrave et al., 2021) and scientific analysis (Schramowski et al., 2020).

We tackle this issue by introducing ProtoPDebug, a simple but effective debugger for ProtoPNets that leverages their case-based nature and that is suitable for interactive usage. ProtoPDebug builds on three key observations: (i) In ProtoPNets, confounds – for instance, textual meta-data in X-ray lung scans (DeGrave et al., 2021) and irrelevant patches of background sky, sea, or foliage (Xiao et al., 2020) – end up appearing as part-prototypes; (ii) It is easy for (sufficiently expert and motivated) users to indicate which part-prototypes are confounded by inspecting the model’s explanations; (iii) Concept-level feedback of this kind is context-independent, and as such it generalizes across instances.

In short, ProtoPDebug leverages the explanations naturally output by ProtoPNets to acquire concept-level feedback about confounded (and optionally high-quality) part-prototypes, as illustrated in Fig. 1. Then, it aligns the model using a novel pair of losses that penalize part-prototypes for behaving similarly to confounded concepts, while encouraging the model to remember high-quality concepts, if any. ProtoPDebug is ideally suited for human-in-the-loop explanation-based debugging (Kulesza et al., 2015; Teso and Kersting, 2019), and achieves substantial savings in terms of annotation cost compared to alternatives based on input-level feedback (Barnett et al., 2021). In fact, in contrast to the per-pixel relevance masks used by other debugging strategies (Ross et al., 2017; Teso and Kersting, 2019; Plumb et al., 2020; Barnett et al., 2021), concept-level feedback automatically generalizes across instances, thus speeding up convergence and preventing relapse. Our experiments show that ProtoPDebug is effective at correcting existing bugs and at preventing new ones on both synthetic and real-world data, and that it needs less corrective supervision to do so than state-of-the-art alternatives.

Contributions. Summarizing, we: (1) Highlight limitations of existing debuggers for black-box models and ProtoPNets; (2) Introduce ProtoPDebug, a simple but effective strategy for debugging ProtoPNets that drives the model away from using confounded concepts and prevents forgetting well-behaved concepts; (3) Present an extensive empirical evaluation showcasing the potential of ProtoPDebug on both synthetic and real-world data sets.

2 Part-Prototype Networks

ProtoPNets (Chen et al., 2019) classify images into one of $k$ classes using a three-stage process comprising an embedding stage, a part-prototype stage, and an aggregation stage; see Fig. 1 (left).

Figure 1: Left: architecture of ProtoPNets. Right: schematic illustration of the ProtoPDebug loop. The model has acquired a confounded part-prototype (the blue square) that correlates with, but is not truly causal for, the Crested Auklet class, and hence mispredicts both unconfounded images of this class and confounded images of other classes (top row). Upon inspection, an end-user forbids the model to learn part-prototypes similar to it, achieving improved generalization (bottom row). Relevance of all part-prototypes is omitted for readability but assumed positive.

Embedding stage: Let $x$ be an image of shape $d_1 \times d_2 \times c$, where $c$ is the number of channels. The embedding stage passes $x$ through a sequence of (usually pre-trained) convolutional and pooling layers $f$ with parameters $\theta_f$, obtaining a latent representation $z = f(x)$ of shape $h \times w \times e$, where $h \le d_1$ and $w \le d_2$. Let $\mathcal{Z}(x)$ be the set of $1 \times 1 \times e$ subtensors of $z$. Each such subtensor encodes a filter in latent space and maps to a rectangular region of the input image $x$.

Part-prototype stage: This stage memorizes and uses $m$ part-prototypes $p_1, \dots, p_m$. Each $p_j$ is a tensor of shape $1 \times 1 \times e$, explicitly learned – as explained below – so as to capture salient visual concepts appearing in the training images, like heads or wings. The activation of a part-prototype $p_j$ on a part $z \in \mathcal{Z}(x)$ is computed using a difference-of-logarithms (dol) activation, defined as (Chen et al., 2019):

$\mathrm{act}(p_j, z) = \log \frac{\|z - p_j\|_2^2 + 1}{\|z - p_j\|_2^2 + \epsilon}$

Here, $\|\cdot\|_2$ indicates the $\ell_2$ norm and $\epsilon > 0$ is a small constant. Alternatively, one can employ an exponential activation, defined as (Hase and Bansal, 2020):

$\mathrm{act}(p_j, z) = \exp(-\|z - p_j\|_2^2)$

Figure 2: Part-prototype activation functions.

Both activation functions are bell-shaped and decrease monotonically with the distance between $z$ and $p_j$, as illustrated in Fig. 2. The image-wide activation of $p_j$ is obtained via max-pooling, i.e., $a_j(x) = \max_{z \in \mathcal{Z}(x)} \mathrm{act}(p_j, z)$. Given an input image $x$, this stage outputs an activation vector $a(x) = (a_1(x), \dots, a_m(x))$ that captures how much each part-prototype activates on the image. The activation vector essentially encodes the image according to the concept vocabulary. Each class is assigned $m/k$ discriminative part-prototypes learned so as to strongly activate (only) on training examples of that class, as described below.

Aggregation stage: In this stage, the network computes the score of each class $y$ by aggregating the concept activations using a dense layer, i.e., $s_y(x) = \langle w_y, a(x) \rangle$, where $w_y$ is the weight vector of class $y$, and applies a softmax function to obtain conditional probabilities $p(y \mid x) \propto \exp(s_y(x))$. The set of parameters appearing in the ProtoPNet will be denoted by $\theta$.

Loss and training. The network is fit by minimizing a compound empirical loss over a data set $D = \{(x_i, y_i)\}_{i=1}^n$:

$L(\theta) = \frac{1}{n} \sum_i \mathrm{CE}(x_i, y_i; \theta) + \lambda_c \, \ell_{\mathrm{clst}}(\theta) + \lambda_s \, \ell_{\mathrm{sep}}(\theta)$

This comprises a cross-entropy loss (first term) and two regularization terms that encourage the part-prototypes to cluster the training set in a discriminative manner:

$\ell_{\mathrm{clst}} = \frac{1}{n} \sum_i \min_{j : p_j \in P_{y_i}} \min_{z \in \mathcal{Z}(x_i)} \|z - p_j\|_2^2, \qquad \ell_{\mathrm{sep}} = - \frac{1}{n} \sum_i \min_{j : p_j \notin P_{y_i}} \min_{z \in \mathcal{Z}(x_i)} \|z - p_j\|_2^2$

Here, $P_y$ denotes the part-prototypes assigned to class $y$. Specifically, the part-prototypes are driven by the clustering loss to cover all examples of their associated class and by the separation loss not to activate on examples of other classes.
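A sketch of the per-example clustering and separation terms, under the assumption that each prototype carries a class id (the `proto_class` bookkeeping is ours):

```python
import torch

def cluster_and_separation(z: torch.Tensor, prototypes: torch.Tensor,
                           proto_class: torch.Tensor, label: int):
    """Clustering and separation terms for one training image.

    z: (h, w, e) latent map; prototypes: (m, e); proto_class: (m,) class
    id of each part-prototype; label: ground-truth class of the image.
    Clustering pulls some part of the image close to a prototype of its
    own class; separation pushes all parts away from other classes'."""
    parts = z.reshape(-1, z.shape[-1])              # (h*w, e)
    d_sq = torch.cdist(prototypes, parts) ** 2      # (m, h*w)
    min_d = d_sq.min(dim=1).values                  # closest part per prototype
    own = proto_class == label
    clst = min_d[own].min()                         # small when an own prototype fits
    sep = -min_d[~own].min()                        # penalize small distances to others
    return clst, sep
```

Averaging these two terms over the training set yields the regularizers above.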

During training, the embedding and part-prototype layers are fit jointly, so as to encourage the data to be embedded in a way that facilitates clustering. At this time, the aggregation weights are fixed: each weight $w_{y,j}$ of $w_y$ is set to $1$ if $p_j \in P_y$ and to $-0.5$ otherwise. The aggregation layer is fit in a second step by solving a logistic regression problem. ProtoPNets guarantee the part-prototypes to map to concrete cases by periodically projecting them onto the data. Specifically, each $p_j$ is replaced with the (embedding of the) closest image part from any training example of its class.
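The projection step reduces to a nearest-neighbor search in embedding space; a minimal sketch, assuming the candidate parts of the prototype's class have been stacked into a matrix:

```python
import torch

def project_prototype(p: torch.Tensor, class_parts: torch.Tensor) -> torch.Tensor:
    """Replace prototype p (shape (e,)) with the embedding of the closest
    image part drawn from training examples of its class; class_parts
    has shape (n_parts, e), one embedded part per row."""
    d = ((class_parts - p) ** 2).sum(dim=1)   # squared distance to each part
    return class_parts[d.argmin()].clone()    # snap to the nearest real part
```

After projection, every prototype corresponds verbatim to a patch of some training image, which is what makes the source-example explanations faithful.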

Explanations. The architecture of ProtoPNets makes it straightforward to extract explanations highlighting, for each part-prototype $p_j$: (1) its relevance for the target decision $\hat y$, given by the score $w_{\hat y, j} \cdot a_j(x)$; (2) its attribution map $A_j(x)$, obtained by measuring the activation of $p_j$ on each part of $x$ and then upscaling the resulting $h \times w$ matrix to $d_1 \times d_2$ using bilinear filtering; (3) the source example that it projects onto.
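The attribution map of item (2) can be sketched with `torch.nn.functional.interpolate`; the dol activation and the NCHW reshaping are spelled out, and the function name is ours:

```python
import torch
import torch.nn.functional as F

def attribution_map(z: torch.Tensor, p: torch.Tensor, out_hw, eps: float = 1e-4):
    """Attribution map of prototype p (shape (e,)) on latent map z of
    shape (h, w, e), upscaled to input resolution `out_hw` with bilinear
    filtering."""
    h, w, _ = z.shape
    d_sq = ((z - p.view(1, 1, -1)) ** 2).sum(dim=-1)   # (h, w) squared distances
    act = torch.log((d_sq + 1.0) / (d_sq + eps))       # dol activation per part
    act = act.view(1, 1, h, w)                         # NCHW layout for interpolate
    return F.interpolate(act, size=out_hw, mode="bilinear",
                         align_corners=False).squeeze()
```

The upscaled map can then be overlaid on the input image to show where the prototype fires.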

3 Input-level Debugging Strategies and Their Limitations

Explanations excel at exposing confounds picked up by models from data (Lapuschkin et al., 2019; Geirhos et al., 2020), hence constraining or supervising them can effectively dissuade the model from acquiring those confounds. This observation lies at the heart of recent approaches for debugging machine learning models.

A general recipe. The general strategy is well illustrated by the right for the right reasons (RRR) loss (Ross et al., 2017), which aligns the attribution maps produced by a predictor to those supplied by a human annotator. Let $f$ be a differentiable classifier and $\nabla_x f(x)$ the input gradient of $f$ for a decision $\hat y$. This attribution algorithm (Baehrens et al., 2010; Simonyan et al., 2014) assigns relevance to each input variable for the given decision based on the magnitude of the gradient w.r.t. it of the predicted probability of class $\hat y$. The RRR loss penalizes $f$ for associating non-zero relevance to known irrelevant input variables:

$\ell_{\mathrm{RRR}} = \sum_i \Big\| M_i \odot \nabla_x \sum_y \log p(y \mid x_i) \Big\|_2^2$

Here, $M_i$ is the ground-truth attribution mask of example $x_i$, i.e., its $u$-th entry is $1$ if the $u$-th pixel of $x_i$ is irrelevant for predicting the ground-truth label $y_i$ and $0$ otherwise. Other recent approaches follow the same recipe (Rieger et al., 2020; Selvaraju et al., 2019; Shao et al., 2021; Viviano et al., 2021); see (Lertvittayakumjorn and Toni, 2021) and (Friedrich et al., 2022) for an in-depth overview.
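A minimal sketch of the RRR penalty using autograd, summing log-probabilities over classes as in Ross et al.; the helper name is ours:

```python
import torch

def rrr_penalty(model, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Right-for-the-Right-Reasons penalty: the input gradient of the
    summed log-probabilities must vanish on pixels marked irrelevant
    (mask == 1). `model` maps inputs to class logits."""
    x = x.clone().requires_grad_(True)
    log_probs = torch.log_softmax(model(x), dim=-1)
    # create_graph=True so the penalty itself can be backpropagated
    grad, = torch.autograd.grad(log_probs.sum(), x, create_graph=True)
    return ((mask * grad) ** 2).sum()
```

Adding this term to the task loss drives the model's input gradients off the masked regions.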

IAIA-BL. The only existing debugger for ProtoPNets, IAIA-BL (Barnett et al., 2021), also fits this template, in that it penalizes those part-prototypes that activate on pixels annotated as irrelevant. The IAIA-BL loss is defined as:

$\ell_{\mathrm{IAIA}} = \sum_i \Big( \sum_{j : p_j \in P_{y_i}} \| M_i \odot A_j(x_i) \|_2 + \sum_{j : p_j \notin P_{y_i}} \| A_j(x_i) \|_2 \Big)$

where $A_j(x_i)$ is the ProtoPNet’s attribution map for part-prototype $p_j$ (see Section 2) and $\odot$ indicates the element-wise product. The first inner summation is analogous to the RRR loss, while the second one encourages the part-prototypes of other classes not to activate at all, thus reinforcing the separation loss in Section 2.
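A sketch of this penalty for a single example, assuming per-prototype attribution maps have already been computed and `proto_class` records each prototype's class (both names are ours):

```python
import torch

def iaia_bl_penalty(attr_maps: torch.Tensor, mask: torch.Tensor,
                    proto_class: torch.Tensor, label: int) -> torch.Tensor:
    """IAIA-BL-style fine-annotation penalty for one example.

    attr_maps: (m, H, W) attribution map of each part-prototype;
    mask: (H, W) with 1 on irrelevant pixels. Prototypes of the
    example's class are penalized only where the mask is on; prototypes
    of other classes are penalized for activating anywhere."""
    own = proto_class == label
    in_class = (attr_maps[own] * mask).flatten(1).norm(dim=1).sum()
    out_class = attr_maps[~own].flatten(1).norm(dim=1).sum()
    return in_class + out_class
```

Note how the supervision (the mask) is tied to one specific image, which is exactly the locality limitation discussed next.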

Limitations. A critical issue with these approaches is that they are restricted to pixel-level supervision, which is inherently local: an attribution map that distinguishes between object and background in a given image does not generally carry over to other images. This entails that, in order to see any benefits for more complex neural nets, a substantial number of examples must be annotated individually, as shown by our experiments (see Fig. 3). This is especially troublesome as ground-truth attribution maps are not cheap to acquire, partly explaining why most data sets in the wild do not come with such annotations.

Put together, these issues make debugging with these approaches cumbersome, especially in human-in-the-loop applications where, in every debugging round, the annotator has to supply a non-negligible number of pixel-level annotations.

4 Concept-level Debugging with ProtoPDebug

Our key observation is that in ProtoPNets confounds only influence the output if they are recognized by the part-prototypes, and that therefore they can be corrected for by leveraging concept-level supervision, which brings a number of benefits. We illustrate the basic intuition with an example:

Example: Consider the confounded bird classification task in Fig. 1 (right). Here, all training images of class Crested Auklet have been marked with a blue square, luring the network into relying heavily on this confound, as shown by the misclassified Laysan Albatross image. By inspecting the learned part-prototypes – together with source images that they match strongly, for context – a sufficiently expert annotator can easily indicate the ones that activate on confounds, thus naturally providing concept-level feedback.

Concept-level supervision sports several advantages over pixel-level annotations:

  1. It is cheap to collect from human annotators, using an appropriate UI, by showing part-prototype activations on selected images and acquiring click-based feedback.

  2. It is very informative, as it clearly distinguishes between content and context and generalizes across instances. The blue square, for instance, is a nuisance regardless of what image it appears in. From this perspective, one concept-level annotation is equivalent to several input-level annotations.

  3. By generalizing beyond individual images, it speeds up convergence and prevents relapse.

1:  initialize F ← ∅, V ← ∅
2:  while True do
3:     for j = 1, …, m do
4:        for each of the training examples most activated by p_j do
5:           if p_j appears confounded to user then
6:              add cut-out to F
7:           else if p_j appears high-quality to user then
8:              add cut-out to V
9:     if no confounds found then
10:        break
11:     fine-tune f by minimizing the debugging loss of Section 4
12:  return f
Algorithm 1 A debugging session with ProtoPDebug. f is a ProtoPNet trained on data set D.

The debugging loop. Building on these observations, we develop ProtoPDebug, an effective and annotation-efficient debugger for ProtoPNets that directly leverages concept-level supervision, which we describe next. The pseudo-code for ProtoPDebug is listed in Algorithm 1.

ProtoPDebug takes a ProtoPNet $f$ and its training set $D$ and runs for a variable number of debugging rounds. In each round, it iterates over all learned part-prototypes $p_j$, and for each of them retrieves the training examples that it activates the most on, including its source example. It then asks the user to judge $p_j$ by inspecting its attribution map on each selected example. If the user indicates that a particular activation of $p_j$ looks confounded, ProtoPDebug extracts a “cut-out” $c$ of the confound from the source image, defined as the box (or boxes, in case of disconnected activation areas) containing 95% of the part-prototype activation. It then embeds $c$ using $f$ and adds it to a set of forbidden concepts $F$. This set is organized into class-specific subsets $F_y$, and the user can specify whether $c$ should be added to the forbidden concepts for a specific class or to all of them (class-agnostic confound). Conversely, if $p_j$ looks particularly high-quality to the user, ProtoPDebug embeds its cut-out and adds the result to a set of valid concepts $V$. This is especially important when confounds are ubiquitous, making it hard for the ProtoPNet to identify (and remember) non-confounded prototypes (see the COVID experiments in Section 5.2).
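The cut-out extraction can be approximated by growing a box around the activation peak until it holds 95% of the activation mass; a simplified sketch of this step (ours), assuming a single connected, non-negative activation blob:

```python
import torch

def cutout_box(attr: torch.Tensor, mass: float = 0.95):
    """Smallest peak-centered bounding box holding `mass` of the
    (non-negative) activation in `attr` (shape (H, W)); a greedy sketch
    of the cut-out extraction step for one connected activation area.
    Returns (top, bottom, left, right), inclusive."""
    total = attr.sum()
    r, c = divmod(int(attr.argmax()), attr.shape[1])   # activation peak
    top, bot, left, right = r, r, c, c
    # grow the box one ring at a time until it captures enough mass
    while attr[top:bot + 1, left:right + 1].sum() < mass * total:
        top, bot = max(top - 1, 0), min(bot + 1, attr.shape[0] - 1)
        left, right = max(left - 1, 0), min(right + 1, attr.shape[1] - 1)
    return top, bot, left, right
```

The pixels inside the box are then cropped from the source image and embedded with $f$ before being stored in the forbidden or valid concept sets.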

Once all part-prototypes have been inspected, $f$ is updated so as to steer it away from the forbidden concepts in $F$ while alleviating forgetting of the valid concepts in $V$, using the procedure described below. Then, another debugging round begins. The procedure terminates when no confound is identified by the user.

Fine-tuning the network. During fine-tuning, we search for updated parameters $\theta'$ that are as close as possible to $\theta$ – thus retaining all useful information that $f$ has extracted from the data – while avoiding the bugs indicated by the annotator. This can be formalized as a constrained distillation problem (Gou et al., 2021):

$\min_{\theta'} \; d(\theta, \theta') \quad \text{s.t. no part-prototype of } \theta' \text{ activates on the forbidden concepts in } F$

where $d$ is an appropriate distance function between sets of parameters. Since the order of part-prototypes is irrelevant, we define it as a permutation-invariant Euclidean distance:

$d(\theta, \theta') = \|\theta_f - \theta'_f\|_2^2 + \min_\pi \sum_{j=1}^m \|p_j - p'_{\pi(j)}\|_2^2 + \|w - w'\|_2^2$

Here, $\pi$ simply reorders the part-prototypes of the buggy and updated models so as to maximize their alignment. Recall that we are interested in correcting the concepts, so we focus on the middle term. Notice that the logarithmic and exponential activation functions in Section 2 are both inversely proportional to the distance $\|z - p_j\|_2^2$ and achieve their maximum when the distance is zero, hence minimizing the distance is analogous to maximizing the activation. This motivates us to introduce two new penalty terms:
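For small numbers of prototypes, the permutation-minimization in the middle term can be brute-forced; this sketch is ours (in practice a Hungarian matching would be used for large $m$):

```python
import itertools
import torch

def proto_distance(P: torch.Tensor, Q: torch.Tensor):
    """Permutation-invariant squared distance between two small
    prototype sets P, Q of shape (m, e): minimize the total pairwise
    cost over all reorderings of Q (brute force, feasible for m <~ 8)."""
    m = P.shape[0]
    d_sq = torch.cdist(P, Q) ** 2                       # (m, m) pairwise costs
    return min(sum(d_sq[i, pi[i]] for i in range(m))    # best alignment pi
               for pi in itertools.permutations(range(m)))
```

With this term, relabeling the prototype slots costs nothing, so fine-tuning is free to keep good prototypes wherever they happen to live.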

The hard constraint above complicates optimization, so we replace it with a smoother penalty function, obtaining a relaxed formulation. The forgetting loss minimizes how much the part-prototypes of each class activate on the most activated concept to be forgotten for that class:

$\ell_{\mathrm{for}} = \sum_y \max_{c \in F_y} \max_{j : p_j \in P_y} \mathrm{act}(p_j, c)$

Conversely, the remembering loss maximizes how much the part-prototypes activate on the least activated concept to be remembered for that class:

$\ell_{\mathrm{rem}} = - \sum_y \min_{c \in V_y} \max_{j : p_j \in P_y} \mathrm{act}(p_j, c)$

The overall loss used by ProtoPDebug for fine-tuning the model is then a weighted combination of the ProtoPNet loss in Section 2 and the two new losses, namely $L(\theta) + \lambda_{\mathrm{for}} \ell_{\mathrm{for}} + \lambda_{\mathrm{rem}} \ell_{\mathrm{rem}}$, where $\lambda_{\mathrm{for}}$ and $\lambda_{\mathrm{rem}}$ are hyper-parameters.
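The two penalties can be sketched as follows, with `forbidden` and `valid` mapping each class id to a matrix of embedded cut-outs (all names and the dol activation choice are our own assumptions):

```python
import torch

def debug_losses(prototypes: torch.Tensor, proto_class: torch.Tensor,
                 forbidden: dict, valid: dict, eps: float = 1e-4):
    """Forgetting and remembering penalties (a sketch).

    prototypes: (m, e); proto_class: (m,) class id of each prototype;
    forbidden / valid: class id -> (n, e) embedded cut-outs."""
    def act(P, C):                       # (|P|, |C|) prototype-concept activations
        d_sq = torch.cdist(P, C) ** 2
        return torch.log((d_sq + 1.0) / (d_sq + eps))
    forget = torch.tensor(0.0)
    remember = torch.tensor(0.0)
    for y, cuts in forbidden.items():
        P = prototypes[proto_class == y]
        forget = forget + act(P, cuts).max()    # most activated confound
    for y, cuts in valid.items():
        P = prototypes[proto_class == y]
        # least activated valid concept (per-concept max over prototypes)
        remember = remember - act(P, cuts).max(dim=0).values.min()
    return forget, remember
```

Minimizing `forget + remember` pushes prototypes off the confounded cut-outs while keeping them anchored to the concepts the user approved.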

Benefits and limitations. The key feature of ProtoPDebug is that it leverages concept-level supervision, which sports improved generalization across instances, cutting annotation costs and facilitating interactive debugging. ProtoPDebug naturally accommodates concept-level feedback that applies to specific classes. E.g., although the concept of snow may be useful for some classes (say, winter), the user can mark it as irrelevant for others (like dog or wolf) (Ribeiro et al., 2016). At the same time, ProtoPDebug makes it easy to penalize part-prototypes of all classes for activating on class-agnostic confounds like, e.g., image artifacts. ProtoPDebug’s two losses bring additional, non-obvious benefits. The remembering loss helps to prevent catastrophic forgetting from occurring during sequential debugging sessions, while the forgetting loss prevents the model from re-learning the same confound in the future.

One source of concern is that, if the concepts acquired by the model are not understandable, debugging may become challenging. ProtoPNets already address this issue through the projection step, but following Koh et al. (2020), one could guide the network toward learning a set of desirable concepts by supplying additional concept-level supervision. More generally, like other explanation-based methods, ProtoPDebug exposes the knowledge acquired by the model to users. In sensitive prediction tasks, this poses the risk of leaking private information. Moreover, malicious annotators may supply adversarial supervision, corrupting the model. These issues can be avoided by restricting access to ProtoPDebug to trusted annotators only.

4.1 Relationship to other losses

The forgetting loss in Section 4 can be viewed as a lifted version of the separation loss in Section 2: the former runs over classes and leverages per-class concept feedback, while the latter runs over examples. The same analogy holds for the remembering and clustering losses. The other difference is that the instance-level losses are proportional to the negative squared distance between part-prototypes and parts, while the concept-level ones are proportional to the activation instead. However, existing activation functions are monotonically decreasing in the squared distance, meaning that they only really differ in fall-off speed, as can be seen in Fig. 2. The IAIA-BL loss can also be viewed as a (lower bound of the) separation loss, obtained by rewriting its inner norm in terms of a “part-level” attribution mask for each part $z$, computed by down-scaling the input-level attribution mask of all pixels in the receptive field of $z$. This formulation highlights similarities and differences with our forgetting loss: the latter imposes similar constraints as the IAIA-BL penalty, but (i) it does not require full mask supervision, and (ii) it can easily accommodate concept-level feedback that targets entire sets of examples, as explained above.

5 Empirical Analysis

In this section, we report results showing how concept-level supervision enables ProtoPDebug to debug ProtoPNets better than the state-of-the-art debugger IAIA-BL using less and cheaper supervision, and that it is effective in both synthetic and real-world debugging tasks. All experiments were implemented in Python 3 using PyTorch (Paszke et al., 2019) and run on a machine with two Quadro RTX 5000 GPUs. The full experimental setup is published together with additional implementation details in the Supplementary Material. ProtoPDebug and IAIA-BL were implemented on top of ProtoPNets (Chen et al., 2019) using the Adam optimizer (Kingma and Ba, 2014), starting from the source code of the respective authors of the two methods.

5.1 Concept-level vs instance-level debugging

We first compare the effectiveness of concept-level and instance-level supervision in a controlled scenario where the role of the confounds is substantial and precisely defined at design stage. To this end, we modified the CUB200 data set (Wah et al., 2011), which has been extensively used for evaluating ProtoPNets (Chen et al., 2019; Hase and Bansal, 2020; Rymarczyk et al., 2021; Hoffmann et al., 2021; Nauta et al., 2021). This data set contains images of bird species in natural environments, with approximately equal numbers of train and test images per class. We selected the first five classes, and artificially injected simple confounds in the training images for three of them. The confound is a colored square of fixed size, placed at a random position. The color of the square is different for different classes, but it is absent from the test set, and thus acts as a perfect confound for these three classes. The resulting synthetic dataset is denoted CUB5.

Competitors. We compare ProtoPDebug, with supervision on the three square confounds, against the following competitors: 1) a vanilla ProtoPNet, which is meant as a lower bound that measures how much confounds affect the final predictive performance; 2) a vanilla ProtoPNet trained on a clean version of the dataset in which the confounds have been removed, a confound-free upper bound; 3) IAIA-BL fit using ground-truth attribution maps on a varying fraction of the training examples, including a variant that only receives attribution masks on three examples (the same number of corrections given to ProtoPDebug). Note that IAIA-BL is not interactive: its supervision is made available at the beginning of training, in the form of ground-truth attribution maps, whereas ProtoPDebug receives a single instance per confound.

Implementation details. The embedding layers were implemented using a pre-trained VGG-11, allocating two prototypes for each class. The values of and were set to and , respectively, and to , as in the original paper (Barnett et al., 2021); experiments with different values of gave inconsistent results. In this setting, the corrective feedback fed to the forgetting loss is class-specific.

Figure 3: Comparison between ProtoPDebug (in red), ProtoPNets on confounded data (in black), ProtoPNets on unconfounded data (in green), and IAIA-BL with varying amounts of attribution-map supervision (shades of blue) on the CUB5 data set. Left to right: macro $F_1$ on the training set, cross-entropy loss on the training set, and macro $F_1$ on the test set. Bars indicate std. error over 15 runs.

Results. Fig. 3 reports the experimental results on all methods averaged over 15 runs. The left and middle plots show the training set macro $F_1$ and cross entropy, respectively. After 15 epochs, all methods manage to achieve close to perfect $F_1$ and similar cross entropy. The right plot shows macro $F_1$ on the test set. As expected, the impact of the confounds is rather disruptive, as can be seen from the difference between the confounded and confound-free ProtoPNets baselines. Indeed, ProtoPNets end up learning confounds whenever present. Instance-level debugging seems unable to completely fix the problem. With the same amount of supervision as ProtoPDebug, IAIA-BL does not manage to improve over the confounded ProtoPNets baseline. Increasing the amount of supervision does improve its performance, but even with supervision on all examples IAIA-BL still fails to match the confound-free baseline. On the other hand, ProtoPDebug succeeds in avoiding being fooled by confounds, reaching the performance of the confound-free baseline despite only receiving a fraction of the supervision of the IAIA-BL alternatives. Note however that ProtoPDebug does occasionally select a natural confound (the water) that is not listed in the set of forbidden ones. In the following section we show how to deal with unknown confounds via the full interactive process of ProtoPDebug.

5.2 ProtoPDebug in the real world

CUB5 data set. Next, we evaluate ProtoPDebug in a real-world setting in which confounds occur naturally in the data as, e.g., background patches of sky or sea, and emerge at different stages of training, possibly as a consequence of previous user corrections. To maximize the impact of the natural confounds, we selected the 20 CUB200 classes with the largest test $F_1$ difference when training ProtoPNets only on birds (removing the background) or on entire images (i.e., bird + background), and testing them on images where the background has been shuffled among classes. Then, we applied ProtoPDebug focusing on the five most confounded classes out of these 20. We call this dataset CUB5. All architectures and hyperparameters are as before.

Results. Fig. 4 shows the part-prototypes progressively learned by ProtoPDebug. In the first iteration, when no corrective feedback has been provided yet, the model learns confounds for most classes: the branches for the second and the last classes, the sky for the third one. Upon receiving user feedback, ProtoPDebug does manage to avoid learning the same confounds again in most cases, but it needs to face novel confounds that also correlate with the class (e.g., the chain after fixing the tree branches in the last row). After two rounds, all prototypes are accepted, and the procedure ends. Table 1 reports the test set performance of ProtoPNets compared to ProtoPDebug in terms of $F_1$ and an interpretability metric. The latter was introduced in (Barnett et al., 2021, Eq. 6) to quantify the portion of pixels that are activated by a prototype on a set of images; we slightly modify it to consider not only whether a pixel activates but also the activation value (see Supplementary Material for the definition). Results show that ProtoPDebug improves over ProtoPNets in four out of five classes, and that it manages to learn substantially more interpretable prototypes (i.e., it is right for the right reasons (Schramowski et al., 2020)), with a 20% improvement in activation precision on average.

1st Round 2nd Round 3rd Round
Figure 4: Three rounds of sequential debugging with ProtoPDebug on CUB5. Rows: part-prototypes (two per class) and user feedback (checkmark vs. cross). Note that the prototypes produced before the first correction (left-most column) correspond to those learned by plain ProtoPNets.
           ProtoPNet             ProtoPDebug
Class   F1    AP_1  AP_2     F1    AP_1  AP_2
1       0.48  0.75  0.75     0.51  0.83  0.83
2       0.38  0.73  0.48     0.40  0.92  0.91
3       0.67  0.27  0.93     0.83  0.90  0.90
4       0.51  0.79  0.79     0.60  0.94  0.94
5       0.77  0.85  0.86     0.75  0.84  0.83
Avg.    0.56     0.68        0.62     0.88
Table 1: Test set performance of ProtoPNets and ProtoPDebug on CUB5. $AP_j$ is the attribution precision of the $j$-th part-prototype; the Avg. row reports mean $F_1$ and mean attribution precision. Bold denotes better performance.

COVID data set.

The potential of artificial intelligence and especially deep learning in medicine is huge (Topol, 2019). On the other hand, the medical domain is known to be badly affected by the presence of confounds (Jager et al., 2008; Smith and Nichols, 2018). In order to evaluate the potential of ProtoPDebug to help address this issue, we tested it on a challenging real-world problem from the medical imaging domain. The task is to recognize COVID-19 from chest radiographies. As shown in (DeGrave et al., 2021), a classifier trained on this dataset heavily relies on confounds that correlate with the presence or absence of COVID. These confounds come from the image acquisition procedure or annotations on the image. We trained and tested ProtoPDebug on the same datasets used in (DeGrave et al., 2021) using their data pipeline source code. The embedding layers were implemented using a pre-trained VGG-19 (Simonyan and Zisserman, 2014). The three hyper-parameters are set to 200, 10 and 10, respectively, and the latter two are decreased over the rounds to increase the effect of the first. The training set consists of a subset of images retrieved from the GitHub-COVID repository (Cohen et al., 2020) and from the ChestX-ray14 repository (Wang et al., 2017). The number of COVID-negative and COVID-positive radiographs is 7390 and 250, respectively. The test set is the union of the PadChest (Bustos et al., 2020) and BIMCV-COVID19+ (Vayá et al., 2020) datasets, totalling 1147 negative and 597 positive images. In (DeGrave et al., 2021), the classifier is trained on 15 classes, i.e., COVID and 14 other pathologies. To simplify the identification of confounds, we focused on a binary classification task, discriminating COVID-positive images from images without any pathology.

Results. Fig. 5 reports the (non-zero activation) prototypes of ProtoPDebug at different correction rounds. As in Fig. 4, the left-most column corresponds to the prototypes learned by ProtoPNets. Note that for each prototype, the supervision is given on the 10 most-activated images. Penalized confounds have been extracted from images on which the prototype has non-zero activation, because these are the ones that influence the classification. However, the patches of the images to remember are extracted even if the activation is zero, in order to force the prototype to increase its activation on them thanks to the remembering loss. Eventually, ProtoPDebug manages to learn non-confounded prototypes, resulting in substantially improved test classification performance: the test $F_1$ goes from 0.26 for ProtoPNets (first column) to 0.54 at the end of the debugging process.

1st Round 2nd Round 3rd Round 4th Round
Figure 5: Four rounds of sequential debugging with ProtoPDebug on COVID. Only the prototypes with non-zero activation are reported: the first prototype refers to COVID− and the second to COVID+.

6 Related Work

Input-level debuggers. ProtoPDebug is inspired by approaches for explanatory debugging (Kulesza et al., 2015) and explanatory interactive learning (Teso and Kersting, 2019; Schramowski et al., 2020), which inject explanations into interactive learning, enabling a human-in-the-loop to identify bugs in the model’s reasoning and fix them by supplying corrective feedback. They leverage input attributions (Teso and Kersting, 2019; Selvaraju et al., 2019; Lertvittayakumjorn et al., 2020; Schramowski et al., 2020), example attributions (Teso et al., 2021; Zylberajch et al., 2021), and rules (Popordanoska et al., 2020), but are not designed for concept-based models nor leverage concept-level explanations. IAIA-BL (Barnett et al., 2021) adapts these strategies to ProtoPNets by penalizing part-prototypes that activate on irrelevant regions of the input, but it is restricted to instance-level feedback, which is neither as cheap nor as effective as concept-level feedback, as shown by our experiments.

Concept-level debuggers. Stammer et al. (2021) debug neuro-symbolic models by enforcing logical constraints on the concepts attended to by a network’s slot attention module (Locatello et al., 2020), but their approach requires the concept vocabulary to be given and fixed. ProtoPDebug has no such requirement, and the idea of injecting prior knowledge (orthogonal to our contributions) could be fruitfully integrated with it. ProSeNets (Ming et al., 2019) learn (full) prototypes in the embedding space given by a sequence encoder and enable users to steer the model by adding, removing, and manipulating the learned prototypes. The update step, however, requires input-level supervision and adapts the embedding layers only. iProto-TREX (Schramowski et al., 2021) combines ProtoPNets with transformers, and learns part-prototypes capturing task-relevant text snippets akin to rationales (Zaidan et al., 2007). It supports dynamically removing and hot-swapping bad part-prototypes, but it lacks a forgetting loss and hence is prone to relapse. Applying ProtoPDebug to iProto-TREX is straightforward and would fix these issues. Bontempelli et al. (2021) proposed a classification of debugging strategies for concept-based models, but presented no experimental evaluation. Lage and Doshi-Velez (2020) acquire concepts by eliciting concept-attribute dependency information with questions like “does the concept depression depend on feature lorazepam?”. This approach can prevent confounding, but it is restricted to white-box models and interaction with domain experts. FIND (Lertvittayakumjorn et al., 2020) offers similar functionality for deep NLP models, but it relies on disabling concepts only.

Other concept-based models. There exist several concept-based models (CBMs) besides ProtoPNets, including self-explainable neural networks (Alvarez-Melis and Jaakkola, 2018), concept bottleneck models (Koh et al., 2020; Losch et al., 2019), and concept whitening (Chen et al., 2020). Like ProtoPNets, these models stack a simulatable classifier on top of (non-prototypical) interpretable concepts. Closer to our setting, prototype classification networks (Li et al., 2018) and deep embedded prototype networks (Davoudi and Komeili, 2021) make predictions based on embedding-space prototypes of full training examples (rather than parts thereof). Our work analyzes ProtoPNets as they perform comparably to other CBMs while sporting improved interpretability (Chen et al., 2019; Hase and Bansal, 2020). Several extensions of ProtoPNets have also been proposed (Hase et al., 2019; Rymarczyk et al., 2021; Nauta et al., 2021; Kraft et al., 2021). ProtoPDebug naturally applies to these variants, and could be extended to other CBMs by adapting the source example selection and update steps.

This research has received funding from the European Union’s Horizon 2020 FET Proactive project “WeNet - The Internet of us”, grant agreement No. 823783, and from the “DELPhi - DiscovEring Life Patterns” project funded by the MIUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) 2017 – DD n. 1062 del 31.05.2019. The research of ST and AP was partially supported by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215.


  • [1] (2021) AI for radiographic covid-19 detection selects shortcuts over signal. Github. External Links: Link Cited by: §5.2.
  • D. Alvarez-Melis and T. S. Jaakkola (2018) Towards robust interpretability with self-explaining neural networks. In NeurIPS, Cited by: §6.
  • D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K. Müller (2010) How to explain individual classification decisions. The Journal of Machine Learning Research 11, pp. 1803–1831. Cited by: §3.
  • A. J. Barnett, F. R. Schwartz, C. Tao, C. Chen, Y. Ren, J. Y. Lo, and C. Rudin (2021) A case-based interpretable deep learning model for classification of mass lesions in digital mammography. Nat. Mach. Intell.. Cited by: Appendix A, Appendix B, Appendix C, §1, §3, §5.1, §5.2, §6.
  • A. Bontempelli, F. Giunchiglia, A. Passerini, and S. Teso (2021) Toward a Unified Framework for Debugging Gray-box Models. In The AAAI-22 Workshop on Interactive Machine Learning, Cited by: §6.
  • A. Bustos, A. Pertusa, J. Salinas, and M. de la Iglesia-Vayá (2020) PadChest: a large chest x-ray image dataset with multi-label annotated reports. Medical Image Analysis 66, pp. 101797. External Links: ISSN 1361-8415, Document, Link Cited by: §5.2.
  • C. Chen, O. Li, D. Tao, A. Barnett, C. Rudin, and J. K. Su (2019) This looks like that: deep learning for interpretable image recognition. Advances in Neural Information Processing Systems 32, pp. 8930–8941. Cited by: 1st item, 2nd item, §C.1, Appendix C, §1, §2, §2, §5.1, §5, §6.
  • Z. Chen, Y. Bei, and C. Rudin (2020) Concept whitening for interpretable image recognition. Nature Machine Intelligence 2 (12), pp. 772–782. Cited by: §6.
  • J. P. Cohen, P. Morrison, and L. Dao (2020) COVID-19 image data collection. arXiv 2003.11597. External Links: Link Cited by: §5.2.
  • S. O. Davoudi and M. Komeili (2021) Toward faithful case-based reasoning through learning prototypes in a nearest neighbor-friendly space.. In International Conference on Learning Representations, Cited by: §6.
  • A. J. DeGrave, J. D. Janizek, and S. Lee (2021) AI for radiographic covid-19 detection selects shortcuts over signal. Nat. Mach. Intell.. Cited by: §C.2, Appendix C, §1, §1, §5.2.
  • A. Dombrowski, M. Alber, C. Anders, M. Ackermann, K. Müller, and P. Kessel (2019) Explanations can be manipulated and geometry is to blame. NeurIPS. Cited by: §1.
  • T. EJ. (2019) High-performance medicine: the convergence of human and artificial intelligence.. Nat Med. 25 (1), pp. 44–56. Cited by: §5.2.
  • F. Friedrich, W. Stammer, P. Schramowski, and K. Kersting (2022) A typology to explore and guide explanatory interactive machine learning. arXiv preprint arXiv:2203.03668. Cited by: §3.
  • R. Geirhos, J. Jacobsen, C. Michaelis, R. Zemel, W. Brendel, M. Bethge, and F. A. Wichmann (2020) Shortcut learning in deep neural networks. Nat. Mach. Intell.. Cited by: §1, §3.
  • J. Gou, B. Yu, S. J. Maybank, and D. Tao (2021) Knowledge distillation: a survey. International Journal of Computer Vision 129 (6), pp. 1789–1819. Cited by: §4.
  • P. Hase and M. Bansal (2020) Evaluating Explainable AI: Which Algorithmic Explanations Help Users Predict Model Behavior?. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5540–5552. Cited by: §1, §2, §5.1, §6.
  • P. Hase, C. Chen, O. Li, and C. Rudin (2019) Interpretable image recognition with hierarchical prototypes. In Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, Vol. 7, pp. 32–40. Cited by: §6.
  • A. Hoffmann, C. Fanconi, R. Rade, and J. Kohler (2021) This looks like that… does it? shortcomings of latent space prototype interpretability in deep networks. arXiv preprint arXiv:2105.02968. Cited by: §5.1.
  • [20] (2021) IAIA-bl source code. Github. External Links: Link Cited by: §5.
  • K.J. Jager, C. Zoccali, A. MacLeod, and F.W. Dekker (2008) Confounding: what it is and how to deal with it. Kidney International 73 (3), pp. 256–260. Cited by: §5.2.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §5.
  • P. W. Koh, T. Nguyen, Y. S. Tang, S. Mussmann, E. Pierson, B. Kim, and P. Liang (2020) Concept bottleneck models. In ICML, Cited by: §4, §6.
  • S. Kraft, K. Broelemann, A. Theissler, G. Kasneci, G. Esslingen am Neckar, S. H. AG, G. Wiesbaden, and G. Aalen (2021) SPARROW: semantically coherent prototypes for image classification. In The 32nd British Machine Vision Conference, Cited by: §6.
  • T. Kulesza, M. Burnett, W. Wong, and S. Stumpf (2015) Principles of explanatory debugging to personalize interactive machine learning. In IUI, Cited by: §1, §6.
  • I. Lage and F. Doshi-Velez (2020) Learning interpretable concept-based models with human feedback. arXiv preprint arXiv:2012.02898. Cited by: §6.
  • H. Lakkaraju and O. Bastani (2020) “How do I fool you?” Manipulating User Trust via Misleading Black Box Explanations. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, pp. 79–85. Cited by: §1.
  • S. Lapuschkin, S. Wäldchen, A. Binder, G. Montavon, W. Samek, and K. Müller (2019) Unmasking clever hans predictors and assessing what machines really learn. Nature communications. Cited by: §1, §3.
  • P. Lertvittayakumjorn, L. Specia, and F. Toni (2020) FIND: human-in-the-loop debugging deep text classifiers. In EMNLP, Cited by: §6, §6.
  • P. Lertvittayakumjorn and F. Toni (2021) Explanation-based human debugging of nlp models: a survey. arXiv preprint arXiv:2104.15135. Cited by: §3.
  • O. Li, H. Liu, C. Chen, and C. Rudin (2018) Deep learning for case-based reasoning through prototypes: a neural network that explains its predictions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32. Cited by: §6.
  • F. Locatello, D. Weissenborn, T. Unterthiner, A. Mahendran, G. Heigold, J. Uszkoreit, A. Dosovitskiy, and T. Kipf (2020) Object-centric learning with slot attention. Advances in Neural Information Processing Systems 33, pp. 11525–11538. Cited by: §6.
  • M. Losch, M. Fritz, and B. Schiele (2019) Interpretability beyond classification output: Semantic Bottleneck Networks. arXiv preprint arXiv:1907.10882. Cited by: §6.
  • Y. Ming, P. Xu, H. Qu, and L. Ren (2019) Interpretable and steerable sequence learning via prototypes. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 903–913. Cited by: §6.
  • M. Nauta, R. van Bree, and C. Seifert (2021) Neural prototype trees for interpretable fine-grained image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14933–14943. Cited by: §5.1, §6.
  • A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. Advances in neural information processing systems 32. Cited by: §5.
  • G. Plumb, M. Al-Shedivat, Á. A. Cabrera, A. Perer, E. Xing, and A. Talwalkar (2020) Regularizing black-box models for improved interpretability. In NeurIPS, Cited by: §1.
  • T. Popordanoska, M. Kumar, and S. Teso (2020) Machine Guides, Human Supervises: Interactive Learning with Global Explanations. arXiv preprint arXiv:2009.09723. Cited by: §6.
  • [39] (2019) ProtoPNet source code. Github. External Links: Link Cited by: §5.
  • M. T. Ribeiro, S. Singh, and C. Guestrin (2016) “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 1135–1144. Cited by: §4.
  • L. Rieger, C. Singh, W. Murdoch, and B. Yu (2020) Interpretations are useful: penalizing explanations to align neural networks with prior knowledge. In ICML, Cited by: §3.
  • A. S. Ross, M. C. Hughes, and F. Doshi-Velez (2017) Right for the right reasons: training differentiable models by constraining their explanations. In IJCAI, Cited by: §1, §3.
  • D. Rymarczyk, Ł. Struski, J. Tabor, and B. Zieliński (2021) ProtoPShare: Prototypical Parts Sharing for Similarity Discovery in Interpretable Image Classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 1420–1430. Cited by: §5.1, §6.
  • P. Schramowski, F. Friedrich, C. Tauchmann, and K. Kersting (2021) Interactively Generating Explanations for Transformer Language Models. arXiv preprint arXiv:2110.02058. Cited by: §6.
  • P. Schramowski, W. Stammer, S. Teso, A. Brugger, F. Herbert, X. Shao, H. Luigs, A. Mahlein, and K. Kersting (2020) Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nat. Mach. Intell.. Cited by: §1, §5.2, §6.
  • R. R. Selvaraju, S. Lee, Y. Shen, H. Jin, S. Ghosh, L. Heck, D. Batra, and D. Parikh (2019) Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded. In ICCV, Cited by: §3, §6.
  • X. Shao, A. Skryagin, P. Schramowski, W. Stammer, and K. Kersting (2021) Right for Better Reasons: Training Differentiable Models by Constraining their Influence Function. In AAAI, Cited by: §3.
  • K. Simonyan, A. Vedaldi, and A. Zisserman (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. In International Conference on Learning Representations, Cited by: §3.
  • K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §5.2.
  • L. Sixt, M. Granz, and T. Landgraf (2020) When explanations lie: why many modified bp attributions fail. In ICML, Cited by: §1.
  • S. M. Smith and T. E. Nichols (2018) Statistical challenges in “big data” human neuroimaging. Neuron 97 (2), pp. 263–268. Cited by: §5.2.
  • W. Stammer, P. Schramowski, and K. Kersting (2021) Right for the Right Concept: Revising Neuro-Symbolic Concepts by Interacting with their Explanations. In CVPR, Cited by: §6.
  • S. Teso, A. Bontempelli, F. Giunchiglia, and A. Passerini (2021) Interactive Label Cleaning with Example-based Explanations. In NeurIPS, Cited by: §6.
  • S. Teso and K. Kersting (2019) Explanatory interactive machine learning. In AIES, Cited by: §1, §6.
  • S. Teso (2019) Toward faithful explanatory active learning with self-explainable neural nets. In IAL Workshop. Cited by: §1.
  • M. d. l. I. Vayá, J. M. Saborit, J. A. Montell, A. Pertusa, A. Bustos, M. Cazorla, J. Galant, X. Barber, D. Orozco-Beltrán, F. García-García, M. Caparrós, G. González, and J. M. Salinas (2020) BIMCV covid-19+: a large annotated dataset of rx and ct images from covid-19 patients. External Links: Document, Link Cited by: §5.2.
  • J. D. Viviano, B. Simpson, F. Dutil, Y. Bengio, and J. P. Cohen (2021) Saliency is a possible red herring when diagnosing poor generalization. In ICLR, Cited by: §3.
  • C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 Dataset. Cited by: §5.1.
  • X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. Summers (2017) ChestX-ray8: hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In 2017 IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 3462–3471. Cited by: §5.2.
  • K. Y. Xiao, L. Engstrom, A. Ilyas, and A. Madry (2020) Noise or signal: the role of image backgrounds in object recognition. In International Conference on Learning Representations, Cited by: §1.
  • O. Zaidan, J. Eisner, and C. Piatko (2007) Using “annotator rationales” to improve machine learning for text categorization. In NAACL, Cited by: §6.
  • H. Zylberajch, P. Lertvittayakumjorn, and F. Toni (2021) HILDIF: Interactive Debugging of NLI Models Using Influence Functions. In Workshop on Interactive Learning for Natural Language Processing. Cited by: §6.

Appendix A Activation Precision

In order to measure the quality of the explanations produced by the various models, we adapted the activation precision (AP) metric from the original IAIA-BL experiments [Barnett et al., 2021, Eq. 6]. Assuming that the attribution mask $\mathbf{a}$ output by ProtoPNets is flattened to a vector with the same shape as the ground-truth attribution mask $\mathbf{t}$, the original formulation of AP can be written as:

$$\mathrm{AP}(\mathbf{a}, \mathbf{t}) = \frac{\sum_j \tau(\mathbf{a})_j \, t_j}{\sum_j \tau(\mathbf{a})_j}$$

Here, $\tau$ is a threshold function that takes an attribution map (i.e., a real-valued matrix), computes a fixed percentile of the attribution values, and sets the elements above this threshold to 1 and the rest to 0. Intuitively, the AP counts how many pixels predicted as relevant by the network are actually relevant according to the ground-truth attribution mask $\mathbf{t}$.

This definition, however, ignores differences in activation across the most relevant pixels, which can be quite substantial, penalizing debugging strategies that manage to correctly shift most of the network’s explanations over the truly relevant regions. To fix this, we modified $\tau$ so as not to discard this information: the modified threshold function $\tilde{\tau}$ keeps the original activation values above the threshold $\theta$ rather than binarizing them, as follows:

$$\widetilde{\mathrm{AP}}(\mathbf{a}, \mathbf{t}) = \frac{\sum_j \tilde{\tau}(\mathbf{a})_j \, t_j}{\sum_j \tilde{\tau}(\mathbf{a})_j}, \qquad \tilde{\tau}(\mathbf{a})_j = a_j \cdot \mathbb{1}[a_j \geq \theta]$$
In our experiments, the percentile used by the threshold function is set to the same value used by IAIA-BL and ProtoPNets for visualizing the model’s explanations, and is therefore what the user would see during interaction.
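Both AP variants can be sketched in a few lines of NumPy. This is a minimal illustration assuming the attribution map and ground-truth mask are given as 2-D arrays; the default percentile below is a hypothetical placeholder, not the value used in our experiments.

```python
import numpy as np

def activation_precision(attr, mask, pct=95, binary=True):
    """Fraction of the thresholded attribution falling on relevant pixels.

    binary=True  -> original IAIA-BL formulation: pixels above the
                    percentile threshold count as 1, the rest as 0.
    binary=False -> our variant: pixels above the threshold keep their
                    activation value, so stronger activations on the
                    relevant regions are rewarded more.
    pct          -> hypothetical default percentile (placeholder).
    """
    attr = attr.ravel().astype(float)
    mask = mask.ravel().astype(float)
    theta = np.percentile(attr, pct)
    kept = attr > theta
    weights = kept.astype(float) if binary else attr * kept
    total = weights.sum()
    return float((weights * mask).sum() / total) if total > 0 else 0.0
```

With the same attribution map, the weighted variant scores higher whenever the strongest activations concentrate on the ground-truth region, which is exactly the information the binary version discards.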

Appendix B Implementation details


We use the same ProtoPNet architecture for all competitors:


  • Embedding step: For the CUB experiments, the embedding layers were taken from a pre-trained VGG-16, as in [Chen et al., 2019], and for COVID from a VGG-19.

  • Adaptation layers: As in [Chen et al., 2019], two convolutional layers follow the embedding layers, using ReLU and sigmoid activation functions, respectively.


  • Part-prototype stage: All models allocate exactly two part-prototypes per class, as allocating more did not bring any benefits. The prototypes are tensors of shape .

To stabilize the clustering and separation losses, we implemented two “robust” variants proposed by Barnett et al. [2021], which are obtained by modifying the regular losses in Eqs. 5 and 6 of the main text to consider the average distance to the closest part-prototypes, rather than the distance to the closest one only. These two changes improved the performance of all competitors.
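A schematic version of these robust variants is sketched below, assuming a precomputed prototype-to-patch distance matrix for one training example; names, shapes, and the choice of k are illustrative assumptions, not the paper's implementation. Setting k=1 recovers the standard losses.

```python
import numpy as np

def robust_cluster_sep(dists, proto_class, label, k=2):
    """Robust clustering/separation terms for one training example.

    dists       : (n_prototypes, n_patches) distances between each
                  part-prototype and each latent patch of the example.
    proto_class : (n_prototypes,) class index of each prototype.
    label       : ground-truth class of the example.
    k           : number of closest prototypes to average over
                  (k=1 recovers the regular ProtoPNet losses).
    """
    # distance of each prototype to its closest patch in the image
    per_proto = dists.min(axis=1)
    own = np.sort(per_proto[proto_class == label])
    other = np.sort(per_proto[proto_class != label])
    # clustering: pull the k closest same-class prototypes onto the image
    cluster = own[:k].mean()
    # separation: push the k closest other-class prototypes away
    separation = -other[:k].mean()
    return cluster, separation
```

Averaging over the k closest prototypes makes both terms less sensitive to a single outlier prototype, which is the stabilizing effect described above.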


We froze the embedding layers so as to drive the model toward updating the part-prototypes. Similarly to the ProtoPNet implementation, the learning rate of the prototype layer was decayed at regular epoch intervals. The training batch size was set to 20 for the experiments on the COVID data set and one CUB5 variant, and to 128 on the other.

We noticed that, in confounded tasks, projecting the part-prototypes onto the nearest latent training patch tends to move the prototypes closer to confounds, encouraging the model to rely on them. This strongly biases the results against the ProtoPNet baseline. To avoid this problem, we disabled the projection step in our experiments.


The width of the activation function (Eq. 1 in the main paper) was set to the same value in all experiments.

For both CUB5 variants, the weights of the different losses were kept fixed. In these experiments, the remembering loss was not necessary and was therefore disabled by setting its weight to zero. The weight of the forgetting loss was selected from a set of candidate values so as to optimize the test set performance, averaged over three runs.

For the COVID data set, the four loss weights were set to fixed values for the first two debugging rounds. In the last round, the weights of the standard losses were decreased and those of the debugging losses increased, to boost the effect of the forgetting and remembering terms. This shifts the focus of training from avoiding very bad part-prototypes to adjusting the partly wrong prototypes and increasing the activation on the relevant patches.

Appendix C Data statistics and preparation

Table 2 reports statistics for all data sets used in our experiments, including the number of training and test examples, the number of training examples used for the visualization step, and the number of classes. The visualization data set contains the original images, i.e., before the augmentation step, and is used to visualize the images on which the prototypes activate most during the debugging loop. All images were resized to the same resolution used in ProtoPNet [Chen et al., 2019], IAIA-BL [Barnett et al., 2021] and for the COVID-19 data in [DeGrave et al., 2021].

Data set # Train # Test # Visualization # of Classes
Table 2: Statistics for all data sets used in our experiments.

C.1 CUB data sets

Classes in the CUB200 data set include only a small number of training examples each. To improve the performance of ProtoPNets, as done in [Chen et al., 2019], we augmented the training images using random rotations, skewing, and shearing, yielding many more augmented examples per class.

Figure 6: Example confound cut-outs on which the model relies for computing the activation value. Left to right: the confounding green box present in CUB5 and two background patches from the naturally confounded CUB5 variant.


The five CUB200 classes that make up the CUB5 data set are listed in Fig. 7 (left). Synthetic confounds (i.e., colored boxes) were added to the training images of the first, second, and last class; no confound was added to the other two classes. Fig. 6 (left) shows one of the three colored boxes. An example image confounded with a green box is shown in Fig. 7 (right), along with the corresponding ground-truth attribution mask used for IAIA-BL.

Class Label             Confound?
Black-footed Albatross  yes
Laysan Albatross        yes
Sooty Albatross         no
Groove-billed Ani       no
Crested Auklet          yes
Figure 7: Left: List of CUB200 classes selected for the CUB5 data set. Right: Confounded training images from CUB5 and corresponding ground-truth attribution mask.


We created a second CUB5 variant from CUB200 to introduce confounds that occur naturally, e.g., sky or tree patches in the background, and are learned during the training phase. We followed these steps to select the twenty classes reported in Table 3 and used for the experiments:

  1. We modified the test set images of all 200 classes of CUB200 by placing the bird of one class on the background of another. Since the bird removed from the background may be bigger than the pasted bird, the uncovered pixels are set to random values. Fig. 8 shows the background change for one example image and the corresponding ground-truth mask. This background shift causes a drop in the performance of models that rely on the background to classify a bird.

  2. We trained ProtoPNet on two variations of the training set: in the first, the background is removed so that the images contain only the birds; the second contains the entire images with both the bird and the background. Out of the 200 classes of CUB200, we selected the twenty classes with the maximum performance gap on the modified test set. The background thus appears to be a confound for these twenty classes. The right-most images in Fig. 6 highlight patches of the background learned by the model.

  3. We repeated the step above on the twenty classes only, and selected the five classes with the maximum gap. In the experiments, we provided supervision only on these five classes. Since the experiments are run on this subset of twenty classes, the separation loss might force the model to learn different prototypes with respect to the ones learned when trained on all 200 classes; hence, the prototype activations may shift to other pixels. This additional ranking of the classes allows us to debug the most problematic ones and to show the potential performance improvement on them. For this reason, the results reported in the main paper are computed on these five classes.
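The background swap in step 1 can be sketched as follows, assuming per-image bird segmentation masks are available; the function name and interface are hypothetical, and real images would be loaded with an imaging library.

```python
import numpy as np

def swap_background(src_img, src_mask, dst_img, dst_mask, rng=None):
    """Paste the bird of src_img onto the background of dst_img.

    *_img  : (H, W, 3) uint8 images; *_mask : (H, W) boolean bird masks.
    Pixels uncovered by removing the destination bird that are not
    covered by the pasted bird are filled with random values, as in
    step 1 above.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = dst_img.copy()
    # pixels freed by removing the destination bird
    uncovered = dst_mask & ~src_mask
    out[uncovered] = rng.integers(0, 256, size=(int(uncovered.sum()), 3),
                                  dtype=np.uint8)
    # paste the source bird on top of the destination background
    out[src_mask] = src_img[src_mask]
    return out
```

Applying this to every test image of a class (with backgrounds drawn from another class) produces the background-shifted test set used to measure the performance gap.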

Class Label Debug? Black-footed Albatross Groove-billed Ani Least Auklet Bronzed Cowbird Acadian Flycatcher Olive-sided Flycatcher Yellow-bellied Flycatcher Pine Grosbeak Green Violetear Dark-eyed Junco Class Label Debug? Horned Lark Pacific Loon Baird Sparrow Harris Sparrow Seaside Sparrow Cliff Swallow Blue-winged Warbler Northern Waterthrush Pileated Woodpecker Common Yellowthroat
Table 3: The twenty classes with the largest train-test performance gap, used in our second experiment. The debug column indicates which classes received supervision and were thus debugged.
Figure 8: Left to right: the original image with the sea in the background, the same bird with a land background, and the corresponding ground-truth attribution mask.

C.2 COVID data set

In [DeGrave et al., 2021], the classifier is trained on a multi-label data set, in which each scan is associated with multiple pathologies, and the evaluation is then performed on COVID-positive vs. other pathologies. To simplify the identification of confounds, and thus the supervision given to the machine, we generated a binary classification problem containing COVID-positive images and images without any pathology. In this setting, supervision is given only on obvious confounds, like areas outside the body profile or on arms and elbows. A domain expert could provide more detailed supervision on which parts of the lungs the machine must look at.

Appendix D Additional results

Additional part-prototypes for COVID. Fig. 9 reports all prototypes for the experiment on the COVID data set.

Confusion matrices. The confusion matrix on the left side of Fig. 10 shows that ProtoPNet overpredicts the COVID-positive class, whereas the predictions of ProtoPDebug after three rounds are more balanced, since the confounds have been debugged. A domain expert could give additional supervision on which parts of the lungs are relevant for the classification task, further improving performance.

1st Round 2nd Round 3rd Round 4th Round
Figure 9: Four rounds of sequential debugging with ProtoPDebug on COVID. The top row reports the prototypes with non-zero activation: the first prototype refers to COVID- and the second to COVID+. Second row: prototypes with zero activation for each class.
1st Round 4th Round
Figure 10: Confusion matrices on the test set, where classes 0 and 1 are COVID-negative and COVID-positive, respectively.