Toward Machine-Guided, Human-Initiated Explanatory Interactive Learning

07/20/2020 ∙ by Teodora Popordanoska, et al. ∙ Università di Trento

Recent work has demonstrated the promise of combining local explanations with active learning for understanding and supervising black-box models. Here we show that, under specific conditions, these algorithms may misrepresent the quality of the model being learned. The reason is that the machine illustrates its beliefs by predicting and explaining the labels of the query instances: if the machine is unaware of its own mistakes, it may end up choosing queries on which it performs artificially well. This biases the "narrative" presented by the machine to the user. We address this narrative bias by introducing explanatory guided learning, a novel interactive learning strategy in which: i) the supervisor is in charge of choosing the query instances, while ii) the machine uses global explanations to illustrate its overall behavior and to guide the supervisor toward choosing challenging, informative instances. This strategy retains the key advantages of explanatory interaction while avoiding narrative bias, and compares favorably to active learning in terms of sample complexity. An initial empirical evaluation with a clustering-based prototype highlights the promise of our approach.




1 Introduction

The increasing ubiquity and integration of sophisticated machine learning into our lives calls for strategies to justifiably establish or reject trust in models learned from data [25]. Explanatory interactive learning [34, 27] aims to achieve this by combining interactive learning, which enables users to build expectations through continued interaction, with computational explanations, which illustrate the model’s inner logic in an interpretable manner.

In line with this goal, explanatory active learning (XAL) addresses classification tasks by combining active learning with local explanations [34]. During learning, the machine selects unlabeled instances (e.g., documents or images) and asks an annotator to label them. At the same time, the machine supplies the annotator with predictions for the query instances and local explanations for these predictions. The local explanations unpack the reasons behind a prediction in terms of, e.g., feature relevance [17]. The supervisor can also correct the local explanations by highlighting, e.g., confounding features that the machine is wrongly relying on. A recent study on plant phenotyping data has shown that explanatory interaction helps human supervisors to acquire classifiers that are “right for the right reasons” [26] and to correctly assign trust in them [27].

Despite these results, in some situations the narrative presented by XAL may misrepresent the actual performance of the classifier. The issue is that the narrative is focused on the query instances, and the machine may fail to choose instances that capture its own flaws. This occurs, for instance, when the classifier is affected by unknown unknowns, i.e., (regions of) high-confidence mistakes [14, 3]. This leads to a form of narrative bias.

We tackle this issue by introducing explanatory guided learning (XGL), a novel form of human-initiated interactive learning that relies on global explanations [1, 17], which summarize the whole predictor using an interpretable surrogate, e.g., clusters or rules. Crucially, the supervisor, rather than the machine, is responsible for choosing the query instances. We argue that global explanations bring two key benefits. First, they convey less biased expectations of the predictor’s behavior, thus making it possible to avoid narrative bias. Second, they support human-initiated query selection by guiding the annotator towards discovering informative, problematic instances. This novel form of human-initiated, machine-guided interaction retains most benefits of explanatory interaction, including facilitating the acquisition of high-quality classifiers. We present an initial implementation of explanatory guided learning that uses clustering techniques to produce data-driven global explanations, and evaluate it empirically on a synthetic data set. Our initial results support the idea that explanatory guided learning helps supervisors to identify useful examples even in the presence of unknown unknowns and sub-optimal decision making.

Summarizing, we: 1) Identify the issue of narrative bias in explanatory active learning; 2) Introduce explanatory guided learning, which avoids narrative bias by combining human-initiated interactive learning with machine guidance in the form of global explanations; 3) Develop a prototype implementation and present an initial empirical evaluation on a synthetic data set.

2 Problem Statement

We are concerned with learning a classifier f : X → Y from examples. Here, X is the space of inputs and Y is the set of labels. The classifier f is assumed to be black-box, e.g., a deep neural network or a kernel machine. Extra examples can be acquired by interacting with a human supervisor. Two requirements are put into place:

  1. Training data is initially scarce and obtaining more from the supervisor comes at a cost. Hence, a good classifier should be identified using few, well-chosen queries.

  2. The supervisor should be able to tell whether f can be trusted as objectively as possible. The machine must supply information for this purpose.

The last requirement is not easy to formalize. Intuitively, it means that the machine should output performance statistics, predictions, explanations, proofs, plots, or any other kind of interpretable information necessary for the supervisor to establish whether f is trustworthy. Clearly, providing persuasive information that misrepresents the quality of the model conflicts with this requirement.

3 Preliminaries

The first requirement is satisfied by standard techniques like active learning (AL) [29]. To recap, in AL it is assumed that the machine has access to a large pool of unlabeled instances U. During learning, the machine picks query instances from U, asks the supervisor to label them, and uses the feedback to update the classifier. The queries are chosen by maximizing their estimated informativeness, usually defined in terms of how uncertain the model is about their label and how well they capture the data distribution [30]. The AL interaction protocol, however, is completely opaque and thus fails the second requirement [34].
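For concreteness, the uncertainty-based query selection described above can be sketched in a few lines; this is our own minimal illustration with scikit-learn, not code from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.uniform(-2, 2, size=(200, 2))        # unlabeled pool U
X_lab = np.array([[-1.5, 0.0], [1.5, 0.0]])       # two seed labels
y_lab = np.array([0, 1])

f = LogisticRegression().fit(X_lab, y_lab)

# Uncertainty sampling: query the pool instance whose predicted class
# probability is closest to 0.5 (smallest margin).
proba = f.predict_proba(X_pool)[:, 1]
query_idx = int(np.abs(proba - 0.5).argmin())
print(X_pool[query_idx])  # instance nearest the current decision boundary
```

The selected instance would then be shown to the annotator, labeled, and added to the training set before retraining.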

1:fit classifier f on the initial examples
2:repeat
3:     select a query x from U by maximizing informativeness w.r.t. f
4:     present x, the prediction f(x), and a local explanation of f(x) to the user
5:     receive a ground-truth label and an explanation correction
6:     convert the correction to examples (see [34])
7:     update the training set and U, retrain f
8:until query budget exhausted or f good enough
Algorithm 1 Pseudo-code of explanatory active learning [34, 27].

Explanatory active learning (XAL) tackles this issue by supplying the user with information about the model being learned [34, 27]. The learning loop (listed in Algorithm 1) is similar to active learning, except that, after choosing a query point x, the machine also predicts its label f(x) and explains the prediction using a local explanation. Local explanations are a building block of explainability [17]: they illustrate the logic behind individual predictions in terms of visual artifacts (e.g., saliency maps) that highlight which features are most responsible for the prediction. The query, prediction, and explanation are then supplied to the supervisor. Over time, this gives rise to a “narrative” that allows the supervisor to monitor the beliefs acquired by the machine and its improvement, or lack thereof [34, 27]. The supervisor is allowed to provide a corrected local explanation by identifying, e.g., irrelevant features that appear as relevant in the machine’s explanation. The corrections are translated into examples [34] or gradient constraints [27] and used as additional supervision. This makes it possible to directly teach the machine not to rely on, e.g., confounders.
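One way such corrections can be turned into supervision, in the spirit of the counterexample strategy of [34], is to duplicate the corrected instance with the flagged features overwritten by values drawn from other instances, keeping the label fixed. The sketch below is our own simplification, and the exact procedure in [34] may differ:

```python
import numpy as np

def correction_to_examples(x, y, irrelevant, X_pool, n=10, rng=None):
    """Turn a user's correction ("these features are irrelevant") into
    counterexamples: copies of x whose flagged features are overwritten
    with values from other instances, all keeping the label y."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.tile(x, (n, 1)).astype(float)
    donors = X_pool[rng.integers(len(X_pool), size=n)]
    out[:, irrelevant] = donors[:, irrelevant]   # randomize confounders only
    return out, np.full(n, y)
```

Feeding these counterexamples back to the learner discourages it from relying on the flagged features, since the label stays the same while they vary.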

Experiments with domain experts have shown that explanatory active learning enables users to identify bugs in the model and to steer it away from wrong concepts [27]. XAL has also shown potential for learning deep neural nets [35].

3.1 Narrative Bias

It was shown that narratives produced by XAL can work well in practice [34, 27]. The question is: do such narratives always help?

Figure 1: Left: synthetic data set. Middle: predictor with unknown unknowns. Right: example clustering-based global explanation; crosses are medoids.

The answer is no. Narratives focused on individual query instances may over-sell the predictor. Consider the data set in Figure 1 (left). The red points belong to red clusters arranged on a regular grid, while the blue ones are distributed uniformly everywhere else. The decision surface of an SVM with a Gaussian kernel trained on the labeled examples is shown in the middle. Whereas the red clusters covered by the training set are recognized as such, the SVM is completely unaware of the other clusters. They are unknown unknowns [14]. What happens then is that AL sampling strategies choose uncertain points around the known red clusters. At some point, the SVM learns the known region and thus performs well on the query instances, in terms of both predictions and explanations. The user might therefore get the wrong impression that the model works well everywhere (using other query strategies would not solve the problem; e.g., density-based strategies would fail as the data has no density lumps, cf. [2]). The unknown red clusters are, however, highly representative of the model’s performance and should not be ignored by the narrative.
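This failure mode is easy to reproduce. The sketch below is our own rough approximation of the setup in Figure 1 (cluster sizes, sample counts, and kernel parameters are assumptions): it trains an RBF SVM on the red clusters of one grid column plus background blue points, and checks how the remaining, unseen red clusters are classified.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Red: 25 Gaussian clusters on a 5x5 grid; blue: uniform background
# (a rough stand-in for the data set in Figure 1, left).
centers = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
red = np.vstack([c + 0.08 * rng.standard_normal((40, 2)) for c in centers])
blue = rng.uniform(-0.5, 4.5, size=(1200, 2))
d_blue = np.linalg.norm(blue[:, None] - centers[None], axis=-1).min(axis=1)
blue = blue[d_blue > 0.3][:600]

# Label only the red clusters in the first grid column: the remaining
# 20 clusters become unknown unknowns for the learner.
known = centers[:5]
near_known = np.linalg.norm(red[:, None] - known[None], axis=-1).min(axis=1) < 0.3
X_train = np.vstack([red[near_known], blue])
y_train = np.r_[np.ones(near_known.sum()), np.zeros(len(blue))]

clf = SVC(kernel="rbf", gamma=10.0).fit(X_train, y_train)

# The unseen red clusters sit in a region the model labels blue, so
# uncertainty sampling has no reason to ever query there.
unseen = red[~near_known]
print("fraction of unseen red predicted blue:", (clf.predict(unseen) == 0).mean())
```

The unseen clusters are typically classified blue with high confidence, which is exactly why uncertainty-driven queries never reach them.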

This example shows also that unknown unknowns prevent the machine from choosing truly informative queries. Given that unknown unknowns occur often under class unbalance [2], sampling bias [3], and concept drift [16], both this and narrative bias are serious issues in practice.

4 Explanatory Guided Learning

In order to tackle narrative bias, we consider a very different setup. The idea is that, if the supervisor could see the whole decision surface of the predictor and were able to understand it, she could spot regions where the predictor misbehaves and select informative supervision from these regions. This form of human-initiated interactive learning [2] would be robust against narrative bias. Of course, this setup is not realistic: the decision surface of most predictors is complex and hard to visualize, let alone validate and search instances with.

We propose to make this strategy practical using global explanations. While local explanations target individual predictions, global explanations illustrate the logic of the whole model [1, 17]. We restrict our attention to global explanations that summarize [9] the target classifier using an interpretable surrogate (other kinds of global explanations, such as those based on feature dependencies or shape constraints [18, 33], are not considered).

Given a classifier f, a global explanation is a classifier g that approximates f and is taken from a suitable family of interpretable predictors, like (shallow) decision trees [12, 20, 8, 32, 6, 37] or (simple) rules [23, 19, 5, 4]. Usually, g is obtained by sampling a large enough set of instances S and then choosing the surrogate that minimizes the total loss between g(x) and f(x) over S, for some loss function. It is common to sample instances close to the data manifold, so as to encourage the surrogate to mimic the target predictor on the bulk of the distribution [9]. For simplicity, in our experiments we employ clustering-based explanations obtained by fitting clusters to the data, in which the label (known for labeled instances and predicted for unlabeled ones) is treated as a feature; see Figure 1 (right) for an example.
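A minimal sketch of such a clustering-based explanation, assuming k-means in place of k-medoids and an arbitrary weight on the label feature (both our choices), might look like:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)
f = SVC(kernel="rbf").fit(X, y)          # black-box target classifier

# Clustering-based global explanation: treat the model's predicted label
# as an extra feature so clusters do not straddle the decision surface,
# then summarize each cluster by its majority predicted label.
yhat = f.predict(X)
Z = np.column_stack([X, 3.0 * yhat])     # label weight is an assumption
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(Z)

for c in range(6):
    members = yhat[km.labels_ == c]
    print(f"cluster {c}: {len(members)} points, "
          f"majority label {int(members.mean() > 0.5)}")
```

Because the (weighted) label dimension dominates the within-cluster distances, each resulting cluster is essentially label-pure, so its majority label faithfully summarizes the model on that region.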

A global explanation is presented as a visual or textual artifact [17]. In our case, a clustering-based explanation consists of a set of clusters, each associated with its predicted (majority) label and a textual description like “feature 1 is larger than some threshold and feature 2 is less than another threshold and …”. Thanks to their interpretability, global explanations are a natural device for helping supervisors spot mistakes and select impactful examples: they make it possible for users to formulate counter-examples to, e.g., clearly wrong rules or clusters.

1:fit classifier f on L, compute global explanation g
2:repeat
3:     supply g to the user and ask for an example
4:     receive an example from the user and add it to L
5:     update f using L
6:     update g
7:until query budget exhausted or f good enough
Algorithm 2 Explanatory guided learning. L is the training set.

We call the combination of global explanations and human-initiated interactive learning explanatory guided learning. The pseudo-code is listed in Algorithm 2. The learning loop is straightforward. Initially, a classifier is learned on the initial training set and a global explanation is computed (line 1). Then the interaction loop begins. In each iteration, the machine presents the global explanation to the supervisor and asks for a high-loss example (lines 3–4). This is discussed in more detail below. Upon receiving new supervision, the machine updates the training set, the predictor, and the global explanation (lines 4–6). The loop repeats until the classifier is deemed good enough or the labeling budget is exhausted.
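For illustration, the loop of Algorithm 2 can be written as a thin driver around any classifier; `explain` and `ask_user` below are placeholder callbacks (our own names) standing in for the global-explanation step and the human supervisor, respectively:

```python
from sklearn.svm import SVC

def xgl_loop(X_init, y_init, explain, ask_user, budget=50):
    """Skeleton of explanatory guided learning (Algorithm 2).
    explain(f, X, y) -> global explanation shown to the user;
    ask_user(g) -> (x, y) example chosen by the user, or None to stop."""
    X, y = list(X_init), list(y_init)
    f = SVC(kernel="rbf").fit(X, y)       # line 1: fit initial classifier
    g = explain(f, X, y)                  # line 1: compute explanation
    for _ in range(budget):               # until budget exhausted
        fb = ask_user(g)                  # lines 3-4: user picks an example
        if fb is None:                    # user deems f good enough
            break
        x_new, y_new = fb
        X.append(x_new)
        y.append(y_new)
        f = SVC(kernel="rbf").fit(X, y)   # line 5: retrain
        g = explain(f, X, y)              # line 6: update explanation
    return f
```

Any classifier with a `fit` method could be substituted for the SVM; the structure of the loop is what matters.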

4.1 Discussion

A key advantage of XGL is that it is – by design – immune to the form of narrative bias discussed above. A second key advantage is that it enables supervisors to provide examples tailored to the model at hand. This is critical in the presence of unknown unknowns and in other cases in which machine-initiated interactive learning fails [2]. Our preliminary experimental results are consistent with this observation. Notice that simply combining global explanations with machine-guided learning would not achieve the same effect, as the learning loop would entirely depend on possibly uninformative queries selected by the machine. Similarly, using a held-out validation set to monitor the model behavior would not capture the same information conveyed by global explanations. Another advantage is that global explanations offer support for protocols in which supervisors select entire batches of data rather than individual examples, as usually done in active learning of deep models [15].

Naturally, shifting the responsibility of choosing instances from the machine to the user may introduce other forms of bias. For instance, the explanation may be too rough an approximation of the target model or the supervisor may misinterpret the explanation. These two issues, however, are not exclusive to XGL: local explanations can be unfaithful [35] and annotator performance can be poor even in AL [38].

The main downsides of global explanations over local explanations are their added cognitive and computational costs. Despite this issue, we argue that global explanations are necessary to avoid narrative bias, especially in high-risk applications where the cost of deploying misbehaving models is significant. Moreover, the computational cost can be amortized over time by making use of incremental learning techniques. The cognitive cost can also be reduced and diluted over time, for instance by restricting the global explanations to regions that the user cares about. Another possibility is to employ a mixed-initiative schema that interleaves machine- and human-initiated interactive learning. This would make global explanations less frequent while keeping the benefits of XGL. The question becomes when to show the global explanations. One possibility is to program the machine to warn the user whenever the feedback has little impact on the model, indicating that either the query selection algorithm is “stuck” or that the supervisor’s understanding of the machine is misaligned.

We remark that our clustering-based implementation is meant as a prototype. More refined implementations would use global explanations based on trees or rules [17] and provide the user with interfaces and search tools to explore the space using the global explanation. For instance, the interface could build on the one designed for guided learning [2], a form of human-initiated interactive learning, by supplementing it with explanations.

5 Experiments

We study the following research questions:

  • Is XGL less susceptible to narrative bias than machine-initiated alternatives?

  • Is XGL competitive with active learning and guided learning in terms of sample complexity and model quality?

  • How does the annotator’s understanding of the global explanation affect the performance of XGL?

Experimental Setup: To answer these questions, we ran our clustering-based prototype on a synthetic classification task and compared it with several alternatives. The data set is illustrated in Figure 1 (left). The data consists of an unbalanced collection of two-dimensional blue and red points, with red as the minority class. The red points were sampled at random from 25 Gaussian clusters distributed on a five-by-five grid. The blue points were sampled uniformly from outside the red clusters, with little or no overlap. All results are cross-validated using stratification. For each fold, the training set initially includes five examples, at least two per class. The implementation and experiments are available online.

Human-machine interaction: The global explanation presented to the human supervisor acts as a summary of the model's behavior on different regions of the problem space. In our experiments, the summary is constructed from clusters obtained from the data using k-medoids. For the synthetic data set, we extract 10 clusters. In general, the number of clusters can either be determined in advance by the system designer, or it can be dynamically adapted based on the desired precision of the explanation. Each cluster is represented by a prototype, an exemplar case that serves to approximate the behavior of the model on that region as a whole. In addition to the prototypes, the regions can be further described, e.g., by fitting interpretable decision trees [6, 12, 37] and extracting a rule list as a textual description. Upon inspecting the prototypes, their predicted labels, and the descriptions of the clusters, the human is expected to identify the clusters with high loss and provide instances that correct the model's beliefs. In other words, the search strategy performed by the user has a hierarchical structure: first she chooses a region where the model misbehaves, and then she looks for an instance within that region based on some criterion.
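Lacking k-medoids in scikit-learn proper, the prototypes can be approximated by running k-means and taking the data point nearest each center as the exemplar; a sketch under that assumption:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))

# Approximate k-medoids: run k-means, then use the nearest actual data
# point to each center as the cluster prototype (exemplar).
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
prototypes = np.array([
    X[km.labels_ == c][
        np.linalg.norm(X[km.labels_ == c] - km.cluster_centers_[c], axis=1).argmin()
    ]
    for c in range(10)
])
print(prototypes.shape)  # one exemplar per cluster
```

Unlike k-means centroids, each prototype is a real instance, so it can be shown to the supervisor together with its predicted label and a textual description of the region.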

User simulation: A helpful and cooperative teacher is simulated with a simple model that attempts to capture the various levels of knowledge and attention of real users. In the optimistic case, the simulated user consistently detects a region of weakness for the learner; in the experiments, the cluster with the most wrongly classified points is regarded as the weakest area. In the worst case, the teacher selects an instance from a randomly chosen cluster. Within the chosen cluster, the simulated user selects a wrongly classified point closest to its prototype.
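The simulated teacher described above can be sketched as follows (function and argument names are ours):

```python
import numpy as np

def simulated_user(X, y, yhat, labels, prototypes, optimistic=True, rng=None):
    """Pick a query the way the simulated teacher does: choose a cluster
    (the weakest one if optimistic, a random one otherwise), then return
    the index of the misclassified point closest to its prototype."""
    rng = np.random.default_rng() if rng is None else rng
    wrong = yhat != y
    errs = np.array([np.sum(wrong & (labels == c)) for c in range(len(prototypes))])
    if optimistic:
        c = int(errs.argmax())                   # cluster with most mistakes
    else:
        c = int(rng.integers(len(prototypes)))   # inattentive user
    idx = np.flatnonzero(wrong & (labels == c))
    if idx.size == 0:                            # no mistakes in chosen cluster
        return None
    d = np.linalg.norm(X[idx] - prototypes[c], axis=1)
    return int(idx[d.argmin()])
```

Returning `None` when the chosen cluster contains no mistakes matches the experimental protocol of falling back to random sampling once the pool of misclassified points is exhausted.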

Baselines: We compare the performance of our method against the following baselines: 1) Guided learning: This strategy is simulated by class-conditional random sampling, i.e., the user alternately chooses instances from each class in a balanced proportion. 2) Active learning: Following the most popular AL strategy, uncertainty sampling, instances are chosen based on how uncertain the classifier is about their label. 3) Random sampling: The instances to be labeled are uniformly sampled from the unlabeled pool. This simple baseline is surprisingly hard to beat in practice. 4) Passive learning: The classifier is trained on the whole data set. This baseline indicates how quickly the other methods converge to the performance of the fully trained model.

5.1 Q1: Is XGL less susceptible to narrative bias?

The first experiment investigates the methods’ ability to handle unknown unknowns, i.e., red clusters that the model has not yet identified. Figure 2 shows a comparison of the instances selected with AL using uncertainty sampling (top row) and XGL (bottom row). The exploitative nature of uncertainty sampling leads the model to select instances around the presumed decision surface of the already found red clusters, thus wasting the querying budget on redundant instances. The narrative that XAL would create based on this choice of points is not representative of the generalization (in)ability of the model. In other words, there exist many regions in the data that are not explored, because this strategy becomes locked in a flawed hypothesis of where the decision boundary is and fails to explore the space.

This experiment showcases that explanatory strategies rooted in AL would misrepresent the true performance of the model in the presence of unknown unknowns. Therefore, the supervisor would be wrongly persuaded to trust it. In contrast, the explanatory component of XGL enables the user to understand the beliefs of the model being learned, while the human-initiated interaction allows the supervisor to appropriately act upon the observed flaws of the model. These preliminary results show that our prototype can be very helpful in detecting areas of wrong uncertainty and avoiding narrative bias.

Figure 2: Results for AL (top row) vs. XGL (bottom row) at iterations 10, 70, and 140. First row: Instances chosen by the machine using uncertainty sampling misrepresent the model's behavior in the presence of unknown unknowns. The model is oblivious to the existence of the clusters outside of the already found ones, i.e., the machine is unable to detect its own misbehavior. Second row: XGL combats narrative bias by injecting global explanations that enable the human supervisor to identify the flaws of the model and choose informative, non-redundant examples accordingly.

5.2 Q2: Is XGL competitive with active learning and guided learning in terms of sample complexity and model quality?

To address this question, we compare the F1 score versus the number of labeled examples, shown in Figure 3 (left). The performance is calculated using predictions on a separate test set. In every iteration, one instance is selected to be labeled using each strategy of interest. The model is retrained and the F1 score is reported.

To ensure that all methods have received the same amount of supervision, after the pool of wrongly classified points for XGL is exhausted, the simulation continues to sample random points from the unlabeled data. The iteration when the switch happens is depicted with the arrow on the plot. Notice that by that iteration, the model already achieves the same performance as the fully trained model.

In the initial stages of learning, the classifier is oblivious to the existence of red clusters outside of the assumed decision boundary around the labeled data points. In these conditions, query selection with uncertainty sampling, as a representative of active learning, triggers a vicious cycle: it selects instances that add little information to the classifier, which in turn leads to even more uninformative instances being chosen in the next iteration because of the poor quality of the model. Consequently, within the given query budget, the model discovers only a fraction of the red clusters, resulting in poor overall performance. Active learning and random sampling only rarely query instances from the red class, which explains their slow progress in Figure 3 (left). On the other hand, with guided learning the model observes instances from the two classes in alternation in every iteration. Because too few blue points are queried to refine the already-found red clusters, the model creates a decision boundary like the one illustrated in Figure 4 (left), where many of the blue points are falsely classified as red.

The decision surfaces of the classifier trained on the points selected by the strategies based on guided learning are shown in Figure 4. Comparing the chosen instances (circled in yellow) with each of these methods, it is evident that by using uninformed guided learning the supervisor is likely to present the learner instances from regions where it is already performing well. This observation, once again, emphasizes the importance of global explanations for enabling the user to provide non-redundant supervision, which ultimately results in more efficient learning in terms of sample complexity.

In summary, the overall trend is consistent with our intuition: using XGL we can train a significantly better classifier at low labeling costs, compared to the alternatives.

Figure 3: Left: F1 score on the synthetic data set with an SVM. Our prototype surpasses the alternatives by a large margin. Right: The performance of XGL for simulated users with varying parameter λ in a softmax function over the number of misclassified points in every cluster.
Figure 4: Decision surfaces in the 90th iteration. Left: GL. No explanations are shown to the supervisor; consequently, many redundant points are selected (red points from already-found red clusters). Right: XGL. The supervisor is presented with a clustering-based global explanation. The chosen instances strike a balance between refining the decision boundary and exploring new red clusters.

5.3 Q3: How does the annotator’s understanding of the global explanation affect the performance of XGL?

Needless to say, when the supervisor has a central role in the model's learning process, understanding the explanation and taking proper actions becomes crucial. However, in realistic scenarios, human annotators can be imprecise and inconsistent in identifying regions with high loss. To account for these situations in our preliminary experiments, we simulate different users with a softmax function, parametrized by λ, over the number of misclassified points in every cluster. Let m_c denote the number of mislabeled points in cluster c. The probability of the user choosing cluster c is given by:

p(c) = exp(λ m_c) / Σ_c' exp(λ m_c')

where λ ≥ 0 is a parameter that serves to simulate the different users. Larger λ simulates a supervisor who identifies the weakest region for the classifier and chooses to label data points from it. In the worst case, the annotator does not understand the presented explanation and chooses a cluster at random, which is simulated with λ close to zero. The results obtained for different values of λ are presented in Figure 3 (right). It can be observed that significant improvements can be gained for reasonable choices of clusters to select instances from, as simulated with larger values of λ.
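The softmax user model is straightforward to implement; a small sketch (the function name is ours):

```python
import numpy as np

def cluster_choice_probs(miscounts, lam):
    """Softmax over per-cluster mistake counts; lam (λ) interpolates between
    a random annotator (λ = 0) and one who always finds the weakest cluster."""
    z = lam * np.asarray(miscounts, dtype=float)
    z -= z.max()                    # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

print(cluster_choice_probs([0, 1, 9], lam=0.0))  # uniform: random user
print(cluster_choice_probs([0, 1, 9], lam=5.0))  # mass on the weakest cluster
```

With λ = 0 every cluster is equally likely, while even moderate λ concentrates nearly all probability on the cluster with the most mistakes.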

6 Related Work

The link between interactive learning, explainability, and trust is largely unexplored. Our work is rooted in explanatory interactive learning [34, 27] (see also [28] and [24]), of which XGL and XAL are instantiations.

There is little work on human-initiated interactive learning. XGL is an extension of guided learning [2, 31], in which search queries are used to combat label skew in classification tasks. We deepen these insights and show that global explanations combined with human-initiated interaction can be a powerful tool for handling unknown unknowns [14]. A major difference from our work is that guided learning is entirely black-box: the annotator is asked to provide examples of a target class but receives no guidance (besides a description of that class). Since the supervisor has no clue what the model looks like, it is difficult for her to establish or reject trust, and also to provide useful (e.g., non-redundant) examples. In contrast, XGL relies on global explanations to guide the user. Let us note that guided learning compares favorably to pure active learning in terms of sample complexity [7]. The idea of asking supervisors to identify machine mistakes has recently been explored in [3, 36], but the relationship to global explanations as machine guidance is ignored.

Our observations are consistent with recent work in interactive machine teaching. Machine teaching is the problem of selecting a minimal set of examples (a “teaching set”) capable of conveying a target hypothesis to a learner [39]. The focus is primarily theoretical, so it is typically assumed that the teacher (who designs the teaching set) is a computational oracle with unbounded computational resources and complete access to the model and learning algorithm. It was recently proved that oblivious teachers unaware of the state of the model cannot perform better than random sampling [22, 11, 13]. To overcome this limitation, the teacher must interact with the machine, as we do. Existing algorithms, however, cannot be applied to human oracles or assume that the teacher can sample the whole decision surface of the learner, and they generally ignore the issue of trust. Our work identifies global explanations as a practical solution to all of these issues.

Explanatory guided learning revolves around machine-provided guidance in the form of global explanations. This is related to, but should not be confused with, work on user-provided guidance [21] and teaching guidance [10]. These are orthogonal to our approach and could be fruitfully combined with it.

7 Conclusion and Outlook

The purpose of this paper is twofold. On the one hand, it draws attention to the issue of narrative bias, its root causes, and its consequences on trust in explanatory active learning. On the other, it shows how to deal with narrative bias by combining human-initiated interactive learning with machine guidance in the form of global explanations. An initial empirical analysis suggests that explanatory guided learning, our proposed method, helps the supervisor to select substantially less biased examples. Of course, a more thorough validation on real-world data sets and more refined forms of explanation, like rules or decision trees, is needed. This is left for future work.


  • [1] R. Andrews, J. Diederich, and A. B. Tickle (1995) Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-based systems 8 (6), pp. 373–389. Cited by: §1, §4.
  • [2] J. Attenberg and F. Provost (2010) Why label when you can search?: alternatives to active learning for applying human resources to build classification models under extreme class imbalance. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 423–432. Cited by: §3.1, §4.1, §4.1, §4, §6, footnote 1.
  • [3] J. Attenberg, P. Ipeirotis, and F. Provost (2015) Beat the machine: challenging humans to find a predictive model’s “unknown unknowns”. Journal of Data and Information Quality (JDIQ) 6 (1), pp. 1–17. Cited by: §1, §3.1, §6.
  • [4] M. G. Augasta and T. Kathirvalavakumar (2012) Reverse engineering the neural networks for rule extraction in classification problems. Neural processing letters 35 (2), pp. 131–150. Cited by: §4.
  • [5] N. Barakat and A. P. Bradley (2010) Rule extraction from support vector machines: a review. Neurocomputing 74 (1-3), pp. 178–190. Cited by: §4.
  • [6] O. Bastani, C. Kim, and H. Bastani (2017) Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504. Cited by: §4, §5.
  • [7] A. Beygelzimer, D. J. Hsu, J. Langford, and C. Zhang (2016) Search improves label for active learning. In Advances in Neural Information Processing Systems, pp. 3342–3350. Cited by: §6.
  • [8] O. Boz (2002) Extracting decision trees from trained neural networks. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 456–461. Cited by: §4.
  • [9] C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil (2006) Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 535–541. Cited by: §4, §4.
  • [10] M. Cakmak and A. L. Thomaz (2014) Eliciting good teaching from humans for machine learners. Artificial Intelligence 217, pp. 198–215. Cited by: §6.
  • [11] Y. Chen, O. Mac Aodha, S. Su, P. Perona, and Y. Yue (2018) Near-optimal machine teaching via explanatory teaching sets. In International Conference on Artificial Intelligence and Statistics, pp. 1970–1978. Cited by: §6.
  • [12] M. Craven and J. W. Shavlik (1996) Extracting tree-structured representations of trained networks. In Advances in neural information processing systems, pp. 24–30. Cited by: §4, §5.
  • [13] S. Dasgupta, D. Hsu, S. Poulis, and X. Zhu (2019) Teaching a black-box learner. In International Conference on Machine Learning, pp. 1547–1555. Cited by: §6.
  • [14] T. G. Dietterich (2017) Steps toward robust artificial intelligence. AI Magazine 38 (3), pp. 3–24. Cited by: §1, §3.1, §6.
  • [15] Y. Gal, R. Islam, and Z. Ghahramani (2017) Deep bayesian active learning with image data. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1183–1192. Cited by: §4.1.
  • [16] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, and A. Bouchachia (2014) A survey on concept drift adaptation. ACM computing surveys (CSUR) 46 (4), pp. 1–37. Cited by: §3.1.
  • [17] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, and D. Pedreschi (2018) A survey of methods for explaining black box models. ACM computing surveys (CSUR) 51 (5), pp. 1–42. Cited by: §1, §1, §3, §4.1, §4, §4.
  • [18] A. Henelius, K. Puolamäki, H. Boström, L. Asker, and P. Papapetrou (2014) A peek into the black box: exploring classifiers by randomization. Data mining and knowledge discovery 28 (5-6), pp. 1503–1529. Cited by: footnote 2.
  • [19] U. Johansson, L. Niklasson, and R. König (2004) Accuracy vs. comprehensibility in data mining models. In Proceedings of the seventh international conference on information fusion, Vol. 1, pp. 295–300. Cited by: §4.
  • [20] R. Krishnan, G. Sivakumar, and P. Bhattacharya (1999) Extracting decision trees from trained neural networks. Pattern recognition 32 (12). Cited by: §4.
  • [21] G. Kunapuli, P. Odom, J. W. Shavlik, and S. Natarajan (2013) Guiding autonomous agents to better behaviors through human advice. In 2013 IEEE 13th International Conference on Data Mining, pp. 409–418. Cited by: §6.
  • [22] F. S. Melo, C. Guerra, and M. Lopes (2018) Interactive optimal teaching with unknown learners. In IJCAI, pp. 2567–2573. Cited by: §6.
  • [23] H. Núñez, C. Angulo, and A. Català (2002) Rule extraction from support vector machines. In ESANN, pp. 107–112. Cited by: §4.
  • [24] R. Phillips, K. H. Chang, and S. A. Friedler (2018) Interpretable active learning. In Conference on Fairness, Accountability, and Transparency, Cited by: §6.
  • [25] I. Rahwan, M. Cebrian, N. Obradovich, J. Bongard, J. Bonnefon, C. Breazeal, J. W. Crandall, N. A. Christakis, I. D. Couzin, M. O. Jackson, et al. (2019) Machine behaviour. Nature 568 (7753), pp. 477–486. Cited by: §1.
  • [26] A. S. Ross, M. C. Hughes, and F. Doshi-Velez (2017) Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2662–2670. Cited by: §1.
  • [27] P. Schramowski, W. Stammer, S. Teso, A. Brugger, H. Luigs, A. Mahlein, and K. Kersting (2020) Right for the wrong scientific reasons: revising deep networks by interacting with their explanations. arXiv preprint arXiv:2001.05371. Cited by: §1, §1, §3.1, §3, §3, §6, Algorithm 1.
  • [28] S. Sen, P. Mardziel, A. Datta, and M. Fredrikson (2018) Supervising feature influence. arXiv preprint arXiv:1803.10815. Cited by: §6.
  • [29] B. Settles (2012) Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 6 (1), pp. 1–114. Cited by: §3.
  • [30] C. Shui, F. Zhou, C. Gagné, and B. Wang (2019) Deep active learning: unified and principled method for query and training. arXiv preprint arXiv:1911.09162. Cited by: §3.
  • [31] P. Simard, D. Chickering, A. Lakshmiratan, D. Charles, L. Bottou, C. G. J. Suarez, D. Grangier, S. Amershi, J. Verwey, and J. Suh (2014) Ice: enabling non-experts to build models interactively for large-scale lopsided problems. arXiv preprint arXiv:1409.4814. Cited by: §6.
  • [32] H. F. Tan, G. Hooker, and M. T. Wells (2016) Tree space prototypes: another look at making tree ensembles interpretable. arXiv preprint arXiv:1611.07115. Cited by: §4.
  • [33] S. Tan, R. Caruana, G. Hooker, P. Koch, and A. Gordo (2018) Learning global additive explanations for neural nets using model distillation. arXiv preprint arXiv:1801.08640. Cited by: footnote 2.
  • [34] S. Teso and K. Kersting (2019) Explanatory interactive machine learning. In AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, Cited by: §1, §1, §3.1, §3, §3, §6, 6, Algorithm 1.
  • [35] S. Teso (2019) Toward faithful explanatory active learning with self-explainable neural nets. In 3rd International Tutorial and Workshop on Interactive and Adaptive Learning, Cited by: §3, §4.1.
  • [36] C. Vandenhof and E. Law (2019) Contradict the machine: a hybrid approach to identifying unknown unknowns. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 2238–2240. Cited by: §6.
  • [37] C. Yang, A. Rangarajan, and S. Ranka (2018) Global model interpretation via recursive partitioning. In 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1563–1570. Cited by: §4, §5.
  • [38] M. Zeni, W. Zhang, E. Bignotti, A. Passerini, and F. Giunchiglia (2019) Fixing mislabeling by human annotators leveraging conflict resolution and prior knowledge. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3 (1), pp. 1–23. Cited by: §4.1.
  • [39] X. Zhu (2015) Machine teaching: an inverse problem to machine learning and an approach toward optimal education. In Twenty-Ninth AAAI Conference on Artificial Intelligence, Cited by: §6.