"Why Should I Trust Interactive Learners?" Explaining Interactive Queries of Classifiers to Users

by Stefano Teso, et al.
Technische Universität Darmstadt

Although interactive learning puts the user into the loop, the learner remains mostly a black box for the user. Understanding the reasons behind queries and predictions is important when assessing how the learner works and, in turn, trust. Consequently, we propose the novel framework of explanatory interactive learning: in each step, the learner explains its interactive query to the user, and she responds by correcting the prediction and the explanation, if necessary. Explanations of the corresponding predictions can be computed and visualized for the queries of any active classifier. We demonstrate that this can boost the predictive and explanatory power of, and the trust into, the learned model, using text (e.g. SVMs) and image classification (e.g. neural networks) experiments as well as a user study.


1 Introduction

Trust lies at the foundation of major theories of interpersonal relationships in psychology simpson2007psychological . Building expectations through interaction kramer2001trust ; mercier2011humans ; chang2010seeing and the ability to interpret the other’s beliefs and intentions diyanni2012won are necessary for (justifiably) establishing, maintaining, and revoking trust. Hoffman et al. hoffman2013trust argue that interpersonal trust depends on the “perceived competence, benevolence (or malevolence), understandability, and directability”, where directability is “the degree to which the trustor can rapidly assert control or influence when something goes wrong,” and Chang et al. chang2010seeing advocate that trust is “dynamically updated based on experiences”. Recent work shows that trust into machines follows a similar pattern hoffman2013trust ; desai2013impact ; waytz2014mind ; wang2016trust , with some notable differences: it is often inappropriate to attribute benevolence or malevolence to machines, and trust into machines suffers from different biases than trust into individuals hoffman2013trust . These differences, however, do not affect the argument that interaction and understandability are central to trust in machine learners too. The competence of a classifier can be assessed by monitoring its behavior and beliefs over time, directability can be achieved by allowing the user to actively teach the model how to act and what to believe, and understandability can be approached by explaining the model and its predictions.

Surprisingly, the link between interacting, explaining, and building trust has been largely ignored by the machine learning literature. On the one hand, existing machine learning explainers focus on the batch learning setting only, and do not consider interaction between the user and the learner bucilua2006model ; ribeiro2016should ; lundberg2016unexpected . On the other hand, interactive learning frameworks such as active learning settles2012active , coactive learning shivaswamy2015coactive , and (to a lesser extent) preference elicitation pigozzi2016preferences do not consider the issue of trust either. In standard active learning, for instance, the model presents unlabelled instances to a user, and in exchange obtains their labels. This interaction protocol is completely opaque: the user is oblivious to the model’s beliefs, to the reasons behind its predictions, and to how both change over time, and cannot see the consequences of her own instructions. In coactive learning, the user can correct the system if necessary, but the predictions are not explained to her. So, why should users trust models learned interactively?

To fill this gap, we propose the novel framework of explanatory interactive learning. Here the interaction takes the following form. In each step, the learner explains its interactive query to the user, and she responds by correcting the prediction and explanations, if necessary, to provide feedback. We also present a model-agnostic method, called caipi, instantiating our framework for active learning. caipi extends active learning in several ways. Akin to coactive learning shivaswamy2015coactive , query instances are accompanied by the model’s corresponding predictions. This allows the user to check whether the model is right or wrong on the chosen instance. However, nothing prevents the model from being right (or wrong) for the wrong reasons, e.g., when there are ambiguities in the data such as confounders ross2017right . To avoid this issue, caipi accompanies predictions with corresponding explanations, computed by any local explainer of choice ribeiro2016should ; lundberg2016unexpected ; ross2017right ; ribeiro2018anchors ; in this paper, we advocate the use of lime¹ ribeiro2016should , a simple model-agnostic explainer that makes it easy to compute explanations and present them to the user as interpretable (visual) artifacts. By witnessing the evolution of explanations, like a teacher supervising the progress of a student, the user can see whether the model eventually “gets it”. Explanations can also improve the quality of feedback by focusing the user’s attention on the parts or aspects of the instance deemed important by the model cakmak2014eliciting . Finally, the user can even correct the explanation presented to guide the learner. This correction step is crucial for more directly affecting the learner’s beliefs and is integral to modulating trust hoffman2013trust ; kulesza2015principles .
Explanation corrections also facilitate learning (the right concept), especially in problematic cases that labels alone cannot handle ross2017right , as shown by our experiments. Overall, caipi is the first approach that employs explanation corrections as an additional feedback channel in a model- and explainer-agnostic fashion.

¹ This also explains the name caipi, as caipirinhas are made out of limes.

To summarize, our main contributions are: (1) explanatory interactive learning, an interactive learning framework aiming at encouraging (or discouraging, if appropriate) trust into the model; (2) a model- and explainer-agnostic implementation of the framework on top of active learning, called caipi, that makes use of the lime local explainer ribeiro2016should ; (3) a simple data augmentation strategy for learning from explanation corrections; and (4) an empirical analysis showing that interacting through explanations can modulate trust and improve learning effectiveness.

We proceed as follows. First, we touch upon additional related work. Then we introduce explanatory interactive learning and derive caipi. Before concluding, we present our empirical evaluation.

2 Explainable, interactive, and trustworthy machine learning

There are two classes of explainable machine learners: global approaches aim to explain a black-box model by converting it as a whole to a more interpretable format bucilua2006model ; bastani2017interpreting , while local approaches interpret individual predictions lundberg2016unexpected . Surprisingly, neither class considers interaction between the user and the model. Conversely, existing approaches to interactive learning do not consider the issue of explanations and trust. This is true for active learning, coactive learning, (active) imitation learning, etc. In standard active learning, for instance, the model presents unlabelled instances to a user, and in exchange obtains their labels. This interaction protocol is opaque: the user is oblivious to the model’s beliefs and to how they change over time, and cannot see the consequences of her own instructions. Given the centrality of the user in recommendation, interactive preference elicitation approaches make use of conversational interaction to improve trust and directability peintner2008preferences ; chen2012critiquing , but often rely on rudimentary learning strategies (if any). Learning from explanations has been explored in concept learning mitchell1986explanation ; dejong2011explanation and probabilistic logic programming kimmig2007probabilistic , where explanations are themselves logical objects. Unfortunately, these results are tied to logic-based models and make use of rather opaque forms of explanations (e.g. logic proofs), which can be difficult for non-experts to grasp. Explanatory interactive learning instead leverages explanations for mainstream machine learning approaches. More recently, researchers explored feature supervision raghavan2006active ; raghavan2007interactive ; druck2008learning ; druck2009active ; settles2011closing ; attenberg2010unified and learning from rationales zaidan2007using ; zaidan2008modeling ; sharma2015active , which leverage both label- and feature-level (or sentence-level, for rationales) supervision for improved learning efficiency. These works show that providing rationales can be easy for human annotators zaidan2007using , sometimes even easier than providing the labels themselves raghavan2006active . The connection to directability and trust, however, is not explicitly made. Explanatory interactive learning generalizes these ideas to arbitrary classification tasks and models; in fact, the techniques proposed in these works are orthogonal to explanatory interactions and can easily be combined with them. Finally, the UI community has also investigated meaningful interaction strategies that let the user build a mental model of the system. In stumpf2009interacting the user is allowed to provide explanations, while kulesza2015principles provides an explanation-centric approach to interactive teaching. These works, however, focus on simple machine learning models like Naïve Bayes, while explanatory interactive learning is much more general.

3 Explanatory Interactive Learning

In explanatory interactive learning, a learner is able to interactively query the user (or some other information source) to obtain the desired outputs at data points. The interaction takes the following form. At each step, the learner considers a data point (labeled or unlabeled), predicts a label, and provides explanations of its prediction. The user responds by correcting the learner if necessary, providing slightly improved, but not necessarily optimal, feedback to the learner.

Let us now instantiate this schema as explanatory active learning, which combines active learning with local explainers. Other interactive learning paradigms can be made explanatory too, including coactive learning shivaswamy2015coactive , active imitation learning judah2012active , and mixed-initiative interactive learning cakmak2011mixed , but this is beyond the scope of this paper.

Active learning. The active learning paradigm targets scenarios where obtaining supervision has a non-negligible cost. Here we cover the basics of pool-based active learning, and refer the reader to two excellent surveys settles2012active ; hanneke2014theory for more details. Let $\mathcal{X}$ be the space of instances and $\mathcal{Y}$ be the set of labels (e.g. $\mathcal{Y} = \{\pm 1\}$). Initially, the learner has access to a small set of labelled examples $L$ and a large pool of unlabelled instances $U$. The learner is allowed to query a user functioning as annotator, often a human expert, for the labels of unlabelled instances (by paying a certain cost). Once acquired, the labelled examples are added to $L$ and used to update the model. The overall goal is to maximize the model quality while keeping the number of queries or the total cost at a minimum. To this end, the query instances are chosen to be as informative as possible, typically by maximizing some informativeness criterion, such as the expected model improvement roy2001toward or practical approximations thereof. By carefully selecting the instances to be labelled, active learning can enjoy much better sample complexity than passive learning castro2006upper ; balcan2010true . Prototypical active learners include max-margin tong2001support and Bayesian approaches krause2007nonmyopic ; recently, deep variants have also been proposed gal2017deep .
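To make the pool-based protocol concrete, here is a minimal sketch of the loop just described, using the closest-to-margin heuristic as the informativeness criterion. It assumes a linear scorer and uses a plain perceptron purely as a stand-in learner; the names `select_query`, `fit_perceptron`, and `active_learning` are hypothetical and not part of any library.

```python
# Illustrative sketch of pool-based active learning; all names are hypothetical.

def score(w, x):
    """Signed margin of instance x under the linear weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_perceptron(labelled, n_features, epochs=50):
    """Stand-in learner: a perceptron trained on (x, y) pairs with y in {-1, +1}."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in labelled:
            if y * score(w, x) <= 0:  # misclassified or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def select_query(w, pool):
    """Closest-to-margin heuristic: the unlabelled instance with the smallest |<w, x>|."""
    return min(pool, key=lambda x: abs(score(w, x)))


def active_learning(labelled, pool, oracle, budget):
    """Pool-based loop: select a query, pay for its label, add it to L, refit."""
    labelled, pool = list(labelled), list(pool)
    n_features = len(labelled[0][0])
    w = fit_perceptron(labelled, n_features)
    for _ in range(budget):
        if not pool:
            break
        x = select_query(w, pool)
        pool.remove(x)                   # x leaves the unlabelled pool U
        labelled.append((x, oracle(x)))  # one (costly) query to the annotator
        w = fit_perceptron(labelled, n_features)
    return w, labelled
```

With instances encoded as tuples whose first entry is a constant bias feature, `active_learning(L, U, oracle, budget=5)` adds exactly five annotator-labelled examples to L. The perceptron is only a placeholder; any learner that exposes a margin could be plugged in.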

However, active learning (which shows the query data point) and even coactive learning (which additionally shows the prediction for the query data point) do not establish trust: informative selection strategies just pick instances where the model is uncertain and likely wrong. Thus, there is a trade-off between query informativeness and user “satisfaction,” as noticed and explored in schnabel2018short . In order to properly modulate trust into the model, we argue that it is essential to present explanations.

Local explainers. There are two main strategies for interpreting machine learning models. Global approaches aim to explain the model by converting it as a whole to a more interpretable format (e.g. bucilua2006model ; bastani2017interpreting ). Local explainers, such as lime ribeiro2016should , rrr ross2017right , and anchors ribeiro2018anchors , instead focus on the arguably more approachable task of explaining individual predictions lundberg2016unexpected . While explanatory interactive learning can accommodate any local explainer, in our implementation we use lime, described next². The idea of lime (Local Interpretable Model-agnostic Explanations) is simple: even though a classifier may rely on many uninterpretable features, its decision surface around any given instance can be locally approximated by a simple, interpretable local model. In lime, the local model is defined in terms of simple features encoding the presence or absence of basic components, such as words in a document or objects in a picture³. An explanation can be readily extracted from such a model by reading off the contributions of the various components to the target prediction and translating them into an interpretable (visual) artifact. For instance, in document classification it makes sense to highlight the words that support (or contradict) the target class.

² Given a prediction, rrr extracts an explanation from the input gradients of the uninterpretable model. It was shown that lime and rrr tend to capture similar information ross2017right . The major difference is that rrr does not require sampling, but it is restricted to differentiable models. anchors instead extracts high-precision rules that explain the target prediction.

³ While not all problems admit explanations in terms of elementary components, many of them do ribeiro2016should .
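The presence/absence representation can be illustrated for documents, assuming that the basic components are the distinct words of the document being explained. The sketch below uses hypothetical helper names and is not lime's own code: it computes the indicator vector and maps a perturbed binary vector back to a document by dropping the switched-off words.

```python
# Illustrative sketch of an interpretable representation for text; names are hypothetical.

def interpretable_representation(doc_tokens, vocabulary):
    """z_j(x) = 1 iff the j-th component (word) of `vocabulary` occurs in the document."""
    present = set(doc_tokens)
    return [1 if word in present else 0 for word in vocabulary]


def map_back(doc_tokens, vocabulary, mask):
    """Turn a binary vector back into a document: words whose indicator is 0
    are removed, every other token is kept in its original position."""
    removed = {word for word, keep in zip(vocabulary, mask) if not keep}
    return [tok for tok in doc_tokens if tok not in removed]


# Components of the document being explained: its distinct words.
doc = "god is great and god is good".split()
vocabulary = sorted(set(doc))
```

Switching off a component removes every occurrence of that word, so the perturbed document is exactly what the classifier would be asked to score for that binary vector.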

More formally, let $f : \mathcal{X} \to \mathcal{Y}$ be a classifier mapping instances to labels, for instance a Random Forest or a Neural Network, $\hat{y} = f(x)$ the target prediction, and for each basic component $j = 1, \ldots, d$ let $z_j : \mathcal{X} \to \{0, 1\}$ be the corresponding indicator function. In order to explain the prediction, lime produces an interpretable model $g$, based solely on the interpretable features $z(x) = (z_1(x), \ldots, z_d(x))$, that approximates $f$ in the neighborhood of $x$. Here $g$ can be any sufficiently interpretable model, for instance a sparse linear classifier or a shallow decision tree. Computing $g$ amounts to solving $\operatorname{argmin}_{g} \, \ell(f, g; x) + \Omega(g)$, where $\ell$ is a “local loss” that measures the fidelity of $g$ to $f$ in the neighborhood of $x$, and $\Omega$ is a regularization term that controls the complexity and interpretability of $g$.

For the sake of simplicity, here we focus on lime in conjunction with sparse linear models of the form $g(z) = \langle w, z \rangle$, where $\langle \cdot, \cdot \rangle$ is the inner product. In order to enhance interpretability, at most $k$ non-zero coefficients are allowed, where $k$ is sufficiently small (see Section 4 for the values we use). Specifically, lime measures the fidelity of the linear approximation with a “local” distance, namely $\ell(f, g; x) = \int \pi_x(z) \, (f(z) - g(z))^2 \, dz$, where $\pi_x$ is a proximity kernel that weights each point $z$ by its closeness to $x$. In order to solve the optimization problem above, the integral is first approximated by a sum over a large enough set $S$ of instances sampled uniformly at random⁴. Then, computing $g$ boils down to solving the sparsity-constrained least-squares problem:

$$\min_{w \,:\, \|w\|_0 \le k} \; \sum_{z \in S} \pi_x(z) \, \big(f(z) - \langle w, z \rangle\big)^2$$

Note that $g$ depends on both the target instance $x$ and the prediction $\hat{y}$. The relevance and polarity of all components can be readily read off from the weights $w$: $w_j \neq 0$ suggests that the $j$th component does contribute to the overall prediction, while $w_j > 0$ and $w_j < 0$ imply that, when present, the $j$th component drives the prediction toward or away from $\hat{y}$, respectively. Finally, this information is used to construct a (visual) explanation ribeiro2016should .

⁴ In lime the samples are taken from the image of $z$, i.e., $\{0, 1\}^d$, and then mapped back to $\mathcal{X}$ to compute their predicted class. We omit this detail for clarity.
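The sparsity-constrained least-squares problem can be approximated in a few lines. The sketch below samples binary vectors uniformly at random, weights them with an exponential proximity kernel (the kernel form and width are assumptions), and uses a simple heuristic: fit all weights, keep the k largest in magnitude, and refit. This is not an exact sparsity-constrained solver and not the lime package's implementation; unlike the sparse linear model above, it also fits an intercept, which absorbs the model's baseline score. As in the footnote on sampling, `f` is assumed to already map a binary vector to the model's score for the target prediction. All function names are hypothetical.

```python
# Illustrative lime-style sparse linear surrogate; names and kernel are assumptions.
import math
import random


def solve(A, b):
    """Solve the small square system A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def weighted_lstsq(Z, targets, weights, columns, ridge=1e-8):
    """Weighted least squares restricted to `columns`; a tiny ridge keeps the system well posed."""
    m = len(columns)
    A = [[ridge if i == j else 0.0 for j in range(m)] for i in range(m)]
    b = [0.0] * m
    for z, t, pi in zip(Z, targets, weights):
        zs = [z[c] for c in columns]
        for i in range(m):
            b[i] += pi * zs[i] * t
            for j in range(m):
                A[i][j] += pi * zs[i] * zs[j]
    return solve(A, b)


def sparse_local_surrogate(f, d, k, n_samples=500, width=0.75, seed=0):
    """k-sparse linear surrogate of f around the instance (whose representation is all ones).

    Returns (w, intercept) with at most k non-zero entries in w.
    """
    rng = random.Random(seed)
    samples = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_samples)]
    samples.append([1] * d)  # the instance itself
    targets = [f(z) for z in samples]
    # proximity kernel pi_x: decays with the fraction of switched-off components
    weights = [math.exp(-(((d - sum(z)) / d) ** 2) / width ** 2) for z in samples]
    Z = [z + [1] for z in samples]  # last column acts as the intercept
    full = weighted_lstsq(Z, targets, weights, list(range(d + 1)))
    # heuristic sparsification: keep the k largest weights, then refit on them
    top = sorted(sorted(range(d), key=lambda j: abs(full[j]), reverse=True)[:k])
    coef = weighted_lstsq(Z, targets, weights, top + [d])
    w = [0.0] * d
    for j, c in zip(top, coef):
        w[j] = c
    return w, coef[-1]
```

For a score that is exactly linear in the components, for instance `f = lambda z: 2.0 * z[0] - 1.5 * z[2] + 0.5` with `d = 5` and `k = 2`, the surrogate recovers the two relevant components and their signs, which is the relevance and polarity reading described above.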

Active learning with Local Interpretable Model-agnostic Explanations. We now have everything in place for explanatory active learning and caipi. Specifically, we require black-box access to an active learner and an explainer. We assume that the active learner provides a procedure, SelectQuery, for selecting an informative instance $x \in U$ based on the current model $f$, and a procedure for fitting a new model (or updating the current model) on the examples in $L$. The explainer is assumed to provide a procedure, Explain, for explaining a particular prediction $\hat{y} = f(x)$. The framework is intended to work for any reasonable learner and explainer. Here we employ lime (described above) to compute an interpretable model locally around the queries, in order to visualize explanations for the current predictions.

The pseudo-code of caipi is listed in Alg. 1 and follows the standard active learning loop. At each iteration, an instance $x \in U$ is chosen using the query selection strategy implemented by the SelectQuery procedure. Then its label $\hat{y} = f(x)$ is predicted using the current model $f$, and Explain is used to produce an explanation $\hat{z}$ of the prediction. The triple $(x, \hat{y}, \hat{z})$ is presented to the user as a (visual) artifact. The user checks the prediction and the explanation for correctness, and provides the required feedback. Upon receiving the feedback, the system updates $L$ and $U$ accordingly and re-fits the model. The loop terminates when the iteration budget $T$ is reached or the model is good enough.

Algorithm 1: The caipi algorithm takes as input: $L$ is the set of labelled examples, $U$ is the set of unlabelled instances, and $T$ is the iteration budget.
    fit $f$ on $L$
    repeat
        $x \leftarrow$ SelectQuery$(f, U)$
        $\hat{y} \leftarrow f(x)$
        $\hat{z} \leftarrow$ Explain$(f, x, \hat{y})$
        Present $x$, $\hat{y}$, and $\hat{z}$ to the user, obtain $y$ and explanation correction $C$
        $L \leftarrow L \cup \{(x, y)\} \cup$ ToCounterExamples$(C)$
        $U \leftarrow U \setminus \{x\}$
        re-fit $f$ on $L$
    until budget $T$ is exhausted or $f$ is good enough

During interaction between the system and the user, three cases can occur:
(1) Right for the right reasons: the prediction and the explanation are both correct. In this case, no feedback is requested.
(2) Wrong for the wrong reasons: the prediction is wrong. As in active learning, we ask the user to provide the correct label. The explanation is then necessarily wrong as well, but we currently do not require the user to act on it.
(3) Right for the wrong reasons: the prediction is correct but the explanation is wrong. We ask the user to provide an explanation correction $C$.
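The loop of Alg. 1 and the three cases can be sketched as follows, with every learner- and explainer-specific step passed in as a callable, mirroring the black-box assumptions above. All names are hypothetical, and `burn_in` is an illustrative extra parameter that withholds explanations during the first iterations.

```python
# Illustrative sketch of the caipi loop; all plug-in names are hypothetical.

def caipi_loop(labelled, unlabelled, budget, fit, select_query, explain,
               ask_user, to_counterexamples, burn_in=0):
    """Explanatory active learning loop in the spirit of Alg. 1.

    fit(L) -> model f (a callable), select_query(f, U) -> x,
    explain(f, x, y_hat) -> explanation, ask_user(x, y_hat, z_hat) -> (y, C),
    to_counterexamples(x, y_hat, C) -> list of (x_bar, y_hat) pairs.
    """
    labelled, unlabelled = list(labelled), list(unlabelled)
    f = fit(labelled)
    for t in range(budget):
        if not unlabelled:
            break
        x = select_query(f, unlabelled)
        unlabelled.remove(x)
        y_hat = f(x)
        # explanations are withheld during the first burn_in iterations
        z_hat = explain(f, x, y_hat) if t >= burn_in else None
        y, correction = ask_user(x, y_hat, z_hat)
        labelled.append((x, y))        # cases (1) and (2): keep the (corrected) label
        if y == y_hat and correction:  # case (3): right for the wrong reasons
            labelled.extend(to_counterexamples(x, y_hat, correction))
        f = fit(labelled)
    return f, labelled
```

Keeping the plug-ins separate reflects the model- and explainer-agnostic design: swapping lime for another local explainer only changes the `explain` callable.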

The “right for the wrong reasons” case is novel in active learning, and we propose explanation corrections to deal with it. Corrections can take different forms depending on whether the focus is on component relevance, polarity, or relative importance (ranking), among others. In our experiments, we ask the annotator to indicate the components that the explanation wrongly identifies as relevant, that is, the subset $C$ of the components highlighted by $\hat{z}$ that the user deems irrelevant. In document classification, $C$ would be the set of words that are irrelevant according to the user, but relevant according to the model’s explanation.

Given the correction $C$, we are faced with the problem of explaining it back to the learner. We propose a simple strategy to achieve this, embodied by the ToCounterExamples procedure, which converts $C$ into a set of counterexamples. These aim at teaching the learner not to depend on the irrelevant components. In particular, for every $j \in C$ we generate $c$ examples $(\bar{x}_1, \bar{y}_1), \ldots, (\bar{x}_c, \bar{y}_c)$, where $c$ is an application-specific constant. Here, the labels $\bar{y}_1, \ldots, \bar{y}_c$ are identical to the prediction $\hat{y}$. The instances $\bar{x}_1, \ldots, \bar{x}_c$ are also identical to the query $x$, except that their $j$th component has been either randomized, changed to an alternative value, or substituted with the value of the $j$th component appearing in other training examples of the same class. In a sudoku task, for example, each $\bar{x}_i$ would be a copy of the query sudoku where the cells in $C$ have been filled with random numbers consistent with the predicted label. This process produces $c \cdot |C|$ counterexamples⁵, which are added to $L$.

⁵ Instead of instantiating all possible counterexamples, it may be more efficient to only instantiate the ones that influence the current model, i.e., adversarial ones. This is an interesting avenue for future work.
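A minimal sketch of ToCounterExamples for instances represented as tuples of discrete components, implementing the “changed to an alternative value” option: each of the c copies receives a value drawn at random from a per-component list of admissible alternatives. The `alternatives` mapping and the function signature are assumptions made for illustration.

```python
# Illustrative sketch of ToCounterExamples; signature and `alternatives` are hypothetical.
import random


def to_counterexamples(x, y_hat, correction, c, alternatives, rng=None):
    """For every component j in the correction C, emit c copies of the query x
    whose j-th component is replaced by a value drawn from alternatives[j],
    all labelled with the prediction y_hat (c * |C| counterexamples in total)."""
    if rng is None:
        rng = random.Random(0)
    counterexamples = []
    for j in sorted(correction):
        # prefer values that actually differ from the query's value
        options = [v for v in alternatives[j] if v != x[j]] or list(alternatives[j])
        for _ in range(c):
            x_bar = list(x)
            x_bar[j] = rng.choice(options)
            counterexamples.append((tuple(x_bar), y_hat))
    return counterexamples
```

Because every copy keeps the label y_hat while the corrected component varies, a learner that fits these examples is pushed to make its prediction insensitive to that component.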

This data augmentation procedure is model-agnostic, but alternatives do exist, for instance contrastive examples zaidan2007using and feature ranking small2011constrained for SVMs, and constraints on the input gradients for differentiable models ross2017right . These may be more effective in practice, and indeed caipi can accommodate all of them. However, since our strategy is both model- and explainer-agnostic, in the remainder we stick to it for maximum generality.

Cognitive cost and confusion. Good explanations can effectively reduce the effort required to understand the query and facilitate answering it. Furthermore, by focusing the user’s attention on the components relevant for the model, in the same spirit as teaching guidance cakmak2014eliciting , explanations can improve the quality of the obtained feedback. In some cases, however, explanations can be problematic. For instance, at the beginning of the learning process, the model is likely to underfit and its explanations can be arbitrarily bad. While these provide an opportunity for gathering informative corrections, they can also confuse the user by focusing her attention on irrelevant aspects of the query. Note, however, that the same holds when no explanation is shown at all, since the user may then fail to notice the truly relevant components. The degree of this effect depends on how much the user relies on the explanation. In practice, it makes sense to only enable explanations after a number of burn-in iterations.

Mathematical intuition. To gain some understanding of caipi, let us consider the case of linear max-margin classifiers. Let $f(x) = \operatorname{sign}(\langle w, x \rangle)$ be a linear classifier over two features, $x_1$ and $x_2$, of which only the first is relevant. Fig. 1 (left) shows that $f$ (red line) uses $x_2$ to correctly classify a negative example $x$. In order to obtain a better model (e.g. the green line), the simplest solution would be to enforce the orthogonality constraint $\langle w, (0, 1) \rangle = w_2 = 0$ during learning. Counterexamples follow the same principle. In the separable case, the counterexamples amount to additional max-margin constraints cortes1995support of the form $\bar{y}_i \langle w, \bar{x}_i \rangle \ge 1$. The only ones that influence the model are those on the margin, for which equality holds. For all pairs of such counterexamples it holds that $\bar{y}_i \langle w, \bar{x}_i \rangle = \bar{y}_j \langle w, \bar{x}_j \rangle$, or equivalently, since $\bar{y}_i = \bar{y}_j = \hat{y}$, $\langle w, \bar{x}_i - \bar{x}_j \rangle = 0$. In other words, the counterexamples encourage orthogonality between $w$ and the correction vectors $\bar{x}_i - \bar{x}_j$, which are non-zero only in the corrected component, thus approximating the orthogonality constraint above.
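The orthogonality argument can be spelled out for the two-feature case, assuming (as an illustration) that all counterexamples share the query's relevant feature $x_1$, differ only in the corrected feature, whose value in the $i$th counterexample is $a_i$, and carry the same label $y \in \{\pm 1\}$. For two counterexamples lying on the margin:

```latex
\begin{aligned}
y\,\langle w, \bar{x}_i \rangle = 1 = y\,\langle w, \bar{x}_j \rangle
  &\;\Longrightarrow\; \langle w, \bar{x}_i - \bar{x}_j \rangle = 0
  && \text{(multiply by } y \text{, using } y^2 = 1\text{)} \\
\bar{x}_i - \bar{x}_j = (0,\; a_i - a_j)
  &\;\Longrightarrow\; w_2\,(a_i - a_j) = 0 \\
a_i \neq a_j
  &\;\Longrightarrow\; w_2 = 0 .
\end{aligned}
```

A bias term, if present, cancels in the difference, so the conclusion does not depend on it; counterexamples that do not lie on the margin impose no such equality, which is why the constraint is only approximated in general.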

4 Empirical analysis

Our intention here is to address empirically the following questions: (RQ1) Can explanations (and their consistency over time) appropriately modulate the user’s trust into the model? (RQ2) Can explanation corrections lead to better models? (RQ3) Do the explanations necessarily improve as the learner obtains more labels? (RQ4) Does the magnitude of this effect depend on the specific learner?

(RQ1) User study. We designed a questionnaire about a machine that learns a simple concept by querying an annotator for labels (but not explanation corrections). The questionnaire was administered to 17 randomly selected undergraduate students from an introductory course on deep learning.

Figure 1: Left: mathematical intuition for the counterexample strategy. Right: example training rounds as presented in the questionnaire. The classification is correct but the explanation shows that the two most relevant pixels do not match the correct classification rule (as in S3). (Best viewed in color).
      Q1      Q2      Q3
S1    64.7%   35.3%   82.4%
S2    76.5%   64.7%   70.6%
S3    29.4%   11.8%   41.2%

       No corr.   CE-1    CE-2    CE-3    IG
Train  0.978      0.938   0.922   0.924   0.898
Test   0.482      0.821   0.851   0.858   0.853

Table 1: (Left-hand side table) User study: percentage of “yes” answers. (Right-hand side table) Accuracy on the fashion MNIST dataset of an MLP without corrections (No corr.), with our counterexample corrections using varying $c$, the number of counterexamples per image (CE-1 to CE-3), and with the input gradient constraints of ross2017right (IG).

We designed a toy binary classification problem (inspired by ross2017right ) about classifying small black-and-white images. The subjects were told that an image is positive if the two top corners are white and negative otherwise. Then they were shown three learning sessions consisting of five query/feedback rounds each. In session 1 (S1), every round included the image chosen by the model, the corresponding prediction, and the label provided by a knowledgeable annotator. No explanations were shown. The predictions are wrong for the first three rounds and correct in the last two. Sessions 2 and 3 (S2, S3) were identical to S1, meaning that at every round the same example, prediction, and feedback label were shown, but now explanations were also provided. The explanations highlighted the two most relevant pixels, as in Fig. 1 (right). In S2 the explanations converged to the correct rule (they highlight the two top corners) from the fourth round onwards, while in S3 they did not. Removing the explanations reduces both S2 and S3 to S1. After each session, the subjects were asked three questions: (Q1) “Do you believe that the AI system eventually learned to classify images correctly?” (Q2) “Do you believe that the AI system eventually learned the correct classification rule?” (Q3) “Would you like to further assess the AI system by checking whether it classifies 10 random images correctly?” The first two questions test the subjects’ uncertainty in the predictive ability and in the beliefs of the classifier, respectively, while the last one tests the relationship between predictive accuracy (but not explanation correctness) and expected uncertainty reduction. The percentage of “yes” answers is reported in Tab. 1 (left).

As expected, the uncertainty in the model’s correctness depends heavily on which information channels are enabled. When no explanations are shown (S1), only 35% of the subjects state that they believe the model learned the correct rule (Q2). This percentage almost doubles (65%) when explanations are shown and converge to the correct rule (S2). The need to see more examples also decreases, from 82% to 71%, but does not drop to zero. This reflects the fact that five rounds are not enough to reduce the subjects’ uncertainty to low enough levels. The percentage of subjects asserting that the classifier produces correct predictions (regardless of the learned rule, Q1) also increases from 65% to 77% when correct explanations are shown (S2). When the explanations do not converge (S3), the trend is reversed: Q1 drops to 29% and Q2 to 12%, that is, most subjects do not believe that the model’s behavior and beliefs are in any way correct. This is the only setting where Q3 drops below 50% (41%): witnessing that the model’s beliefs do not match the target rule induces distrust with high certainty, so fewer subjects feel the need for further assessment. This confirms the previous finding that trust into machines drops when wrong behavior is witnessed hoffman2013trust . We can therefore answer RQ1 affirmatively: augmenting interaction with explanations does appropriately drive trust into the model.
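Because each percentage in Tab. 1 (left) is a fraction of the 17 respondents, it can be mapped back to a head count; neighboring counts differ by about 5.9 percentage points, so each reported value corresponds to exactly one integer. The counts below are inferred by us from the percentages and the sample size; they are not reported in the study.

```python
# Head counts inferred from the reported percentages (not reported in the study).
N_SUBJECTS = 17  # questionnaire size stated in the text

# "Yes" counts per session and question: each is the only integer out of 17
# that rounds to the reported percentage.
YES_COUNTS = {
    "S1": {"Q1": 11, "Q2": 6, "Q3": 14},
    "S2": {"Q1": 13, "Q2": 11, "Q3": 12},
    "S3": {"Q1": 5, "Q2": 2, "Q3": 7},
}


def to_percentages(counts, n=N_SUBJECTS):
    """Percentage of 'yes' answers, rounded to one decimal as in Tab. 1 (left)."""
    return {s: {q: round(100 * k / n, 1) for q, k in row.items()}
            for s, row in counts.items()}
```

For instance, the near doubling of Q2 from S1 to S2 corresponds to 6 versus 11 respondents.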

In addition to the user study, we considered simulated users, as is common in active learning, to investigate (RQ2–4). To this aim, we implemented caipi on top of several standard active learners and applied it to different learning tasks. Note that our goal here is to evaluate the contribution of explanation feedback, not the learners themselves. Indeed, caipi can trivially accommodate more advanced models than the ones employed here. In all cases, the model’s explanations are computed with lime⁶. As is common in active learning, we simulate a human annotator that provides correct labels. For simplicity, explanation corrections are also assumed to be correct and complete (i.e. they identify all false positive components)⁷. The specifics of the correction strategy are described in the next paragraphs. Our experimental setup is available at: URL ANONYMIZED FOR REVIEWING.

⁶ Being based on sampling, lime can sometimes output different explanations for the same prediction. We substantially improve its stability by running it 10 times and keeping the components identified most often.

⁷ In practice, corrections may be incomplete or noisy, especially when dealing with non-experts. This can be handled by, e.g., down-weighting the counterexamples.
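The simulated annotator and the stabilization trick described above (running lime 10 times and keeping the most frequent components) can be sketched as below. `explain_once` stands for any sampling-based explainer called with a different random seed per run, and the correction is the set of highlighted components that are not truly relevant, which matches the “correct and complete” assumption. Both function names are hypothetical.

```python
# Illustrative sketch of a simulated user and explanation stabilization; names are hypothetical.
from collections import Counter


def stabilized_explanation(explain_once, k, runs=10):
    """Run a sampling-based explainer `runs` times (one seed per run) and keep
    the k components selected most often across runs."""
    votes = Counter()
    for seed in range(runs):
        votes.update(explain_once(seed))
    return {component for component, _ in votes.most_common(k)}


def simulated_user(true_label, relevant, y_hat, explanation):
    """Simulated annotator: returns the correct label and, when the prediction is
    right, a complete correction (all highlighted components that are not relevant).
    When the prediction is wrong only the label is fixed, as in case (2)."""
    if y_hat != true_label:
        return true_label, set()
    return true_label, set(explanation) - set(relevant)
```

Noisy or incomplete users could be simulated by randomly dropping or adding components in the returned correction.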

(RQ2) Evaluation in a passive setting. We applied our data augmentation strategy to a decoy variant of fashion MNIST, a fashion product recognition dataset⁸. The dataset includes 70,000 images over 10 classes. All images were corrupted by introducing confounders, that is, patches of pixels in randomly chosen corners whose shade is a function of the label in the training set and random in the test set (see ross2017right for details). The average test set accuracy of a multilayer perceptron (with the same hyperparameters as in ross2017right ) is reported in Tab. 1 (right) for three correction strategies: no corrections, our counterexample strategy (CE), and the input-gradient constraints proposed by ross2017right (IG). For CE, for every training image we added $c$ counterexamples in which the decoy pixels are randomized. When no corrections are given, the accuracy on the test set is 0.482: the confounders completely fool the network. Providing even a single counterexample increases the accuracy to 0.821, i.e., the effect of the confounders drops drastically. With more counterexamples, the accuracy surpasses that of IG (0.853). This shows that (RQ2) counterexamples, and therefore explanation corrections, are an effective measure for improving the model in terms of both predictive performance and beliefs.

⁸ From https://github.com/zalandoresearch/fashion-mnist.
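The decoy corruption and the CE strategy can be sketched on a toy grayscale image stored as a list of rows. The patch size, the corner choice, and the shade formula `255 - shade_step * label` are hypothetical choices, since the text defers the exact construction to ross2017right ; only the overall scheme (a label-dependent shade at training time, a random shade at test time, and counterexamples with randomized decoy pixels) follows the description above. The decoy coordinates stand in for the pixels a correct and complete correction would flag.

```python
# Illustrative sketch of decoy corruption and decoy counterexamples; constants are hypothetical.
import random


def add_decoy(image, label, train, rng, size=2, shade_step=25):
    """Copy a grayscale image (list of rows, values 0-255) and paint a size x size
    patch in a randomly chosen corner. At training time the shade depends on the
    label (a confounder); at test time it is random. Returns (image, patch coords)."""
    h, w = len(image), len(image[0])
    top, left = rng.choice([0, h - size]), rng.choice([0, w - size])
    shade = 255 - shade_step * label if train else rng.randint(0, 255)
    out = [row[:] for row in image]
    coords = [(top + i, left + j) for i in range(size) for j in range(size)]
    for r, c in coords:
        out[r][c] = shade
    return out, coords


def decoy_counterexamples(image, label, coords, c, rng):
    """CE strategy: c copies of a training image with the decoy pixels randomized,
    each keeping the original label."""
    result = []
    for _ in range(c):
        x_bar = [row[:] for row in image]
        for r, col in coords:
            x_bar[r][col] = rng.randint(0, 255)
        result.append((x_bar, label))
    return result
```

Since the counterexamples share the label but not the decoy shade, the shade stops being predictive of the class in the augmented training set.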

(RQ3,4) Actively choosing among concepts. We applied caipi to the “colors” dataset of ross2017right . The goal is to classify images whose pixels take one of four possible colors. An image is positive if either the four corner pixels have the same color (rule 0) or the three top middle pixels have different colors (rule 1). Crucially, the dataset only includes images where either both rules hold or neither does, that is, labels alone cannot disambiguate between the two rules. Explanations highlight the most relevant pixels, and corrections indicate the pixels that are wrongly identified as relevant. In the counterexamples, the wrongly identified pixels are recolored using all possible alternative colors consistent with the predicted label⁹. The features are of the form “pixel $i$ has the same color as pixel $j$” for all pairs of pixels $i, j$. In this space, the rules can be represented by sparse hyperplanes. We select each rule in turn, provide corrections according to it, and then check whether the feedback drives the classifier toward it. […] was set to […] for rule 0 and to […] for rule 1. All measurements are 10-fold cross-validated.

⁹ In all experiments, we always discard counterexamples that appear in the test set, for correctness.
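The pairwise feature space can be sketched as follows, assuming 5×5 images with integer color ids (the image size is our assumption; the text only fixes four colors). Each rule depends on a handful of the 300 pair features (6 for rule 0, 3 for rule 1), which is why it corresponds to a sparse hyperplane. The function names are hypothetical.

```python
# Illustrative sketch of the "same color" pair features; image size and names are assumptions.
from itertools import combinations


def pair_features(image):
    """Binary features 'pixel p has the same color as pixel q' for all pairs p < q.
    `image` is a list of rows of color ids; pixels are (row, col) in row-major order."""
    pixels = [(r, c) for r in range(len(image)) for c in range(len(image[0]))]
    return {(p, q): int(image[p[0]][p[1]] == image[q[0]][q[1]])
            for p, q in combinations(pixels, 2)}


def rule0(feats, n=5):
    """Rule 0: the four corner pixels share one color (6 'same color' features are 1)."""
    corners = [(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)]
    return all(feats[pair] for pair in combinations(corners, 2))


def rule1(feats):
    """Rule 1: the three top middle pixels of a 5x5 image are pairwise different
    (3 'same color' features are 0)."""
    top_middle = [(0, 1), (0, 2), (0, 3)]
    return not any(feats[pair] for pair in combinations(top_middle, 2))
```

In this dataset the label is positive exactly when both rules hold, so the label alone cannot tell which handful of features the model should rely on.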

In a first step, we considered a standard ℓ2 SVM active learner with the closest-to-margin query selection heuristic settles2012active . This classifier can in principle represent both rules, but it is not suited for learning sparse concepts. Indeed, the SVM struggles to learn each of the two rules, and the counterexamples have little effect on it (see the Appendix for the complete results). This is plausible, since the ℓ2 norm cannot capture the underlying sparse concept: even though the corrections try to drive the model toward the selected rule, the SVM can still assign weight to both rules at the same time (as shown by the coefficient curves). In other words, the model is not constrained enough.
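The closest-to-margin heuristic can be sketched for any linear model with weights `w` and bias `b`: among the instances that are still unlabeled, it queries the one with the smallest absolute decision value |w·x + b|. The function and argument names are hypothetical, and the sketch covers only query selection, not SVM training.

```python
import numpy as np


def closest_to_margin_query(w, b, X_pool, is_labeled):
    """Index of the unlabeled pool instance closest to the hyperplane w.x + b = 0.

    Dividing by ||w|| would give the geometric distance but does not change
    the argmin, so the raw absolute decision value is used.
    """
    scores = np.abs(X_pool @ w + b)
    scores = np.where(is_labeled, np.inf, scores)  # never re-query labeled points
    return int(np.argmin(scores))


w = np.array([1.0, 0.0])
X_pool = np.array([[2.0, 0.0], [0.1, 5.0], [-0.05, 1.0]])
is_labeled = np.array([False, False, True])
print(closest_to_margin_query(w, 0.0, X_pool, is_labeled))  # 1
```

The third instance is closer to the hyperplane but already labeled, so the second one is queried.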

Figure 2: SVM on the colors problem. Left: instantaneous score of the lime explanations for rule 0 (leftmost) and rule 1 (left middle). Right: decomposition of the learned weight vector when the corrections push toward rule 0 (right middle) and rule 1 (rightmost). (Best viewed in color)

An ℓ1 SVM, an active learner tailored for sparse concepts zhu20041norm , fares much better, and our results show that the corrections greatly benefit this model. To evaluate their effect, we compute the average instantaneous score of the pixels identified by lime w.r.t. the pixels truly relevant for the selected rule; this measures the quality of the explanations presented to the user. In addition, we measure the objective quality of the model by decomposing the learned weights $\mathbf{w}$ using least squares as $\mathbf{w} \approx \alpha_0 \mathbf{w}_0 + \alpha_1 \mathbf{w}_1$, where $\mathbf{w}_r$ is the “perfect” weight vector of rule $r$. The instantaneous score and the change in coefficients are shown in Fig. 2. Now that the model can capture the target concepts, the contribution of counterexamples is very noticeable: the SVM is biased toward rule 1, as it is sparser (data not shown), but it veers clearly toward rule 0 when corrections are provided and learns rule 1 faster when corrections push toward it. These results show that explanation feedback can drive the classifier toward the right concept, as long as the chosen model can represent it.
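Both evaluation measures can be sketched in a few lines. The decomposition solves a least-squares problem with the two "perfect" rule vectors as columns. For the explanation measure the text only says "score", so using the F1 overlap between the features highlighted by lime and the truly relevant ones is an assumption here, as are the function names and the toy vectors.

```python
import numpy as np


def decompose_weights(w, w_rule0, w_rule1):
    """Least-squares coefficients (alpha0, alpha1) with w ~ alpha0*w_rule0 + alpha1*w_rule1."""
    basis = np.column_stack([w_rule0, w_rule1])
    alphas, *_ = np.linalg.lstsq(basis, w, rcond=None)
    return alphas


def explanation_f1(explained, relevant):
    """F1 between highlighted feature indices and truly relevant ones (assumed metric)."""
    explained, relevant = set(explained), set(relevant)
    hits = len(explained & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(explained)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


w_rule0 = np.array([1.0, 1.0, 0.0, 0.0])
w_rule1 = np.array([0.0, 0.0, -1.0, -1.0])
learned = 2.0 * w_rule0 + 0.5 * w_rule1 + np.array([0.0, 0.0, 0.0, 0.1])
print(decompose_weights(learned, w_rule0, w_rule1))  # approx. [2.0, 0.45]
print(explanation_f1([0, 1, 2], [1, 2, 3]))          # approx. 0.667
```

Tracking the two coefficients over iterations shows which rule the learned model is drifting toward, independently of how well it predicts.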

(RQ3,4) Active learning for document classification. Finally, we applied caipi to distinguishing between “Atheism” and “Christian” posts in the 20 newsgroups dataset (available at http://kdd.ics.uci.edu/databases/20newsgroups/20newsgroups.data.html), using logistic regression with uncertainty sampling. Headers and footers were removed; only adjectives, adverbs, nouns, and verbs were kept and stemmed. As gold standard for the explanations, we selected of the words as relevant using feature selection. Here the lime-provided explanations identified the most relevant words, while corrections identified the falsely relevant words. For each document, was set to the number of truly relevant words. To showcase caipi’s flexibility, the counterexamples were generated with the strategy proposed in zaidan2007using , adapted to produce feedback based on the falsely relevant words only. The -fold cross-validated results can be found in Fig. 8. The plots show that the model with explanation corrections is steadily better than the baseline without corrections in terms of explanation quality, both over the test set (top) and over the queries (bottom). The predictive performance can be found in the Appendix. These results highlight the potential benefits of explanatory interaction for the model’s quality.

Figure 3: Logistic regression on 20 newsgroups. (Best viewed in color)
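A hedged sketch of word-level counterexamples follows. Zaidan et al. build "contrast examples" by deleting the words an annotator marked as rationales; the version below applies the same deletion to the words a user marked as falsely relevant and keeps the query's label, on the reading that the class should not hinge on those words. The exact adaptation used in the experiments is not detailed in this section, so the function name, the one-variant-per-word option, and the token-list input format are assumptions.

```python
def word_removal_counterexamples(tokens, label, falsely_relevant, one_per_word=True):
    """Counterexamples obtained by deleting words marked as falsely relevant.

    tokens: the (preprocessed) document as a list of words.
    Returns (tokens, label) pairs that keep the original label.
    With one_per_word=True, each flagged word is removed in its own copy;
    otherwise a single copy without any flagged word is produced.
    """
    flagged = sorted(set(falsely_relevant))
    if one_per_word:
        variants = [[t for t in tokens if t != word] for word in flagged]
    else:
        removed = set(flagged)
        variants = [[t for t in tokens if t not in removed]]
    return [(variant, label) for variant in variants]


doc = ["god", "exist", "post", "believ", "reply"]
for counterexample in word_removal_counterexamples(doc, "atheism", ["post", "reply"]):
    print(counterexample)
```

Producing one copy per flagged word yields more, smaller perturbations; removing all flagged words at once yields a single, stronger one.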

5 Conclusion

In this paper, we argued that explaining queries is important for assessing trust in interactive machine learners. Within the resulting framework of explanatory interactive learning, we proposed caipi, a method that pairs model-agnostic explainers and active learners in a modular manner. Unlike traditional active learning approaches, caipi faithfully explains its queries in an interpretable manner and accounts for the user’s corrections when the model is right (or wrong) for the wrong reasons. This opens the black box of active learning and turns it into a cooperative learning process between the machine and the user. The (boundedly rational) user is computationally limited in maximizing predictive power globally, while the machine is limited in dealing with ambiguities contained in a dataset. Our experimental results demonstrate that this cooperation can improve performance. Most importantly, a user study validated our key assumption, namely that explanations of interactive queries can indeed encourage (or discourage, if appropriate) trust in the model.

There are a number of interesting avenues for future work. Other interactive learning approaches, such as coactive learning (structured prediction) shivaswamy2015coactive , active imitation learning judah2012active , and mixed-initiative interactive learning cakmak2011mixed , should be made explanatory. In particular, explanatory variants of the recently proposed deep active learning approaches gal2017deep have the potential to further improve upon their sample complexity. Selecting queries that maximize the information of explanations, e.g., by using sp-lime ribeiro2016should , and feeding back only informative counterexamples are likely to improve performance. If the base learner is differentiable, one may consider input gradient explanations, possibly several of them explaining a query for qualitatively different reasons, and then feed back corrections by selectively penalizing the corresponding input gradients ross2017right . More generally, one should develop interaction protocols that reduce the cognitive load on users.
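The input-gradient corrections mentioned above can be illustrated for binary logistic regression, where the input gradient has a closed form: for p = σ(w·x + b), the gradient of log p + log(1 − p) with respect to x is (1 − 2p)w. The sketch adds the squared, masked input gradient to the cross-entropy, in the spirit of the penalty of ross2017right; the mask `A` (1 on features the user says must not matter), the weight `lam`, and the function names are assumptions, and for a neural network one would obtain the gradient by automatic differentiation instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def input_gradient_penalized_loss(w, b, X, y, A, lam=1.0):
    """Mean cross-entropy plus lam * sum of squared input gradients on masked features.

    X: (n, d) inputs, y: (n,) labels in {0, 1}, A: (n, d) mask with 1 where
    the model should NOT rely on a feature (the user's correction).
    """
    p = sigmoid(X @ w + b)
    eps = 1e-12  # avoid log(0)
    cross_entropy = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    # d/dx [log p + log(1 - p)] = (1 - 2p) * w, one row per instance.
    input_grads = (1.0 - 2.0 * p)[:, None] * w[None, :]
    penalty = np.sum((A * input_grads) ** 2)
    return cross_entropy + lam * penalty


X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
w, b = np.array([1.0, -1.0]), 0.0
no_correction = input_gradient_penalized_loss(w, b, X, y, np.zeros_like(X))
corrected = input_gradient_penalized_loss(w, b, X, y, np.ones_like(X))
print(no_correction < corrected)  # True
```

Minimizing this loss pushes the weights on masked features toward zero while leaving the unmasked ones to fit the labels.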

6 Acknowledgments

The authors would like to thank Antonio Vergari, Samuel Kolb, Jessa Bekker, and Paolo Morettin for useful discussions. ST acknowledges the support of the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement No. 694980 “SYNTH: Synthesising Inductive Data Models”. KK acknowledges the support of the German Science Foundation project “CAML: Argumentative Machine Learning” (KE1686/3-1) as part of the SPP 1999 (RATIO).


  • [1] Jeffry A Simpson. Psychological foundations of trust. Current directions in psychological science, 16(5):264–268, 2007.
  • [2] Roderick M Kramer and Peter J Carnevale. Trust and intergroup negotiation. Blackwell handbook of social psychology: Intergroup processes, pages 431–450, 2001.
  • [3] Hugo Mercier and Dan Sperber. Why do humans reason? arguments for an argumentative theory. Behavioral and brain sciences, 34(2):57–74, 2011.
  • [4] Luke J Chang, Bradley B Doll, Mascha van’t Wout, Michael J Frank, and Alan G Sanfey. Seeing is believing: Trustworthiness as a dynamic belief. Cognitive psychology, 61(2):87–105, 2010.
  • [5] Cara DiYanni, Deniela Nini, Whitney Rheel, and Alicia Livelli. ‘I won’t trust you if I think you’re trying to deceive me’: Relations between selective trust, theory of mind, and imitation in early childhood. Journal of Cognition and Development, 13(3):354–371, 2012.
  • [6] Robert R Hoffman, Matthew Johnson, Jeffrey M Bradshaw, and Al Underbrink. Trust in automation. IEEE Intelligent Systems, 28(1):84–88, 2013.
  • [7] Munjal Desai, Poornima Kaniarasu, Mikhail Medvedev, Aaron Steinfeld, and Holly Yanco. Impact of robot failures and feedback on real-time trust. In Proceedings of the 8th ACM/IEEE international conference on Human-robot interaction, pages 251–258. IEEE Press, 2013.
  • [8] Adam Waytz, Joy Heafner, and Nicholas Epley. The mind in the machine: Anthropomorphism increases trust in an autonomous vehicle. Journal of Experimental Social Psychology, 52:113–117, 2014.
  • [9] Ning Wang, David V Pynadath, and Susan G Hill. Trust calibration within a human-robot team: Comparing automatically generated explanations. In The Eleventh ACM/IEEE International Conference on Human Robot Interaction, pages 109–116. IEEE Press, 2016.
  • [10] Cristian Buciluǎ, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 535–541. ACM, 2006.
  • [11] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of ACM SIGKDD’16, pages 1135–1144, 2016.
  • [12] Scott Lundberg and Su-In Lee. An unexpected unity among methods for interpreting model predictions. arXiv preprint arXiv:1611.07478, 2016.
  • [13] Burr Settles. Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1):1–114, 2012.
  • [14] Pannaga Shivaswamy and Thorsten Joachims. Coactive learning. J. Artif. Intell. Res.(JAIR), 53:1–40, 2015.
  • [15] Gabriella Pigozzi, Alexis Tsoukias, and Paolo Viappiani. Preferences in artificial intelligence. Annals of Mathematics and Artificial Intelligence, 77(3-4):361–401, 2016.
  • [16] Andrew Slavin Ross, Michael C Hughes, and Finale Doshi-Velez. Right for the right reasons: training differentiable models by constraining their explanations. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 2662–2670. AAAI Press, 2017.
  • [17] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Anchors: High-precision model-agnostic explanations. 2018.
  • [18] Maya Cakmak and Andrea L Thomaz. Eliciting good teaching from humans for machine learners. Artificial Intelligence, 217:198–215, 2014.
  • [19] Todd Kulesza, Margaret Burnett, Weng-Keen Wong, and Simone Stumpf. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces, pages 126–137. ACM, 2015.
  • [20] Osbert Bastani, Carolyn Kim, and Hamsa Bastani. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504, 2017.
  • [21] Bart Peintner, Paolo Viappiani, and Neil Yorke-Smith. Preferences in interactive systems: Technical challenges and case studies. AI Magazine, 29(4):13, 2008.
  • [22] Li Chen and Pearl Pu. Critiquing-based recommenders: survey and emerging trends. User Modeling and User-Adapted Interaction, 22(1-2):125–150, 2012.
  • [23] Tom M Mitchell, Richard M Keller, and Smadar T Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine learning, 1(1):47–80, 1986.
  • [24] Gerald DeJong and Shiau Hong Lim. Explanation-based learning. In Encyclopedia of Machine Learning, pages 388–392. Springer, 2011.
  • [25] Angelika Kimmig, Luc De Raedt, and Hannu Toivonen. Probabilistic explanation based learning. In European Conference on Machine Learning, pages 176–187. Springer, 2007.
  • [26] Hema Raghavan, Omid Madani, and Rosie Jones. Active learning with feedback on features and instances. Journal of Machine Learning Research, 7(Aug):1655–1686, 2006.
  • [27] Hema Raghavan and James Allan. An interactive algorithm for asking and incorporating feature feedback into support vector machines. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 79–86. ACM, 2007.
  • [28] Gregory Druck, Gideon Mann, and Andrew McCallum. Learning from labeled features using generalized expectation criteria. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 595–602. ACM, 2008.
  • [29] Gregory Druck, Burr Settles, and Andrew McCallum. Active learning by labeling features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1, pages 81–90. Association for Computational Linguistics, 2009.
  • [30] Burr Settles. Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances. In Proceedings of the conference on empirical methods in natural language processing, pages 1467–1478. Association for Computational Linguistics, 2011.
  • [31] Josh Attenberg, Prem Melville, and Foster Provost. A unified approach to active dual supervision for labeling features and examples. Machine Learning and Knowledge Discovery in Databases, pages 40–55, 2010.
  • [32] Omar Zaidan, Jason Eisner, and Christine Piatko. Using “annotator rationales” to improve machine learning for text categorization. In NAACL HLT 2007; Proceedings of the Main Conference, pages 260–267, April 2007.
  • [33] Omar F Zaidan and Jason Eisner. Modeling annotators: A generative approach to learning from annotator rationales. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 31–40. Association for Computational Linguistics, 2008.
  • [34] Manali Sharma, Di Zhuang, and Mustafa Bilgic. Active learning with rationales for text classification. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 441–451, 2015.
  • [35] Simone Stumpf, Vidya Rajaram, Lida Li, Weng-Keen Wong, Margaret Burnett, Thomas Dietterich, Erin Sullivan, and Jonathan Herlocker. Interacting meaningfully with machine learning systems: Three experiments. International Journal of Human-Computer Studies, 67(8):639–662, 2009.
  • [36] Kshitij Judah, Alan Fern, and Thomas G Dietterich. Active imitation learning via reduction to iid active learning. In UAI, pages 428–437, 2012.
  • [37] Maya Cakmak and Andrea L Thomaz. Mixed-initiative active learning. ICML 2011 Workshop on Combining Learning Strategies to Reduce Label Cost, 2011.
  • [38] Steve Hanneke et al. Theory of disagreement-based active learning. Foundations and Trends® in Machine Learning, 7(2-3):131–309, 2014.
  • [39] Nicholas Roy and Andrew McCallum. Toward optimal active learning through monte carlo estimation of error reduction. ICML, Williamstown, pages 441–448, 2001.
  • [40] Rui M Castro and Robert D Nowak. Upper and lower error bounds for active learning. In The 44th Annual Allerton Conference on Communication, Control and Computing, volume 2, page 1, 2006.
  • [41] Maria-Florina Balcan, Steve Hanneke, and Jennifer Wortman Vaughan. The true sample complexity of active learning. Machine learning, 80(2):111–139, 2010.
  • [42] Simon Tong and Daphne Koller. Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov):45–66, 2001.
  • [43] Andreas Krause and Carlos Guestrin. Nonmyopic active learning of gaussian processes: an exploration-exploitation approach. In Proceedings of the 24th international conference on Machine learning, pages 449–456. ACM, 2007.
  • [44] Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep bayesian active learning with image data. In International Conference on Machine Learning, pages 1183–1192, 2017.
  • [45] Tobias Schnabel, Paul N Bennett, Susan T Dumais, and Thorsten Joachims. Short-term satisfaction and long-term coverage: Understanding how users tolerate algorithmic exploration. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 513–521. ACM, 2018.
  • [46] Kevin Small, Byron C Wallace, Carla E Brodley, and Thomas A Trikalinos. The constrained weight space svm: learning with ranked features. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 865–872. Omnipress, 2011.
  • [47] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine learning, 20(3):273–297, 1995.
  • [48] Ji Zhu, Saharon Rosset, Robert Tibshirani, and Trevor J Hastie. 1-norm support vector machines. In Advances in neural information processing systems, pages 49–56, 2004.

Appendix A Results on the Colors Problem

Here we report the complete results of caipi on the colors problem. Figure 4 illustrates the behavior of caipi for an active SVM classifier when rule 0 is selected; Figure 5 does the same for rule 1. All results are 10-fold cross-validated, and the shaded areas represent the standard deviation. The four plots represent:

  • Top left: the predictive score measured on the test set. Here and below, the x-axis represents iterations.

  • Top right: the least-squares coefficients of the “perfect” weight vectors of the two rules, obtained by decomposing the learned weights.

  • Bottom left: the average instantaneous predictive score on the query instances.

  • Bottom right: the average instantaneous cumulative explanatory score on the query instances.

The results for the active classifier for rules 0 and 1 can be found in Figures 6 and 7, respectively.

Figure 4: SVM performance on the colors problem for rule 0. (Best viewed in color)
Figure 5: SVM performance on the colors problem for rule 1. (Best viewed in color)
Figure 6: SVM performance on the colors problem for rule 0. (Best viewed in color)
Figure 7: SVM performance on the colors problem for rule 1. (Best viewed in color)

Appendix B Results on the Newsgroups Dataset

In Figure 8 we report the complete results of caipi with an active logistic regression classifier applied to the “Christian” versus “Atheism” dataset. The plots are laid out as above, except the top right one, which in this case shows the explanatory performance measured on the test set (every 20 iterations).

Figure 8: Logistic regression on 20 newsgroups. (Best viewed in color)