Teaching Meaningful Explanations

by Noel C. F. Codella, et al.

The adoption of machine learning in high-stakes applications such as healthcare and law has lagged in part because predictions are not accompanied by explanations comprehensible to the domain user, who often holds ultimate responsibility for decisions and outcomes. In this paper, we propose an approach to generate such explanations in which training data is augmented to include, in addition to features and labels, explanations elicited from domain users. A joint model is then learned to produce both labels and explanations from the input features. This simple idea ensures that explanations are tailored to the complexity expectations and domain knowledge of the consumer. Evaluation spans multiple modeling techniques on a simple game dataset, an image dataset, and a chemical odor dataset, showing that our approach is generalizable across domains and algorithms. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms, and in some cases, improve modeling accuracy.





1 Introduction

New regulations call for automated decision making systems to provide “meaningful information” on the logic used to reach conclusions (Goodman and Flaxman, 2016; Wachter, Mittelstadt, and Floridi, 2017; Selbst and Powles, 2017). Selbst and Powles (2017) interpret the concept of “meaningful information” as information that should be understandable to the audience (potentially individuals who lack specific expertise), is actionable, and is flexible enough to support various technical approaches.

For the present discussion, we define an explanation as information provided in addition to an output that can be used to verify the output. In the ideal case, an explanation should enable a human user to independently determine whether the output is correct. The requirements of meaningful information have two implications for explanations:

  1. Complexity Match: The complexity of the explanation needs to match the complexity capability of the consumer (Kulesza et al., 2013; Dhurandhar et al., 2017). For example, an explanation in equation form may be appropriate for a statistician, but not for a nontechnical person (Miller, Howe, and Sonenberg, 2017).

  2. Domain Match: An explanation needs to be tailored to the domain, incorporating the relevant terms of the domain. For example, an explanation for a medical diagnosis needs to use terms relevant to the physician (or patient) who will be consuming the prediction.

In this paper, we take this guidance to heart by asking consumers themselves to provide explanations that are meaningful to them for their application along with feature/label pairs, where these provided explanations lucidly justify the labels for the specific inputs. We then use this augmented training set to learn models that predict explanations along with labels for new unseen samples.

The proposed paradigm is different from existing methods for local interpretation (Montavon, Samek, and Müller, 2017) in that it does not attempt to probe the reasoning process of a model. Instead, it seeks to replicate the reasoning process of a human domain user. The two paradigms share the objective of producing a reasoned explanation, but the model introspection approach is better suited to AI system builders who work with models directly, whereas the teaching explanations paradigm more directly addresses domain users. Indeed, the European Union GDPR guidelines say: “The controller should find simple ways to tell the data subject about the rationale behind, or the criteria relied on in reaching the decision without necessarily always attempting a complex explanation of the algorithms used or disclosure of the full algorithm.” More specifically, teaching explanations allows user verification and promotes trust. Verification is facilitated by the fact that the returned explanations are in a form familiar to the user. As predictions and explanations for novel inputs match a user’s intuition, trust in the system will grow accordingly. Under the model introspection approach, while there are certainly cases where model and domain user reasoning match, this does not occur by design and they may diverge in other cases, potentially decreasing trust (Weller, 2017).

There are many possible instantiations for this proposed paradigm of teaching explanations. One is to simply expand the label space to be the Cartesian product of the original labels and the elicited explanations. Another approach is to bring together the labels and explanations in a multi-task setting. The third builds upon the tradition of similarity metrics, case-based reasoning and content-based retrieval.

Existing approaches that only have access to features and labels are unable to find meaningful similarities. However, with the advantage of having training features, labels, and explanations, we propose to learn feature embeddings guided by labels and explanations. This allows us to infer explanations for new data using nearest neighbor approaches. We present a new objective function to learn an embedding that optimizes $k$-nearest neighbor ($k$NN) search for both prediction accuracy and holistic human relevancy, so that returned neighbors present meaningful information. The proposed embedding approach is easily portable to a diverse set of label and explanation spaces because it only requires a notion of similarity between examples in these spaces. Since any predicted explanation or label is obtained from a simple combination of training examples, complexity and domain match are achieved with no further effort. We also demonstrate the multi-task instantiation wherein labels and explanations are predicted together from features. In contrast to the embedding approach, this method requires changing the structure of the ML model to suit the modality and type of the label and explanation space.

We demonstrate the proposed paradigm using the three instantiations on a synthetic tic-tac-toe dataset (see supplement) and on three publicly available datasets: an image aesthetics dataset (Kong et al., 2016), an olfactory pleasantness dataset (Keller et al., 2017), and a melanoma classification dataset (Codella et al., 2018a). Teaching explanations, of course, requires a training set that contains explanations. Since such datasets are not readily available, we use the attributes given with the aesthetics and pleasantness datasets in a unique way: as collections of meaningful explanations. For the melanoma classification dataset, we use the groupings given by human users described in Codella et al. (2018b) as the explanations.

The main contributions of this work are:

  • A new approach for machine learning algorithms to provide meaningful explanations that match the complexity and domain of consumers by eliciting training explanations directly from them. We name this paradigm TED for ‘Teaching Explanations for Decisions.’

  • Evaluation of several candidate approaches, some of which learn joint embeddings so that the multidimensional topology of a model mimics both the supplied labels and explanations, which are then compared with single-task and multi-task regression/classification approaches.

  • Evaluation on disparate datasets with diverse label and explanation spaces demonstrating the efficacy of the paradigm.

2 Related Work

Prior work in providing explanations can be partitioned into several areas:

  1. Making existing or enhanced models interpretable, i.e. to provide a precise description of how the model determined its decision (e.g., Ribeiro, Singh, and Guestrin (2016); Montavon, Samek, and Müller (2017); Lundberg and Lee (2017)).

  2. Creating a second, simpler-to-understand model, such as a small number of logical expressions, that mostly matches the decisions of the deployed model (e.g., Bastani, Kim, and Bastani (2018); Caruana et al. (2015)).

  3. Leveraging “rationales”, “explanations”, “attributes”, or other “privileged information” in the training data to help improve the accuracy of the algorithms (e.g., Sun and DeJong, 2005; Zaidan07using-annotator; Zaidan and Eisner, 2008; Zhang, Marshall, and Wallace, 2016; McDonnell et al., 2016; Donahue and Grauman, 2011; localizedattributes; Peng et al., 2016).

  4. Work in the natural language processing and computer vision domains that generates rationales/explanations derived from input text (e.g., Lei, Barzilay, and Jaakkola (2016); Ainur, Choi, and Cardie (2010); Hendricks et al. (2016)).

  5. Content-based retrieval methods that provide explanations as evidence employed for a prediction, i.e., $k$-nearest neighbor classification and regression (e.g., Wan et al. (2014); Jimenez-del-Toro et al. (2015); Li et al. (2018); Sun et al. (2012)).

The first two groups attempt to precisely describe how a machine learning decision was made, which is particularly relevant for AI system builders. This insight can be used to improve the AI system and may serve as the seeds for an explanation to a non-AI expert. However, work still remains to determine if these seeds are sufficient to satisfy the needs of a non-AI expert. In particular, when the underlying features are not human comprehensible, these approaches are inadequate for providing human consumable explanations.

The third group, like this work, leverages additional information (explanations) in the training data, but with different goals. The third group uses the explanations to create a more accurate model; we leverage the explanations to teach how to generate explanations for new predictions.

The fourth group seeks to generate textual explanations with predictions. For text classification, this involves selecting the minimal necessary content from a text body that is sufficient to trigger the classification. For computer vision (Hendricks et al., 2016), this involves utilizing textual captions to automatically generate new textual captions of images that are both descriptive as well as discriminative. While serving to enrich an understanding of the predictions, these systems do not necessarily facilitate an improved ability for a human user to understand system failures.

The fifth group creates explanations in the form of decision evidence: using some feature embedding to perform $k$-nearest neighbor search, using those $k$ neighbors to make a prediction, and demonstrating to the user the nearest neighbors and any relevant information regarding them. Although this approach is fairly straightforward and holds a great deal of promise, it has historically suffered from the issue of the semantic gap: distance metrics in the realm of the feature embeddings do not necessarily yield neighbors that are relevant for prediction. More recently, deep feature embeddings, optimized for generating predictions, have made significant advances in reducing the semantic gap. However, there still remains a “meaning gap”: although systems have gotten good at returning neighbors with the same label as a query, they do not necessarily return neighbors that agree with any holistic human measures of similarity. As a result, users are not necessarily inclined to trust system predictions.

Doshi-Velez et al. (2017) discuss the societal, moral, and legal expectations of AI explanations, provide guidelines for the content of an explanation, and recommend that explanations of AI systems be held to a similar standard as humans. Our approach is compatible with their view. Biran and Cotton (2017) provide an excellent overview and taxonomy of explanations and justifications in machine learning.

Miller (2017) and Miller, Howe, and Sonenberg (2017) argue that explainable AI solutions need to meet the needs of the users, an area that has been well studied in philosophy, psychology, and cognitive science. They provide a brief survey of the work in these fields most relevant to explainable AI. They, along with Doshi-Velez and Kim (2017), call for more rigor in this area.

3 Methods

The primary motivation of the TED paradigm is to provide meaningful explanations to consumers by leveraging the consumers’ knowledge of what will be meaningful to them. Section 3.1 formally describes the problem space that defines the TED approach. One simple learning approach to this problem is to expand the label space to be the Cartesian product of the original labels and the provided explanations. Although quite simple, this approach has a number of pragmatic advantages in that it is easy to incorporate, it can be used for any learning algorithm, it does not require any changes to the learning algorithm, and does not require owners to make available their algorithm. It also has the possibility of some indirect benefits because requiring explanations will improve auditability (all decisions will have explanations) and potentially reduce bias in the training set because inconsistencies in explanations may be discovered.
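The Cartesian-product instantiation described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the helper names (`encode`, `decode`, `predict_1nn`), the separator, and the toy loan data are all hypothetical, and the 1-nearest-neighbor function merely stands in for whatever off-the-shelf classifier is being reused unchanged.

```python
def encode(label, explanation, sep="|"):
    """Fuse a label and an explanation into one combined class."""
    return f"{label}{sep}{explanation}"


def decode(combined, sep="|"):
    """Split a combined class back into (label, explanation)."""
    label, explanation = combined.split(sep, 1)
    return label, explanation


def predict_1nn(train_x, train_classes, query):
    """Toy 1-nearest-neighbor learner standing in for any classifier."""
    sq_dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_x]
    return train_classes[sq_dists.index(min(sq_dists))]


# Hypothetical toy data: features, labels Y, and elicited explanations E.
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = ["deny", "deny", "approve", "approve"]
train_e = ["low_income", "short_history", "stable_income", "long_history"]

# The learner only ever sees the combined classes; it needs no modification.
combined = [encode(y, e) for y, e in zip(train_y, train_e)]
y_hat, e_hat = decode(predict_1nn(train_x, combined, [0.95, 1.0]))
```

Because the learner is treated as a black box, the same wrapper would apply to any classifier that accepts arbitrary class identifiers, at the cost of a label space that grows with the number of distinct label/explanation pairs.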

Other instantiations of the TED approach may leverage the explanations to improve model prediction and possibly explanation accuracy. Section 3.2 takes this approach to learn feature embeddings and explanation embeddings in a joint and aligned way to permit neighbor-based explanation prediction. It presents a new objective function to learn an embedding that optimizes $k$NN search for both prediction accuracy and holistic human relevancy, so that returned neighbors present meaningful information. We also discuss multi-task learning in the label and explanation space as another instantiation of the TED approach, which we use for comparisons.

3.1 Problem Description

Let $X \times Y$ denote the input-output space, with $p(x, y)$ denoting the joint distribution over this space, where $(x, y) \in X \times Y$. Then typically, in supervised learning one wants to estimate $p(y \mid x)$.

In our setting, we have a triple $X \times Y \times E$ that denotes the input space, output space, and explanation space, respectively. We then assume that we have a joint distribution $p(x, y, e)$ over this space, where $(x, y, e) \in X \times Y \times E$. In this setting we want to estimate $p(y, e \mid x) = p(y \mid x)\, p(e \mid y, x)$. Thus, we not only want to predict the labels $y$, but also the corresponding explanations $e$ for the specific $x$ and $y$ based on historical explanations given by human experts.

The space $E$ in most of these applications is quite different from $X$ and has similarities with $Y$ in that it requires human judgment.

We provide methods to solve the above problem. Although these methods can be used even when $X$ is human-understandable, we envision the most impact for applications where this is not the case, such as the olfaction dataset described in Section 4.

3.2 Candidate Approaches

We propose several candidate implementation approaches to teach labels and explanations from the training data, and predict them for unseen test data. We will describe the baseline regression and embedding approaches. The particular parameters and specific instantiations are provided in Section 4.

Baseline for Predicting $Y$ or $E$

To set the baseline, we trained a regression (classification) network on the datasets to predict $Y$ from $X$ using the mean-squared error (cross-entropy) loss. This cannot be used to infer $E$ for a novel $X$. A similar learning approach was used to predict $E$ from $X$. If $E$ is vector-valued, we used multi-task learning.

Multi-task Learning to Predict $Y$ and $E$ Together

We trained a multi-task network to predict $Y$ and $E$ together from $X$. Similar to the previous case, we used appropriate loss functions.

Embeddings to Predict $Y$ and $E$

We propose to use the activations from the last fully connected hidden layer of the network trained to predict $Y$ or $E$ as embeddings for $X$. Given a novel $X$, we obtain its $k$ nearest neighbors in the embedding space from the training set, and use the corresponding $Y$ and $E$ values to obtain predictions as weighted averages. The weights are determined using a Gaussian kernel on the distances in the embedding space of the novel $X$ to its neighbors in the training set. This procedure is used with all the $k$NN-based prediction approaches.
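The neighbor-based prediction step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `knn_predict`, the use of Euclidean distance, and the `bandwidth` parameter of the Gaussian kernel are all assumptions, since the text specifies only that a Gaussian kernel is applied to embedding-space distances.

```python
import math


def knn_predict(query, train_emb, train_targets, k=3, bandwidth=1.0):
    """Gaussian-kernel-weighted average of the targets of the k nearest neighbors.

    query:         embedding of the novel input (list of floats)
    train_emb:     embeddings of the training inputs
    train_targets: per-example target vectors (Y values, E values, or both)
    bandwidth:     kernel width; an assumed free parameter
    """
    # Euclidean distance is an assumption; any embedding-space distance would do.
    dists = [math.dist(query, z) for z in train_emb]
    nearest = sorted(range(len(train_emb)), key=lambda i: dists[i])[:k]
    weights = [math.exp(-dists[i] ** 2 / (2.0 * bandwidth ** 2)) for i in nearest]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed: fall back to a plain average.
        weights = [1.0] * len(nearest)
        total = float(len(nearest))
    dim = len(train_targets[0])
    return [
        sum(w * train_targets[i][d] for w, i in zip(weights, nearest)) / total
        for d in range(dim)
    ]
```

The same routine serves for both outputs: passing the $Y$ values as targets yields a label prediction, and passing the $E$ vectors yields an explanation prediction from the very same neighbors, which is what lets the returned neighbors double as evidence.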

Pairwise Loss for Improved Embeddings

Since our key instantiation is to predict $Y$ and $E$ using the $k$NN approach described above, we propose to improve upon the embeddings of $X$ from the regression network by explicitly ensuring that points with similar $Y$ and $E$ values are mapped close to each other in the embedding space. For a pair of data points $(a, b)$ with inputs $(x_a, x_b)$, labels $(y_a, y_b)$, and explanations $(e_a, e_b)$, we define the following pairwise loss functions for creating the embedding $f(\cdot)$, where the shorthand for $f(x_i)$ is $f_i$ for clarity below:

$$L_{x,y}(a, b) = \begin{cases} 1 - \cos(f_a, f_b), & \|y_a - y_b\|_1 \le c_1, \\ \max(\cos(f_a, f_b) - m_1, 0), & \|y_a - y_b\|_1 > c_2, \end{cases} \qquad (1)$$

$$L_{x,e}(a, b) = \begin{cases} 1 - \cos(f_a, f_b), & \|e_a - e_b\|_1 \le c_3, \\ \max(\cos(f_a, f_b) - m_2, 0), & \|e_a - e_b\|_1 > c_4. \end{cases} \qquad (2)$$

The cosine similarity $\cos(f_a, f_b) = \frac{f_a \cdot f_b}{\|f_a\|_2 \, \|f_b\|_2}$, where $\cdot$ denotes the dot product between the two vector embeddings and $\|\cdot\|_p$ denotes the $\ell_p$ norm. Eqn. (1) defines the embedding loss based on similarity in the $Y$ space. If $y_a$ and $y_b$ are close, the cosine distance between $f_a$ and $f_b$ will be minimized. If $y_a$ and $y_b$ are far, the cosine similarity will be minimized (up to some margin $m_1 \ge 0$), thus maximizing the cosine distance. It is possible to set $c_2 > c_1$ to create a clear buffer between neighbors and non-neighbors. The loss function (2) based on similarity in the $E$ space is exactly analogous. We combine the losses using $Y$ and $E$ similarities as

$$L_{x,y,e}(a, b) = L_{x,y}(a, b) + w \cdot L_{x,e}(a, b), \qquad (3)$$

where $w$ denotes the scalar weight on the $E$ loss. We set in our experiments. The neighborhood criteria on $y$ and $e$ in (1) and (2) are only valid if they are continuous valued. If they are categorical, we adopt a different neighborhood criterion, whose specifics are discussed in the relevant experiment below.
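The pairwise objective can be illustrated with a small pure-Python sketch. It follows the verbal description above: neighbors are pulled together by minimizing cosine distance, and non-neighbors are pushed apart by minimizing cosine similarity down to a margin. The function names, the use of an L1 distance between targets, and the `near`/`far`/`margin` parameters are assumptions made for illustration; the numeric settings in the usage note are hypothetical, not the paper's values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def l1(u, v):
    """Sum of absolute differences between two target vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pair_loss(f_a, f_b, t_a, t_b, near, far, margin):
    """Cosine embedding loss for one pair, driven by target distance.

    Targets within `near` are neighbors, targets beyond `far` are
    non-neighbors, and anything in between is a buffer with zero loss.
    """
    d = l1(t_a, t_b)
    c = cosine(f_a, f_b)
    if d <= near:
        return 1.0 - c               # pull neighbors together
    if d > far:
        return max(c - margin, 0.0)  # push non-neighbors apart, up to the margin
    return 0.0


def combined_loss(f_a, f_b, y_a, y_b, e_a, e_b, y_cfg, e_cfg, w=1.0):
    """Label-based loss plus w times the explanation-based loss."""
    return pair_loss(f_a, f_b, y_a, y_b, *y_cfg) + w * pair_loss(f_a, f_b, e_a, e_b, *e_cfg)
```

For example, with hypothetical settings `y_cfg = e_cfg = (0.1, 0.3, 0.5)`, two identical embeddings whose labels agree but whose explanations differ strongly incur no label loss and an explanation loss of `max(1 - 0.5, 0) = 0.5`, which pushes the pair apart even though the labels alone would have kept them together.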

4 Evaluation

To evaluate the ideas presented in this work, we focus on two fundamental questions:

  1. Does the TED approach provide useful explanations?

  2. How is the prediction accuracy impacted by incorporating explanations into the training?

Since the TED approach can be incorporated into many kinds of learning algorithms, tested against many datasets, and used in many different situations, a definitive answer to these questions is beyond the scope of this paper. Instead we try to address these two questions on four datasets, evaluating accuracy in the standard way.

Determining if any approach provides useful explanations is a challenge, and no consensus metric has yet emerged (Doshi-Velez et al., 2017). However, the TED approach has a unique advantage in dealing with this challenge. Specifically, since it requires explanations be provided for the target dataset (training and testing), one can evaluate the accuracy of a model’s explanation ($E$) in a similar way that one evaluates the accuracy of a predicted label ($Y$). We provide more details on the metrics used in Section 4.2. In general, we expect several metrics of explanation efficacy to emerge, including those involving the target explanation consumers (Dhurandhar et al., 2017).

4.1 Datasets

The TED approach requires a training set that contains explanations. Since such datasets are not readily available, we evaluate the approach on a synthetic dataset (tic-tac-toe, see supplement) and leverage 3 publicly available datasets in a unique way: AADB (Kong et al., 2016), Olfactory (Keller et al., 2017) and Melanoma detection (Codella et al., 2018a).

The AADB (Aesthetics and Attributes Database) (Kong et al., 2016) contains about 10,000 images that have been human rated for aesthetic quality ($Y$), where higher values imply more aesthetically pleasing. It also comes with 11 attributes ($E$) that are closely related to image aesthetic judgments by professional photographers. The attribute values are averaged over 5 humans and lie in . The training, test, and validation partitions are provided by the authors and consist of 8,458, 1,000, and 500 images, respectively.

The Olfactory dataset (Keller et al., 2017) is a challenge dataset describing various scents (chemical bondings and labels). Each of the 476 rows represents a molecule with approximately chemoinformatic features ($X$) (angles between bonds, types of atoms, etc.). Similarly to AADB, each row also contains 21 human perceptions of the molecule, such as intensity, pleasantness, sour, musky, burnt. These are average values among 49 diverse individuals and lie in . We take $Y$ to be the pleasantness perception and $E$ to be the remaining 19 perceptions except for intensity, since these 19 are known to be more fundamental semantic descriptors while pleasantness and intensity are holistic perceptions (Keller et al., 2017). We use the standard training, test, and validation sets provided by the challenge organizers with , , and instances respectively.

The 2017 International Skin Imaging Collaboration (ISIC) challenge on Skin Lesion Analysis Toward Melanoma Detection dataset (Codella et al., 2018a) is a public dataset with training and test images. Each image belongs to one of three classes: melanoma (513 images), seborrheic keratosis (339 images), and benign nevus (1,748 images). We use a version of this dataset described by Codella et al. (2018b), where the melanoma images were partitioned into 20 groups, the seborrheic keratosis images were divided into 12 groups, and 15 groups were created for benign nevus, by a non-expert human user. We show some example images from this dataset in Figure 1. We take the 3 class labels to be $Y$ and the 47 groups in total to be $E$. In this dataset, each $e$ maps to a unique $y$. We partition the original training set into a training set with images, and a validation set with images, for use in our experiments. We continue using the original test set with images.

Figure 1: Example images from the ISIC Melanoma detection dataset. The visual similarity between Melanoma and non-Melanoma images is seen from the left and middle images. In the right image, the visually similar lesions are placed in the same group (i.e., have the same $E$ value).


4.2 Metrics

An open question that we do not attempt to resolve here is the precise form that explanations should take. It is important that they match the mental model of the explanation consumer. For example, one may expect explanations to be categorical (as in tic-tac-toe, loan approval reason codes, or our melanoma dataset) or discrete ordinal, as in human ratings. Explanations may also be continuous in crowd-sourced environments, where the final rating is a (weighted) average over the human ratings. This is seen in the AADB and Olfactory datasets that we consider, where each explanation is averaged over 5 and 49 individuals respectively.

In the AADB and Olfactory datasets, since we use the existing continuous-valued attributes as explanations, we choose to treat them both as-is and discretized into 3 bins, $\{-1, 0, 1\}$, representing negative, neutral, and positive values. The latter mimics human ratings (e.g., not pleasing, neutral, or pleasing). Specifically, we train on the original continuous $Y$ values and report absolute error (MAE) between $Y$ and a continuous-valued prediction $\hat{Y}$. We also similarly discretize $Y$ and $\hat{Y}$ as $-1, 0, 1$. We then report both absolute error in the discretized values (so that $|1 - 0| = 1$ and $|1 - (-1)| = 2$) as well as 0-1 error ($\hat{Y} = Y$ or $\hat{Y} \neq Y$), where the latter corresponds to conventional classification accuracy. We use bin thresholds of and for AADB and and for Olfactory to partition the $Y$ scores in the training data into thirds.

The explanations $E$ are treated similarly to $Y$ by computing $\ell_1$ distances (sum of absolute differences over attributes) before and after discretizing to $\{-1, 0, 1\}$. We do not, however, compute the 0-1 error for $E$. We use thresholds of and for AADB and and for Olfactory, which roughly partitions the values into thirds based on the training data.

For the melanoma classification dataset, since both $Y$ and $E$ are categorical, we use classification accuracy as the performance metric for both $Y$ and $E$.
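The continuous and discretized metrics for a scalar target can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, the bin coding of -1, 0, and 1 for negative, neutral, and positive is an assumption, and the thresholds used in the usage note are made-up values rather than the ones used in the experiments.

```python
def discretize(value, low, high):
    """Map a continuous score to -1 (negative), 0 (neutral), or 1 (positive)."""
    if value < low:
        return -1
    if value > high:
        return 1
    return 0


def evaluate(y_true, y_pred, low, high):
    """Continuous MAE, MAE on discretized values, and 0-1 classification accuracy."""
    n = len(y_true)
    d_true = [discretize(v, low, high) for v in y_true]
    d_pred = [discretize(v, low, high) for v in y_pred]
    return {
        "mae_continuous": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "mae_discretized": sum(abs(t - p) for t, p in zip(d_true, d_pred)) / n,
        "accuracy": sum(t == p for t, p in zip(d_true, d_pred)) / n,
    }


# Hypothetical scores and thresholds, chosen only to exercise every case.
metrics = evaluate([0.2, 0.5, 0.9], [0.3, 0.8, 0.1], low=0.35, high=0.65)
```

In this toy call the first prediction lands in the correct bin, the second is off by one bin, and the third is off by two, so the discretized MAE penalizes the third error twice as heavily as the second while the 0-1 accuracy treats both simply as misses.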

Algorithm                           λ or K    Y Acc.          Y MAE (disc.)   Y MAE (cont.)   E MAE (disc.)   E MAE (cont.)
Baseline (Y)                        NA        0.4140          0.6250          0.1363          NA              NA
Baseline (E)                        NA        NA              NA              NA              0.5053          0.2042
Multi-task regression (Y & E)       100       0.4170          0.6300          0.1389          0.4501          0.1881
                                    250       0.4480          0.5910          0.1315          0.4425          0.1861
                                    500       0.4410          0.5950          0.1318          0.4431          0.1881
                                    1000      0.4730          0.5650          0.1277          0.4429          0.1903
                                    2500      0.3190          0.6810          0.1477          0.4917          0.2110
                                    5000      0.3180          0.6820          0.1484          0.5165          0.2119
Embedding Y + kNN                   1         0.3990          0.7650          0.1849          0.6237          0.2724
                                    2         0.4020          0.7110          0.1620          0.5453          0.2402
                                    5         0.3970          0.6610          0.1480          0.5015          0.2193
                                    10        0.3890          0.6440          0.1395          0.4890          0.2099
                                    15        0.3910          0.6400          0.1375          0.4849          0.2069
                                    20        0.3760          0.6480          0.1372          0.4831          0.2056
Pairwise Y + kNN                    1         0.4970          0.5500          0.1275          0.6174          0.2626
                                    2         0.4990          0.5460          0.1271          0.5410          0.2356
                                    5         0.5040          0.5370          0.1254          0.4948          0.2154
                                    10        0.5100          0.5310          0.1252          0.4820          0.2084
                                    15        0.5060          0.5320          0.1248          0.4766          0.2053
                                    20        0.5110          0.5290          0.1248          0.4740          0.2040
Pairwise E + kNN                    1         0.3510          0.8180          0.1900          0.6428          0.2802
                                    2         0.3570          0.7550          0.1670          0.5656          0.2485
                                    5         0.3410          0.7140          0.1546          0.5182          0.2262
                                    10        0.3230          0.6920          0.1494          0.5012          0.2174
                                    15        0.3240          0.6790          0.1489          0.4982          0.2150
                                    20        0.3180          0.6820          0.1483          0.4997          0.2133
Pairwise Y & E + kNN                1         0.5120          0.5590          0.1408          0.6060          0.2617
                                    2         0.5060          0.5490          0.1333          0.5363          0.2364
                                    5         0.5110          0.5280          0.1272          0.4907          0.2169
                                    10        0.5260          0.5180          0.1246          0.4784          0.2091
                                    15        0.5220          0.5220          0.1240          0.4760          0.2065
                                    20        0.5210          0.5220          0.1235          0.4731          0.2050
(a) AADB dataset

Algorithm                           λ or K    Y Accuracy      E Accuracy
Baseline (Y)                        NA        0.7045          NA
Baseline (E)                        NA        0.6628          0.4107
Multi-task classification (Y & E)   0.01      0.6711          0.2838
                                    0.1       0.6644          0.2838
                                    1         0.6544          0.4474
                                    10        0.6778          0.4274
                                    25        0.7145          0.4324
                                    50        0.6694          0.4057
                                    100       0.6761          0.4140
                                    250       0.6711          0.3957
                                    500       0.6327          0.3907
Embedding Y + kNN                   1         0.6962          0.2604
                                    2         0.6995          0.2604
                                    5         0.6978          0.2604
                                    10        0.6962          0.2604
                                    15        0.6978          0.2604
                                    20        0.6995          0.2604
Embedding E + kNN                   1         0.6978          0.4357
                                    2         0.6861          0.4357
                                    5         0.6861          0.4357
                                    10        0.6745          0.4407
                                    15        0.6828          0.4374
                                    20        0.6661          0.4424
Pairwise Y + kNN                    1         0.7162          0.1619
                                    2         0.7179          0.1619
                                    5         0.7179          0.1619
                                    10        0.7162          0.1619
                                    15        0.7162          0.1619
                                    20        0.7162          0.1619
Pairwise E + kNN                    1         0.7245          0.3406
                                    2         0.7279          0.3406
                                    5         0.7229          0.3389
                                    10        0.7279          0.3389
                                    15        0.7329          0.3372
                                    20        0.7312          0.3356
(b) ISIC Melanoma detection dataset

Table 1: Accuracy of predicting $Y$ and $E$ using different methods (Section 3.2). Baselines for $Y$ and $E$ are regression/classification networks, Multi-task learning predicts both $Y$ and $E$ together, Embedding $Y$ + $k$NN uses the embedding from the last hidden layer of the baseline network that predicts $Y$. Pairwise $Y$ + $k$NN and Pairwise $E$ + $k$NN use the cosine embedding loss in (1) and (2) respectively to optimize the embeddings of $X$. Pairwise $Y$ & $E$ + $k$NN uses the sum of cosine embedding losses in (3) to optimize the embeddings of $X$.
Algorithm                           K         Y Acc.          Y MAE (disc.)   Y MAE (cont.)   E MAE (disc.)   E MAE (cont.)
Baseline LASSO (Y)                  NA        0.4928          0.5072          8.6483          NA              NA
Baseline RF (Y)                     NA        0.5217          0.4783          8.9447          NA              NA
Multi-task regression (Y & E)       NA        0.4493          0.5507          11.4651         0.5034          3.6536
Multi-task regression (E only)      NA        NA              NA              NA              0.5124          3.3659
Embedding Y + kNN                   1         0.5362          0.5362          11.7542         0.5690          4.2050
                                    2         0.5362          0.4928          9.9780          0.4950          3.6555
                                    5         0.6087          0.4058          9.2840          0.4516          3.3488
                                    10        0.5652          0.4783          10.1398         0.4622          3.4128
                                    15        0.5362          0.4928          10.4433         0.4798          3.4012
                                    20        0.4783          0.5652          10.9867         0.4813          3.4746
Pairwise Y + kNN                    1         0.6087          0.4783          10.9306         0.5515          4.3547
                                    2         0.5362          0.5072          10.9274         0.5095          3.9330
                                    5         0.5507          0.4638          10.4720         0.4935          3.6824
                                    10        0.5072          0.5072          10.7297         0.4912          3.5969
                                    15        0.5217          0.4928          10.6659         0.4889          3.6277
                                    20        0.4638          0.5507          10.5957         0.4889          3.6576
Pairwise E + kNN                    1         0.6087          0.4493          11.4919         0.5728          4.2644
                                    2         0.4928          0.5072          9.7964          0.5072          3.7131
                                    5         0.5507          0.4493          9.6680          0.4767          3.4489
                                    10        0.5507          0.4493          9.9089          0.4897          3.4294
                                    15        0.4928          0.5072          10.1360         0.4844          3.4077
                                    20        0.4928          0.5072          10.0589         0.4760          3.3877
Pairwise Y & E + kNN                1         0.6522          0.3913          10.4714         0.5431          4.0833
                                    2         0.5362          0.4783          10.0081         0.4882          3.6610
                                    5         0.5652          0.4638          10.0519         0.4622          3.4735
                                    10        0.5072          0.5217          10.3872         0.4653          3.4786
                                    15        0.5072          0.5217          10.7218         0.4737          3.4955
                                    20        0.4493          0.5797          10.8590         0.4790          3.5027

Table 2: Accuracy of predicting $Y$ and $E$ for Olfactory using different methods. Baseline LASSO and RF predict $Y$ from $X$. Multi-task LASSO regression with $\ell_{2,1}$ regularization on the coefficient matrix predicts $Y$ & $E$ together, or just $E$. Other methods are similar to those in Table 1.

4.3 AADB

We use all the approaches proposed in Section 3.2 to obtain results for the AADB dataset: (a) simple regression baselines for predicting $Y$ and $E$, (b) multi-task regression to predict $Y$ and $E$ together, (c) $k$NN using embeddings from the simple regression network ($Y$), (d) $k$NN using embeddings optimized for pairwise loss using $Y$ alone, and $E$ alone, and embeddings optimized using weighted pairwise loss with $Y$ and $E$.

All experiments with the AADB dataset used a modified PyTorch implementation of AlexNet for fine-tuning (Krizhevsky, Sutskever, and Hinton, 2012). We simplified the fully connected layers for the regression variant of AlexNet to 1024-ReLU-Dropout-64-$n$, where $n = 1$ for predicting $Y$, and $n = 11$ for predicting $E$. In the multi-task case for predicting $Y$ and $E$ together, the convolutional layers were shared and two separate sets of fully connected layers with 1 and 11 outputs were used. The multi-task network used a weighted sum of regression losses for $Y$ and $E$: . All these single-task and multi-task networks were trained for epochs with a batch size of 64. The embedding layer that provides the 64-dimensional output had a learning rate of , whereas all other layers had a learning rate of . For training the embeddings using pairwise losses, we used pairs chosen from the training data, and optimized the loss for epochs. The hyper-parameters were defined as . These parameters were chosen because they provided consistently good performance in all metrics that we report for the validation set.
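The weighted multi-task objective mentioned above can be illustrated with a framework-free sketch. The exact form of the weighting is not recoverable from the text, so this sketch assumes the weight multiplies the explanation term; the names `mse`, `multitask_loss`, and `weight` are hypothetical.

```python
def mse(pred, target):
    """Mean-squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multitask_loss(y_pred, y_true, e_pred, e_true, weight):
    """Regression loss on Y plus `weight` times the regression loss on E.

    Placing the weight on the E term is an assumption made for illustration.
    """
    return mse(y_pred, y_true) + weight * mse(e_pred, e_true)


# One scalar aesthetic score and an 11-dimensional attribute vector (toy values).
loss = multitask_loss([0.6], [0.5], [0.0] * 11, [0.1] * 11, weight=100.0)
```

Under this assumed form, a larger weight makes the shared layers favor explanation accuracy over label accuracy, so the weight acts as the knob that trades the two tasks off against each other.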

Table 1(a) provides accuracy numbers for $Y$ and $E$ using the proposed approaches. Numbers in bold are the best for a metric among an algorithm. Improvement in accuracy and MAE for $Y$ over the baseline is observed for the Multi-task, Pairwise $Y$ + $k$NN, and Pairwise $Y$ & $E$ + $k$NN approaches. Clearly, optimizing embeddings based on $Y$ and sharing information between $Y$ and $E$ is better for predicting $Y$. The higher improvement in performance using $Y$ & $E$ similarities can be explained by the fact that $Y$ can be predicted easily using $E$ in this dataset. Using a simple regression model, this predictive accuracy was with MAE of and for Discretized and Continuous, respectively. There is also a clear advantage in using embedding approaches compared to multi-task regression.

The accuracy of $E$ varies among the three $k$NN techniques, with slight improvements by using pairwise $Y$ and then pairwise $Y$ & $E$. Multi-task regression performs better than embedding approaches in predicting $E$ for this dataset.

4.4 Melanoma

For this dataset, we use the same approaches we used for the AADB dataset with a few modifications. We also perform $k$NN using embeddings from the baseline $E$ network, and we do not obtain embeddings using weighted pairwise loss with $Y$ and $E$ because each $E$ value maps to a single $Y$ value in this dataset. The networks used are also similar to the ones used for AADB except that we use cross-entropy losses. The learning rates, training epochs, and number of training pairs were also the same as AADB. The hyper-parameters were set to , and were chosen based on the validation set performance. For the loss (1), $a$ and $b$ were said to be neighbors if $y_a = y_b$ and non-neighbors otherwise. For the loss (2), $a$ and $b$ were said to be neighbors if $e_a = e_b$ and non-neighbors if $y_a \neq y_b$. The pairs where $e_a \neq e_b$, but $y_a = y_b$ were not considered.
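One reading of the categorical neighborhood criteria is sketched below. The function names and return values are hypothetical, and the rule for the explanation-based loss reflects an interpretation of the text: pairs in the same group are neighbors, pairs in different classes are non-neighbors, and pairs that share a class but not a group are skipped.

```python
def pair_type_for_y_loss(y_a, y_b):
    """Label-based loss: same class means neighbors, otherwise non-neighbors."""
    return "neighbor" if y_a == y_b else "non-neighbor"


def pair_type_for_e_loss(y_a, e_a, y_b, e_b):
    """Explanation-based loss with a skipped middle case.

    Returns None for pairs that share a class but fall in different groups,
    signalling that the pair contributes nothing to the loss.
    """
    if e_a == e_b:
        return "neighbor"
    if y_a != y_b:
        return "non-neighbor"
    return None


# Hypothetical (class, group) annotations for three lesions.
lesions = [("melanoma", "m3"), ("melanoma", "m7"), ("nevus", "n2")]
kinds = [
    pair_type_for_e_loss(*lesions[i], *lesions[j])
    for i in range(len(lesions))
    for j in range(i + 1, len(lesions))
]
```

Skipping the same-class, different-group pairs avoids pushing apart lesions that still share a diagnosis, while pairs from different classes are always separated.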

Table 1(b) provides accuracy numbers for $Y$ and $E$ using the proposed approaches. Numbers in bold are the best for a metric among an algorithm. The $Y$ and $E$ accuracies for multi-task and $k$NN approaches are better than the baselines, which clearly indicates the value in sharing information between $Y$ and $E$. The best accuracy on $Y$ is obtained using the Pairwise $E$ + $k$NN approach, which is not surprising since $E$ contains $Y$ and is more granular than $Y$. The Pairwise $Y$ + $k$NN approach performs poorly on $E$ since the information in $Y$ is too coarse for predicting $E$ well.

4.5 Olfactory

Since random forest was the winning entry on this dataset (Keller et al., 2017), we used a random forest regression to pre-select out of features for subsequent modeling. From these features, we created a base regression network using a fully connected hidden layer of 64 units (embedding layer), which was then connected to an output layer. No non-linearities were employed, but the data was first transformed using and then the features were standardized to zero mean and unit variance. Batch size was 338, and the network with pairwise loss was run for epochs with a learning rate of . For this dataset, we set to . The parameters were chosen to maximize performance on the validation set.
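The standardization step and a purely linear layer can be sketched in a few lines of Python. This is a minimal sketch: the initial data transformation mentioned above is not reproduced because it is not legible in the text, and the use of the population standard deviation, the guard for constant columns, and the helper names are assumptions.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Assumption: population standard deviation; constant columns are left unscaled.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def linear_layer(x, weights, bias):
    """Fully connected layer with no non-linearity: returns W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize(rows)
# Toy 2-unit layer; the network described above uses 64 embedding units.
embedding = linear_layer([1.0, 2.0], [[1.0, 0.0], [0.5, 0.5]], [0.0, 1.0])
```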

Table 2 provides accuracy numbers in a similar format as Table 1(a). The results show, once again, improved accuracy over the baseline for Pairwise + kNN and Pairwise & + kNN and corresponding improvement for MAE for Y. Again, this performance improvement can be explained by the fact that the predictive accuracy of Y given E using both baselines was , with MAEs of and ( for RF) for Discretized and Continuous, respectively. Once again, the accuracy of E varies among the 3 kNN techniques with no clear advantages. The multi-task linear regression does not perform as well as the Pairwise loss based approaches that use non-linear networks.

5 Discussion

One potential concern with the TED approach is the additional labor required for adding explanations. However, researchers (Zaidan and Eisner, 2008; Zhang, Marshall, and Wallace, 2016; McDonnell et al., 2016) have found that, for a subject matter expert (SME), adding labels together with explanations often takes about the same time as adding labels alone. They also cite other benefits of adding explanations, such as improved quality and consistency of the resulting training data set.

Furthermore, in some instances, the kNN instantiation of TED may require no extra labor. For example, in cases where embeddings are used as search criteria for evidence-based predictions of queries, end users will, on average, naturally interact with search results that are similar to the query in explanation space. This query-result interaction activity inherently provides similar and dissimilar pairs in the explanation space that can be used to refine an embedding initially optimized for the predictions alone. This reliance on relative distances in explanation space is also what distinguishes this method from multi-task learning objectives, since absolute labels in explanation space need not be defined.

6 Conclusion

The societal demand for “meaningful information” on automated decisions has sparked significant research in AI explainability. This paper suggests a new paradigm for providing explanations from machine learning algorithms. This new approach is particularly well-suited for explaining a machine learning prediction when all of its input features are inherently incomprehensible to humans, even to deep subject matter experts. The approach augments training data collection beyond features and labels to also include elicited explanations. Through this simple idea, we are not only able to provide useful explanations that would not have otherwise been possible, but we are able to tailor the explanations to the intended user population by eliciting training explanations from members of that group.

There are many possible instantiations for this proposed paradigm of teaching explanations. We have described a novel instantiation that learns feature embeddings using labels and explanation similarities in a joint and aligned way to permit neighbor-based explanation prediction. We present a new objective function to learn an embedding to optimize k-nearest neighbor search for both prediction accuracy and holistic human relevancy to enforce that returned neighbors present meaningful information. We have demonstrated the proposed paradigm and two of its instantiations on a tic-tac-toe dataset (see Supplement) that we created, a publicly-available image aesthetics dataset (Kong et al., 2016), a publicly-available olfactory pleasantness dataset (Keller et al., 2017) and a publicly-available Melanoma detection dataset (Codella et al., 2018a). We hope this work will inspire other researchers to further enrich this paradigm.


References

  • Ainur, Choi, and Cardie (2010) Ainur, Y.; Choi, Y.; and Cardie, C. 2010. Automatically generating annotator rationales to improve sentiment classification. In Proceedings of the ACL 2010 Conference Short Papers, 336–341.
  • Bastani, Kim, and Bastani (2018) Bastani, O.; Kim, C.; and Bastani, H. 2018. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.
  • Biran and Cotton (2017) Biran, O., and Cotton, C. 2017. Explanation and justification in machine learning: A survey. In IJCAI-17 Workshop on Explainable AI (XAI).
  • Caruana et al. (2015) Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; and Elhadad, N. 2015. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1721–1730.
  • Codella et al. (2018a) Codella, N. C.; Gutman, D.; Celebi, M. E.; Helba, B.; Marchetti, M. A.; Dusza, S. W.; Kalloo, A.; Liopyris, K.; Mishra, N.; Kittler, H.; et al. 2018a. Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging (isbi), hosted by the international skin imaging collaboration (isic). In Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on, 168–172. IEEE.
  • Codella et al. (2018b) Codella, N. C.; Lin, C.-C.; Halpern, A.; Hind, M.; Feris, R.; and Smith, J. R. 2018b. Collaborative human-ai (chai): Evidence-based interpretable melanoma classification in dermoscopic images. arXiv preprint arXiv:1805.12234.
  • Dhurandhar et al. (2017) Dhurandhar, A.; Iyengar, V.; Luss, R.; and Shanmugam, K. 2017. A formal framework to characterize interpretability of procedures. In Proc. ICML Workshop Human Interp. Mach. Learn., 1–7.
  • Donahue and Grauman (2011) Donahue, J., and Grauman, K. 2011. Annotator rationales for visual recognition. In ICCV.
  • Doshi-Velez and Kim (2017) Doshi-Velez, F., and Kim, B. 2017. Towards a rigorous science of interpretable machine learning. In https://arxiv.org/abs/1702.08608v2.
  • Doshi-Velez et al. (2017) Doshi-Velez, F.; Kortz, M.; Budish, R.; Bavitz, C.; Gershman, S.; O’Brien, D.; Schieber, S.; Waldo, J.; Weinberger, D.; and Wood, A. 2017. Accountability of AI under the law: The role of explanation. CoRR abs/1711.01134.
  • Goodman and Flaxman (2016) Goodman, B., and Flaxman, S. 2016. EU regulations on algorithmic decision-making and a ‘right to explanation’. In Proc. ICML Workshop Human Interp. Mach. Learn., 26–30.
  • Hendricks et al. (2016) Hendricks, L. A.; Akata, Z.; Rohrbach, M.; Donahue, J.; Schiele, B.; and Darrell, T. 2016. Generating visual explanations. In European Conference on Computer Vision.
  • Jimenez-del-Toro et al. (2015) Jimenez-del-Toro, O.; Hanbury, A.; Langs, G.; Foncubierta–Rodriguez, A.; and Muller, H. 2015. Overview of the visceral retrieval benchmark 2015. In Multimodal Retrieval in the Medical Domain (MRMD) Workshop, in the 37th European Conference on Information Retrieval (ECIR).
  • Keller et al. (2017) Keller, A.; Gerkin, R. C.; Guan, Y.; Dhurandhar, A.; Turu, G.; Szalai, B.; Mainland, J. D.; Ihara, Y.; Yu, C. W.; Wolfinger, R.; Vens, C.; Schietgat, L.; De Grave, K.; Norel, R.; Stolovitzky, G.; Cecchi, G. A.; Vosshall, L. B.; and Meyer, P. 2017. Predicting human olfactory perception from chemical features of odor molecules. Science 355(6327):820–826.
  • Kong et al. (2016) Kong, S.; Shen, X.; Lin, Z.; Mech, R.; and Fowlkes, C. 2016. Photo aesthetics ranking network with attributes and content adaptation. In Proc. Eur. Conf. Comput. Vis., 662–679.
  • Krizhevsky, Sutskever, and Hinton (2012) Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In Adv. Neur. Inf. Process. Syst. 25. 1097–1105.
  • Kulesza et al. (2013) Kulesza, T.; Stumpf, S.; Burnett, M.; Yang, S.; Kwan, I.; and Wong, W.-K. 2013. Too much, too little, or just right? Ways explanations impact end users’ mental models. In Proc. IEEE Symp. Vis. Lang. Human-Centric Comput., 3–10.
  • Lei, Barzilay, and Jaakkola (2016) Lei, T.; Barzilay, R.; and Jaakkola, T. 2016. Rationalizing neural predictions. In EMNLP.
  • Li et al. (2018) Li, Z.; Zhang, X.; Muller, H.; and Zhang, S. 2018. Large-scale retrieval for medical image analytics: A comprehensive review. In Medical Image Analysis, volume 43, 66–84.
  • Lundberg and Lee (2017) Lundberg, S., and Lee, S.-I. 2017. A unified approach to interpreting model predictions. In Advances of Neural Inf. Proc. Systems.
  • McDonnell et al. (2016) McDonnell, T.; Lease, M.; Kutlu, M.; and Elsayed, T. 2016. Why is that relevant? collecting annotator rationales for relevance judgments. In Proc. AAAI Conf. Human Comput. Crowdsourc.
  • Miller, Howe, and Sonenberg (2017) Miller, T.; Howe, P.; and Sonenberg, L. 2017. Explainable AI: Beware of inmates running the asylum or: How I learnt to stop worrying and love the social and behavioural sciences. In Proc. IJCAI Workshop Explainable Artif. Intell.
  • Miller (2017) Miller, T. 2017. Explanation in artificial intelligence: Insights from the social sciences. arXiv preprint arXiv:1706.07269.
  • Montavon, Samek, and Müller (2017) Montavon, G.; Samek, W.; and Müller, K.-R. 2017. Methods for interpreting and understanding deep neural networks. Digital Signal Processing.
  • Peng et al. (2016) Peng, P.; Tian, Y.; Xiang, T.; Wang, Y.; and Huang, T. 2016. Joint learning of semantic and latent attributes. In ECCV 2016, Lecture Notes in Computer Science, volume 9908.
  • Ribeiro, Singh, and Guestrin (2016) Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. “Why should I trust you?”: Explaining the predictions of any classifier. In Proc. ACM SIGKDD Int. Conf. Knowl. Disc. Data Min., 1135–1144.
  • Selbst and Powles (2017) Selbst, A. D., and Powles, J. 2017. Meaningful information and the right to explanation. Int. Data Privacy Law 7(4):233–242.
  • Sun and DeJong (2005) Sun, Q., and DeJong, G. 2005. Explanation-augmented svm: an approach to incorporating domain knowledge into svm learning. In 22nd International Conference on Machine Learning.
  • Sun et al. (2012) Sun, J.; Wang, F.; Hu, J.; and Ebadollahi, S. 2012. Supervised patient similarity measure of heterogeneous patient records. In SIGKDD Explorations.
  • Wachter, Mittelstadt, and Floridi (2017) Wachter, S.; Mittelstadt, B.; and Floridi, L. 2017. Why a right to explanation of automated decision-making does not exist in the general data protection regulation. Int. Data Privacy Law 7(2):76–99.
  • Wan et al. (2014) Wan, J.; Wang, D.; Hoi, S.; Wu, P.; Zhu, J.; Zhang, Y.; and Li, J. 2014. Deep learning for content-based image retrieval: A comprehensive study. In Proceedings of the ACM International Conference on Multimedia.
  • Weller (2017) Weller, A. 2017. Challenges for transparency. In Proc. ICML Workshop Human Interp. Mach. Learn. (WHI), 55–62.
  • Zaidan and Eisner (2008) Zaidan, O. F., and Eisner, J. 2008. Modeling annotators: A generative approach to learning from annotator rationales. In Proceedings of EMNLP 2008, 31–40.
  • Zhang, Marshall, and Wallace (2016) Zhang, Y.; Marshall, I. J.; and Wallace, B. C. 2016. Rationale-augmented convolutional neural networks for text classification. In Conference on Empirical Methods in Natural Language Processing (EMNLP).

Appendix A Synthetic Data Experiment

We provide a synthetic data experiment using the tic-tac-toe dataset. This dataset contains the 4,520 legal non-terminal positions in this classic game. Each position is labeled with a preferred next move (Y) and an explanation of the preferred move (E). Both Y and E were generated by a simple set of rules given in Section A.1.

A.1 Tic-Tac-Toe

As an illustration of the proposed approach, we describe a simple domain, tic-tac-toe, where it is possible to automatically provide labels (the preferred move in a given board position) and explanations (the reason why the preferred move is best). A tic-tac-toe board is represented by two 3 × 3 binary feature planes, indicating the presence of X and O, respectively. An additional binary feature indicates the side to move, resulting in a total of 19 binary input features. Each legal board position is labeled with a preferred move, along with the reason the move is preferred. The labeling is based on a simple set of rules that are executed in order (note that the rules do not guarantee optimal play):

  1. If a winning move is available, completing three in a row for the side to move, choose that move with reason Win

  2. If a blocking move is available, preventing the opponent from completing three in a row on their next turn, choose that move with reason Block

  3. If a threatening move is available, creating two in a row with an empty third square in the row, choose that move with reason Threat

  4. Otherwise, choose an empty square, preferring center over corners over middles, with reason Empty
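The four rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the labeling code used for the dataset: squares are indexed 0 to 8 in row-major order, the board is a 9-character string of 'X', 'O', and ' ', and when several squares satisfy the same rule the center-corner-middle order is reused as a tie-break, which the rules do not specify. All names are hypothetical.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals
# Fallback order: center, then corners, then edge middles.
PREFERENCE = [4, 0, 2, 6, 8, 1, 3, 5, 7]

def completing_moves(board, player):
    """Empty squares that would give `player` three in a row."""
    moves = set()
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 2 and marks.count(" ") == 1:
            moves.add(line[marks.index(" ")])
    return moves

def threatening_moves(board, player):
    """Empty squares that create two in a row with the third square empty."""
    moves = set()
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == 1 and marks.count(" ") == 2:
            moves.update(i for i in line if board[i] == " ")
    return moves

def preferred_move(board, player):
    """Return (square, reason) by applying the rules in order."""
    opponent = "O" if player == "X" else "X"
    candidates = [
        (completing_moves(board, player), "Win"),
        (completing_moves(board, opponent), "Block"),
        (threatening_moves(board, player), "Threat"),
        ({i for i in range(9) if board[i] == " "}, "Empty"),
    ]
    for moves, reason in candidates:
        if moves:
            # Assumed tie-break among squares that satisfy the same rule.
            return min(moves, key=PREFERENCE.index), reason
    raise ValueError("no empty square: the position is terminal")

# X holds the top-left two squares; O to move must block at square 2.
example = preferred_move("XX  O" + " " * 4, "O")
```

Because the rules are checked in a fixed order, a position that admits both a win and a block is always labeled Win, which is what makes the explanation label deterministic.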

Two versions of the dataset were created, one with only the preferred move (represented as a 3 × 3 plane), the second with the preferred move and explanation (represented as a 3 × 3 × 4 stack of planes). A simple neural network classifier was built on each of these datasets, with one hidden layer of 200 units using ReLU and a softmax over the 9 (or 36) outputs. On a test set containing 10% of the legal positions, this classifier obtained an accuracy of 96.53% on the move-only prediction task, and 96.31% on the move/explanation prediction task (Table 3). When trained on the move/explanation task, performance on predicting just the preferred move actually increases to 97.42%. This illustrates that the overall approach works well in a simple domain with a limited number of explanations. Furthermore, given the simplicity of the domain, it is possible to provide explanations that are both useful and accurate.

Input      Y Accuracy    Y and E Accuracy
Y          0.9653        NA
Y and E    0.9742        0.9631

Table 3: Accuracy of predicting Y, Y and E in tic-tac-toe
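The input and output encodings described in this appendix can be sketched as follows. The 19 inputs and the 9-way and 36-way outputs come from the text; the ordering of the planes, the ordering of the four reasons, the encoding of the side-to-move bit, and the function names are assumptions made for illustration.

```python
REASONS = ["Win", "Block", "Threat", "Empty"]  # assumed plane order

def encode_features(board, player):
    """19 binary inputs: 9 for X, 9 for O, 1 for the side to move."""
    x_plane = [int(c == "X") for c in board]
    o_plane = [int(c == "O") for c in board]
    return x_plane + o_plane + [int(player == "X")]

def encode_move(square):
    """One-hot target over the 9 squares (move-only task)."""
    return [int(i == square) for i in range(9)]

def encode_move_and_reason(square, reason):
    """One-hot target over 36 classes: one 3 x 3 plane per reason."""
    index = REASONS.index(reason) * 9 + square
    return [int(i == index) for i in range(36)]

def move_from_joint_index(index):
    """Recover the square from a 36-way class index."""
    return index % 9

board = "X   O" + " " * 4
features = encode_features(board, "X")
target = encode_move_and_reason(2, "Threat")
```

Under this layout, the move accuracy of the joint model can be scored by mapping its predicted 36-way class back to a square with `move_from_joint_index` and comparing it with the labeled move.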