There has been a recent surge of work on explanatory artificial intelligence (XAI) systems. One reason these systems are gaining traction is changes in policy, law, and regulation. Indeed, with the rise of AI-based decision making in areas of societal interest, from finance and employment to driving and journalism, policymakers see the need to discuss certain standards around XAI.
For example, the European Union’s General Data Protection Regulation (GDPR) creates obligations for automated decision-making processes, including a provision on the right to explanation. This broad obligation puts the burden on those who process data with (and develop for) AI systems to generate reasonable explanations for their systems’ decision-making processes. We can also observe broad societal concerns about AI liability. For example, as autonomous vehicles are introduced, we need a better understanding of what happened when an accident occurs (as has already happened). How can there be an appropriate investigation process if opaque decision-making algorithms are involved? How can we ensure that these machines are acting in our best interest?
Neither AI algorithms nor, more generally, complex AI systems can currently answer these questions. They are not built to explain themselves to the general public or to policy-makers. Although there have been calls for work on systems and algorithms that can interpret [7, 27] and explain some parts of their decisions, the current state-of-the-art explanatory systems are made for the programmer or expert, not an end user or policy-maker. The key difference is that current systems produce what we refer to as inside explanations. They point to a plausible technical explanation, either by looking at the relationships between the inputs and outputs of a model, by examining the role of individual parts, or by producing a surface-level explanation themselves. Crucially, though, these explanations do not answer why questions. Continuing with the autonomous vehicle example, when an accident involving such a vehicle happens, police officials, insurance companies, and the people who are harmed will want to know who or what is accountable for the accident and why it happened.
In this paper, we examine the types of questions that explanatory deep neural network (DNN) algorithms can and cannot answer. We focus on DNNs specifically because of the recent shift in AI research from symbolic approaches to machine learning and deep learning (see http://www.aiindex.org/2017-report.pdf), and because these are the systems making safety-critical decisions in applications like autonomous driving and malware detection. To bridge the gap between the current, technical DNN explanations (which we refer to as inside explanations) and the explanations that answer why questions and would benefit society (which we refer to as outside explanations), we must develop explanations for DNNs that can answer these questions and be probed.
By doing this, we attempt to “bridge the gap” between current, technical deep neural network explanations and explanations that answer why questions beneficial to society. We approach this task in four main ways:
We differentiate between inside (technical) and outside (why) explanations.
We extend the work of a previously defined taxonomy of explanations by looking at the specific questions each class can and cannot answer.
We discuss the necessity of probing of AI systems and the technical challenges inherent in creating outside explanations.
We motivate future work on bridging the gap between current explainable methods by incorporating the types of questions and explanations society would like to know.
2 Related Work
In this work, we focus on the types of questions and explanations that explanatory DNN methods can answer. Recent work has looked at ways to correct neural network judgments and at different ways to audit such networks by detecting biases. But these judgments are not enough to completely understand the model’s decision-making. Other work answers why questions by finding similar data points. Although these methods are clearly interpretable, they do not provide any unique insights into why the model made those decisions. Other work examining best practices for explanation provides a set of categories, but does not evaluate the questions that explanatory systems should be able to answer, which is necessary for policy makers and for societal trust in DNN decision processes.
Since we are interested in the societal expectation of explanations, it is important to examine prior initiatives on the legal side. The desire for explanations in certain sectors is not new. For example, the U.S. Fair Credit Reporting Act creates obligations for transparency in certain financial decision-making processes, even if they are automated. The role of explanation in enforcing accountability under the law has been examined. Similar recommendations for using explanations in law have been examined for promoting ethics in design, for privacy, and for the liability of machines.
We follow previous work in holding that a proper explanation should be both interpretable and complete. By interpretable, we mean that the explanation should be understandable to humans. That does not necessarily imply that the explanation must be in human-readable form; in fact, visual cues are well understood by humans. When we say that the methods must be complete, we mean that the resulting explanation should be true to the model. For example, using a simplified, explainable model (like a linear model) to fit the input to the output results in a nice explanation, but it is not a true and complete representation of the internal concepts, representations, and decisions of the model.
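The gap between interpretability and completeness can be made concrete with a small sketch. The example below is an illustration under stated assumptions, not a method from this paper: `black_box` is a hypothetical stand-in for an opaque model, and `fit_line` is an ordinary least-squares fit written out by hand. It fits a linear surrogate to the stand-in and then measures how far the surrogate strays from the model it is supposed to explain.

```python
def black_box(x):
    """Hypothetical stand-in for an opaque model: nonlinear in its one input."""
    return x * x


def fit_line(xs, ys):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Probe the stand-in model on 21 evenly spaced inputs in [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [black_box(x) for x in xs]

# The surrogate is easy to read: it is just two numbers.
slope, intercept = fit_line(xs, ys)

# Completeness check: largest disagreement between surrogate and model.
surrogate = [slope * x + intercept for x in xs]
max_gap = max(abs(s - y) for s, y in zip(surrogate, ys))
```

Under these assumptions the fitted slope is essentially zero, so the surrogate reads as "the input has no effect", even though the stand-in model's output is determined entirely by that input. The explanation is easy to understand but not true to the model, which is the sense in which it is interpretable without being complete.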
In this paper, we refer to inside and outside explanations for explaining DNNs. Inside explanations are the type of explanations that currently exist and that cater to AI developers and experts; they are tailored to people inside the field. We encourage the development of outside explanations that are interpretable, complete, and answer why questions. Such explanations build trust not only with a system's technical developers, but also with those outside the technical field who may use the technology without a technical background.
4 Current Limitations
To show the strengths, benefits, and challenges of current explanatory approaches for opaque DNN systems, we use a previously defined taxonomy. The taxonomy consists of three classes. The first class comprises systems that explain processing by looking at the relationships between the inputs and the outputs. These include salience mapping [38, 29, 39], automatic rule extraction, and influence functions. The second class comprises systems that explain representation for DNNs, in terms of either layers [26, 34] or vectors. The final class comprises explanation-producing systems, which use attention-based visual question answering or disentangled representations to create self-explaining systems.
|Method||Questions it can answer||Questions it cannot answer|
|Processing||Why does this particular input lead to this particular output?||Why were these inputs most important to the output? How could the output be changed?|
|Representation||What information does the network contain?||Why is a representation relevant for the outputs? How was this representation learned?|
|Explanation-producing||Given a particular output or decision, how can the network explain its behavior?||What information contributed to this output/decision? How can the network yield a different output/decision?|
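As a rough illustration of the first (processing) class, the sketch below computes a saliency score for each input of a tiny model. Everything here is an assumption made for illustration: `tiny_network` is a hypothetical stand-in with hand-picked weights, and the scores use central finite differences rather than the backpropagated gradients that published salience methods typically rely on.

```python
import math


def tiny_network(x):
    """Hypothetical stand-in for a trained network with fixed weights.

    It uses x[0] and x[1] and ignores x[2] entirely.
    """
    h0 = math.tanh(2.0 * x[0] - 1.0 * x[1])
    h1 = math.tanh(0.5 * x[1])
    return h0 + 0.5 * h1


def saliency(model, x, eps=1e-5):
    """Absolute central-difference estimate of d(model)/d(x_i) per input."""
    scores = []
    for i in range(len(x)):
        up = list(x)
        up[i] += eps
        down = list(x)
        down[i] -= eps
        scores.append(abs(model(up) - model(down)) / (2 * eps))
    return scores


x = [0.5, 0.2, 0.9]
scores = saliency(tiny_network, x)

# Input indices ordered from most to least influential at this point.
ranking = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
```

The ranking says which inputs the output is most sensitive to at this particular point. It does not say why those inputs matter or how the output could be changed, which matches the limits listed for processing explanations in the table.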
When examining this taxonomy for policy purposes, the biggest shortcoming is that these systems cannot explain why. There are two types of questions that we should ask of a decision making algorithm:
Why did this output happen?
How could this output have changed?
A summary of the types of questions that current DNN XAI systems can and cannot answer is in Table 1. Explanation-producing systems nearly answer the first question we would want to ask a decision-making algorithm: why did this output happen? The problem is that their explanation may not be complete and true to the model’s internal decisions and processes. To illustrate the necessity of answering these questions, we walk through examples of an AI algorithm and of a larger AI system, and then illustrate problems with data, to motivate why explanatory systems should strive to answer the preceding questions.
4.1 Societal needs for explanations
Imagine that you are denied a loan. You would want to know which key attribute led the algorithm to reject you; that is, why you were denied. But further, you may also want a sensitivity analysis: what would you need to change to get the loan? There may be several possibilities. For example, you might have received the loan if you made $1,000 more per month, something you may be able to change in the future. However, other factors may be things you cannot control, such as the specific time you applied or your gender or ethnicity. So, in this case, we would like to have a system that is able to explain why it decided to grant or deny a loan to each person.
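The sensitivity analysis described here can be sketched for a fully transparent toy rule. The scoring function, its threshold, the feature names, and the step sizes below are all hypothetical and chosen only for illustration; a real DNN-based decision would not expose its logic this way, which is exactly the difficulty this paper points to.

```python
def approve(applicant):
    """Hypothetical, fully transparent scoring rule (not a real lender's policy)."""
    points = applicant["monthly_income"] - 250 * applicant["missed_payments"]
    return points >= 4000


# Features the applicant could plausibly change, with the step size to try.
# Attributes outside the applicant's control are left out of the search.
MUTABLE_STEPS = {"monthly_income": 100, "missed_payments": -1}


def single_feature_counterfactuals(applicant, max_steps=50):
    """For each mutable feature, find the smallest change that flips a denial."""
    found = {}
    for feature, step in MUTABLE_STEPS.items():
        for k in range(1, max_steps + 1):
            candidate = dict(applicant)
            candidate[feature] = applicant[feature] + k * step
            if candidate[feature] < 0:
                break  # counts and incomes cannot go below zero
            if approve(candidate):
                found[feature] = k * step
                break
    return found


# "application_hour" stands in for a factor the applicant cannot go back and change.
applicant = {"monthly_income": 3500, "missed_payments": 2, "application_hour": 14}
denied = not approve(applicant)
changes = single_feature_counterfactuals(applicant)
```

For this applicant the search reports that a $1,000 increase in monthly income would flip the decision, while no reduction in missed payments alone would. That answers "how could this output have changed?" only because the rule is simple enough to probe exhaustively.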
Moreover, consider again the AI system example mentioned earlier of a self-driving car involved in an accident. The first thing we would want to know is why the accident happened. In this case there are many algorithms interacting. Finding out whether there was a faulty component is extremely challenging, which makes it even more important for each part of the system to be able to explain its decisions. In the recent Uber accident, in which the vehicle struck and killed a pedestrian, uncovering the root cause in the complex AI software system took several weeks.
But the other, more challenging question we would want to ask is whether the accident could have been avoided. This is a more difficult question than the previous, single-algorithm question. In complex systems, an error could be local (caused by a single failure), or it could be caused by an inconsistency between parts working together. The latter is much more difficult to detect, diagnose, and explain.
In the Uber case, since the accident was deemed to be caused by a false positive in the error detection that monitored the pedestrian, several explanations could provide evidence of how it could have been avoided. Again, some inconsistencies are easier to fix than others, and some may not be fixable at all. Perhaps the sensitivity of the error detection monitor should be decreased or increased. Or perhaps the pedestrian would have been detected with higher certainty during the daytime, or if they had been walking more slowly. It remains an open question whether the training data was at fault, which introduces a new set of questions.
4.2 The risk of opaque models
Explaining model behavior in general is not enough to build trust in these sorts of models. Another way these algorithms and systems can behave badly is through an inconsistency in the training data and/or knowledge bases. This does not necessarily mean that the training data is “bad” per se, but that there is a misalignment between the expected data and the actual training data used. We have seen this recently with the Amazon recruiting algorithm. This algorithm was eventually scrapped because the results were extremely biased: since it had been trained on applicants' data from the past 10 years (in which males were dominant), it was teaching itself to choose male candidates. Even if the algorithm was modified, there was no way to ensure it was unbiased. Although this is an extremely compelling case for inquisitive explanatory systems, an even more persuasive case is for safety-critical tasks.
Equally important, consider a machine learning classifier that diagnoses breast cancer from an image, where the training set was carefully selected to be fairly close to a 50-50 split of breast cancer and non-breast cancer scans. Even if the classifier is very accurate, without access to complete explanations of how decisions are made in the model, it is not certain that it is making decisions for the right reasons: the model may in fact learn a feature it should not rely on despite predicting breast cancer very accurately. In one reported case, a classifier learned the resolution of the scanner camera, and therefore predicted cancerous images from a high-resolution scanner very accurately. Figuring out this sort of data problem is extremely difficult. It requires either an attuned intuition of the model’s inner workings or a model that can answer questions to support a fine-grained sensitivity analysis.
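A minimal sketch of such a sensitivity probe follows. The data are synthetic and the classifier is a hand-written, hypothetical stand-in for a model that has latched onto scanner resolution; neither comes from the case described above. The probe perturbs one feature at a time and counts how often the prediction flips.

```python
def confounded_classifier(scan):
    """Hypothetical stand-in for a model that relies only on scanner resolution."""
    return 1 if scan["high_res"] == 1 else 0


# Synthetic scans: every cancer case happens to come from the high-resolution scanner.
scans = [
    {"lesion_score": 0.8, "high_res": 1, "label": 1},
    {"lesion_score": 0.9, "high_res": 1, "label": 1},
    {"lesion_score": 0.7, "high_res": 1, "label": 1},
    {"lesion_score": 0.1, "high_res": 0, "label": 0},
    {"lesion_score": 0.2, "high_res": 0, "label": 0},
    {"lesion_score": 0.3, "high_res": 0, "label": 0},
]


def accuracy(model, data):
    """Fraction of scans whose prediction matches the label."""
    return sum(model(s) == s["label"] for s in data) / len(data)


def flip_rate(model, data, feature, perturb):
    """Fraction of predictions that change when one feature is perturbed."""
    flips = 0
    for s in data:
        altered = dict(s)
        altered[feature] = perturb(s[feature])
        flips += model(altered) != model(s)
    return flips / len(data)


acc = accuracy(confounded_classifier, scans)
res_flips = flip_rate(confounded_classifier, scans, "high_res", lambda v: 1 - v)
lesion_flips = flip_rate(confounded_classifier, scans, "lesion_score", lambda v: 1.0 - v)
```

On this synthetic set the stand-in classifier is perfectly accurate, yet every prediction flips when the resolution flag is changed and none flips when the lesion score is inverted. Accuracy alone would not reveal that the model is deciding for the wrong reason; the probe does, but only because the suspect feature was known in advance and easy to perturb.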
5 Discussion and Conclusion
As humans, we start to build trust by asking questions. We should be able to judge the behavior of opaque DNN algorithms by asking questions similar to those we would ask about a person’s behavior in similar circumstances. The key idea here is that explainability goes beyond transparency and interpretability: it empowers the public to understand the decisions and underlying mechanisms.
We have focused mainly on explainable DNN algorithms, but when a DNN algorithm is part of a larger system, explainability is not enough. Explainability does not necessarily imply that complex systems are accountable and responsible; this may have to be tackled with other requirements. For example, a system can provide an “outside” explanation without addressing who or what is responsible and why. At the same time, an explainable system may be transparent without being receptive to human feedback. In future work, we will examine how explainability may interact with other parts of systems (including the human operator or user) to produce systems that can be augmented and can learn from feedback.
But in order to truly trust AI systems, people will need not only to feel confident that they understand how certain decisions are made, but also to know that they have recourse. If a person disagrees with a system’s output, they should be empowered and able to change the system. Designing explainable AI is important, but only when opaque systems are auditable, explainable, answer questions, and interpret feedback will we be confident enough to trust their decision-making.
-  Robert Andrews, Joachim Diederich, and Alan B Tickle. Survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge-based systems, 8(6):373–389, 1995.
-  David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, and Antonio Torralba. Network dissection: Quantifying interpretability of deep visual representations. In Computer Vision and Pattern Recognition, 2017.
-  Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.
-  Chaofan Chen, Oscar Li, Alina Barnett, Jonathan Su, and Cynthia Rudin. This looks like that: deep learning for interpretable image recognition. arXiv preprint arXiv:1806.10574, 2018.
-  Joan Claybrook and Shaun Kildare. Autonomous vehicles: No driver… no regulation? Science, 361(6397):36–37, 2018.
-  Jeffrey Dastin. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters, October 2018.
-  Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, 2017.
-  Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Stuart Schieber, James Waldo, David Weinberger, and Alexandra Wood. Accountability of AI under the law: The role of explanation. CoRR, abs/1711.01134, 2017.
-  Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
-  Leilani H Gilpin, David Bau, Ben Z Yuan, Ayesha Bajwa, Michael Specter, and Lalana Kagal. Explaining explanations: An approach to evaluating interpretability of machine learning. arXiv preprint arXiv:1806.00069, 2018.
-  Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a "right to explanation". arXiv preprint arXiv:1606.08813, 2016.
-  David Gunning. Explainable artificial intelligence (xai). Defense Advanced Research Projects Agency (DARPA), 2017.
-  Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(4):15, 2012.
-  Been Kim, Justin Gilmer, Fernanda Viegas, Ulfar Erlingsson, and Martin Wattenberg. Tcav: Relative concept importance testing with linear concept activation vectors. arXiv preprint arXiv:1711.11279, 2017.
-  Been Kim, Cynthia Rudin, and Julie A Shah. The bayesian case model: A generative approach for case-based reasoning and prototype classification. In Advances in Neural Information Processing Systems, pages 1952–1960, 2014.
-  Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730, 2017.
-  Susan Landau. Control use of data to protect privacy. Science, 347(6221):504–506, 2015.
-  Timothy B. Lee. Report: Software bug led to death in Uber’s self-driving crash, May 2018.
-  Benjamin Letham, Cynthia Rudin, Tyler H McCormick, David Madigan, et al. Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9(3):1350–1371, 2015.
-  Gilles Louppe, Louis Wehenkel, Antonio Sutera, and Pierre Geurts. Understanding variable importances in forests of randomized trees. In Advances in neural information processing systems, pages 431–439, 2013.
-  Aarian Marshall. The Uber Crash Won’t Be the Last Shocking Self-Driving Death. Wired, March 2018.
-  Christoph Molnar. Interpretable Machine Learning.
-  Deirdre K Mulligan and Kenneth A Bamberger. Saving governance-by-design. 2018.
-  Deirdre K Mulligan, Colin Koopman, and Nick Doty. Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. Phil. Trans. R. Soc. A, 374(2083):20160118, 2016.
-  Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. Multimodal explanations: Justifying decisions and pointing to the evidence. CoRR, abs/1802.08129, 2018.
-  Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. Cnn features off-the-shelf: an astounding baseline for recognition. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pages 512–519. IEEE, 2014.
-  Cynthia Rudin. Algorithms for interpretable machine learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 1519–1519, New York, NY, USA, 2014. ACM.
-  Sara Sabour, Nicholas Frosst, and Geoffrey E Hinton. Dynamic routing between capsules. In Advances in Neural Information Processing Systems, pages 3859–3869, 2017.
-  Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391v3, 2016.
-  Raymond Sheh and Isaac Monteath. Defining explainable ai for requirements analysis. KI-Künstliche Intelligenz, pages 1–6, 2018.
-  Erik Štrumbelj and Igor Kononenko. Explaining prediction models and individual predictions with feature contributions. Knowledge and information systems, 41(3):647–665, 2014.
-  Sarah Tan, Rich Caruana, Giles Hooker, and Yin Lou. Detecting bias in black-box models using transparent model distillation. arXiv preprint arXiv:1710.06169, 2017.
-  David C Vladeck. Machines without principals: liability rules and artificial intelligence. Wash. L. Rev., 89:117, 2014.
-  Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In Advances in neural information processing systems, pages 3320–3328, 2014.
-  Zhenlong Yuan, Yongqiang Lu, Zhaoguo Wang, and Yibo Xue. Droid-sec: deep learning in android malware detection. In ACM SIGCOMM Computer Communication Review, volume 44, pages 371–372. ACM, 2014.
-  Xin Zhang, Armando Solar-Lezama, and Rishabh Singh. Interpreting neural network judgments via minimal, stable, and symbolic corrections. CoRR, abs/1802.07384, 2018.
-  Qingyuan Zhao and Trevor Hastie. Causal interpretations of black-box models. 2017.
-  Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on, pages 2921–2929. IEEE, 2016.
-  Jan Ruben Zilke, Eneldo Loza Mencía, and Frederik Janssen. Deepred–rule extraction from deep neural networks. In International Conference on Discovery Science, pages 457–473. Springer, 2016.