Artificial intelligence (AI) systems powered by deep neural networks (DNNs) are pervasive across society: they run in our pockets on our cell phones georgiev2017low, in cars to help avoid accidents jain2015car, in banks to manage our investments chong2017deep and evaluate loans pham2017deep, in hospitals to help doctors diagnose disease symptoms nie2015disease, at law enforcement agencies to help recover evidence from videos and images goswami2014mdlface, in the military of many countries lunden2016deep, and at insurance agencies to evaluate coverage suitability and costs for clients dong2016characterizing,sirignano2016deep. But when a person’s future is on the line, when a medical treatment is to be assigned, when a major financial decision must be made, when a military decision is to be reached, and when a risky choice having security ramifications is under consideration, it is understandable that we want AI to suggest or recommend a course of action with reasonable evidence, rather than to merely prescribe one. For the human ultimately responsible for the action taken, the use of present-day DNNs leaves an important question unanswered: how can one who will be held accountable for a decision trust a DNN’s recommendation, and justify its use?
A user can hardly trust or find justification in a DNN’s recommendation without access to a satisfactory explanation of the process that led to its output. Consider, for example, a hypothetical medical system running a DNN in the backend. Assume that the system makes life-altering predictions about whether or not a patient has a terminal illness. It is desirable for this system to also provide a rationale behind its predictions. More importantly, it is desirable for this rationale to be one that both physicians and patients can understand and trust. Trust in a decision is built upon a rationale that is: (i) easily interpretable; (ii) relatable to the user; (iii) connected to contextual information about the choice or to the user’s prior experiences; (iv) reflective of the intermediate thinking of the user in reaching a decision. Given the qualitative nature of these characteristics, it may come as no surprise that there is great diversity in the definitions, approaches, and techniques used by researchers to provide a rationale for the decisions of a DNN. This diversity is further compounded by the fact that the form of a rationale often conforms to a researcher’s personal notion of what constitutes an “explanation”. For a newcomer to the field, whether a seasoned AI researcher or a student of an unrelated discipline that DNN decision-making stands to disrupt, jumping in can be a daunting task.
This article offers a much-needed starting point for researchers and practitioners embarking on the field of explainable deep learning. This “field guide” is designed to help an uninitiated researcher understand:
The traits of an explanation. Traits can be thought of as a simple set of qualitative target properties that the explainable DNN field tries to achieve in its results (Section 2).
Complementary research topics that are aligned with explainability. Topics that are complementary to explainability may involve the inspection of a DNN’s weights or activations, the development of mechanisms that mathematically explain how DNNs learn to generalize, or approaches to reduce a DNN’s sensitivity to particular input features. Such topics are indirectly associated with explainability in the sense that they investigate how a DNN learns or performs inference, even though the intention of the work is not directly to investigate explanations (Section 3).
A set of dimensions that characterize the space of work that constitutes foundational work in explainable deep learning, and a description of such methods. This space summarizes the core aspects of explainable DNN techniques that a majority of present work is inspired by or built from (Section LABEL:section:methods).
The considerations of a designer developing an explainable DNN system (Section LABEL:section:designing).
Future directions in explainability research (Section LABEL:section:future).
The aims of the field are established by understanding the traits desired in explainable DNNs. Complementary DNN topics are reviewed and the relationships between explainable DNNs and other related research areas are developed. Our taxonomy of explainable DNN techniques clarifies the technical ideas underpinning most modern explainable deep learning techniques. The discussion of fundamental explainable deep learning methods, emblematic of each framework dimension, provides further context for the modern work that builds on or takes inspiration from them. The field guide then turns to essential considerations that need to be made when building an explainable DNN system in practice (which could include multiple forms of explanation to meet requirements for both users and DNN technicians). Finally, the overview of our current limitations and seldom-examined aspects of explainable deep learning suggests new research directions. This information captures what a newcomer needs to know to successfully navigate the current research literature on explainable deep learning and to identify new research problems to tackle.
There are a large number of existing reviews on the topic of model explainability. Most of them focus on explanations of general artificial intelligence methods arrieta2019explainable,carvalho2019machine,mueller2019explanation,tjoa2019survey,gilpin2018explaining,adadi2018peeking,miller2018explanation,guidotti2018survey,lipton2018mythos,liu2017towards,dovsilovic2018explainable,doshi2017towards, and some on deep learning ras2018explanation,montavon2018methods,zhang2018visual,samek2017explainable,erhan2010understanding. Given the existing reviews, the contributions of our article are as follows.
Our article specifically targets deep learning explanation, whereas existing reviews either focus on explanations of general artificial intelligence methods or are less detailed and comprehensive than ours.
We provide a detailed field guide for researchers uninitiated in explainable deep learning, aiming to lower, or even eliminate, the barrier for them to enter this field.
We propose a novel categorization scheme to systematically organize the numerous existing methods for explainable deep learning, depicting the field in a clear and straightforward way.
We elaborate a review of topics closely related to the realm of explainable deep learning. Such a review helps uninitiated researchers thoroughly understand the connections between these related fields and explainable deep learning, and how, jointly, those fields help shape the future of deep learning towards transparency, robustness, and reliability.
2 The Traits of an Explanation
A central tenet in explainable machine learning is that the algorithm must emit information allowing a user to relate characteristics of input features with its output. It is thus worth noting that DNNs are not inherently “explainable”. Information about input features captured in a DNN’s parameters becomes entangled and compressed into a single value via a non-linear transform of a weighted sum of feature values. This compression occurs multiple times with different weight vectors, once for each activation in the first hidden layer. Subsequent layers then output non-linear transforms of weighted sums of these compressions, and so on, until a decision is made based on the output of the DNN. Hence, it is exceedingly difficult to trace how particular stimulus properties drive this decision.
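To make this entanglement concrete, consider a toy forward pass; the layer sizes, random weights, and `forward` function below are illustrative assumptions, not drawn from any cited system. Each hidden activation compresses all input features into one value, and the output layer then mixes these compressed values again:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network: 4 input features -> 3 hidden units -> 1 output.
W1 = rng.normal(size=(3, 4))   # each row mixes *all* input features
b1 = rng.normal(size=3)
W2 = rng.normal(size=(1, 3))
b2 = rng.normal(size=1)

def forward(x):
    # Each hidden activation is a non-linear transform of a weighted sum:
    # the individual feature contributions are compressed into one value.
    h = np.tanh(W1 @ x + b1)
    # The output layer then mixes these compressed values again, so the
    # influence of any single input feature is hard to trace.
    return (W2 @ h + b2)[0]

x = np.array([0.5, -1.2, 0.3, 0.9])
y = forward(x)
print(y)
```

Even in this four-feature toy, recovering how much the second feature contributed to `y` requires unwinding two layers of mixing; a modern DNN repeats this entanglement across millions of parameters.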
The unexplainable nature of DNNs is a significant impediment to the widespread adoption of DNNs we are beginning to see in society. (There are common counter-arguments against calling this limitation “significant”. Some argue that many successful applications of DNNs do not require explanations debate2017, and in these instances, enforcing constraints that provide explainability may hamper performance. Others claim that, because DNNs are inherently not explainable, an ascribed explanation is at best a plausible story about how the network processes an input, one that cannot be proven rudin2019stop.) DNN-powered facial recognition systems, for example, are now associating people with locations and activities under widespread surveillance activities with opaque intent masi2018deep. People analytics and human resource platforms now tout the ability to predict employee performance and time to resignation, and to automatically scan the CVs of job applicants zhao2018employee,qin2018enhancing. These examples foretell a future where DNN technology will make countless recommendations and decisions that more directly, and perhaps more significantly, impact people and their well-being in society.
The present art develops ways to promote traits that are associated with explainability. A trait represents a property of a DNN necessary for a user to evaluate its output lipton2018mythos. Traits, therefore, represent a particular objective or an evaluation criterion for explainable deep learning systems. We can say that a DNN promotes explainability if the system exhibits any trait that is justifiably related to explainability. This exhibition may be self-evident (e.g., in an NLP task, visualizations highlighting keywords or phrases that suggest a reasonable classification of a sentence), measurable (based on a trait-specific “error” metric), or evaluated through system usability studies. We discuss four traits, shown in Figure 2, that are the theme of much of the explainable deep learning literature.
Confidence. Confidence grows when the “rationale” of a DNN’s decision is congruent with the thought process of a user. Of course, a DNN’s output is based on a deterministic computation, rather than a logical rationale. But by associating the internal actions of a DNN with features of its input or with the environment it is operating in, and by observing decisions that match what a rational human decision-maker would decide, a user can begin to align a DNN’s processing with her own thought process to engender confidence.
For example, saliency maps of attention mechanisms on image park2018multimodal,hudson2018compositional or text vaswani2017attention,luong2015effective,letarte2018importance,he2018effective inputs reassure a user that the same semantically meaningful parts of the input she would focus on to make a classification decision are being used. Observing how the actions of a trained agent in a physical environment mimic those a human would take gives some confidence that its action choices are aligned with those of a rational human. The saliency maps and the observation of the agents in these examples may constitute a suitable “form of explanation”.
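A minimal sketch of how one kind of saliency map can be computed: vanilla gradient saliency scores each input feature by the magnitude of the output's gradient with respect to it. The tiny tanh network here, along with all names, sizes, and weights, is an illustrative assumption, not a method from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny classifier: 4 "pixel" features -> 3 hidden -> 1 score.
W1 = rng.normal(size=(3, 4))
b1 = rng.normal(size=3)
w2 = rng.normal(size=3)

def score(x):
    return w2 @ np.tanh(W1 @ x + b1)

def saliency(x):
    # Vanilla gradient saliency: |d score / d x_i| for each input feature.
    h = np.tanh(W1 @ x + b1)
    grad = (w2 * (1.0 - h ** 2)) @ W1   # chain rule through tanh
    return np.abs(grad)

x = np.array([0.5, -1.2, 0.3, 0.9])
s = saliency(x)
print(s)   # larger values = features the score is most sensitive to

# Sanity check against a finite-difference approximation of the gradient.
eps = 1e-6
fd = np.array([(score(x + eps * np.eye(4)[i]) - score(x - eps * np.eye(4)[i]))
               / (2 * eps) for i in range(4)])
assert np.allclose(np.abs(fd), s, atol=1e-5)
```

For images, the same per-feature magnitudes are reshaped into the input's spatial layout and rendered as a heat map, which is what a user actually inspects.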
Confidence must be developed by observing a DNN when its decisions are both correct and incorrect. Eschewing observations of incorrect decisions means a user will never be able to identify when she should not be confident, and hence not rely on a DNN. Users must be able to use their confidence to measure the operational boundaries of a DNN to be able to intuitively answer the question: When does this DNN work or not work?
Trust. DNNs whose decision-making process need not be validated are trustworthy. Recent research jiang2018trust,baum2017,varshney2017safety,amodei2016concrete,pieters2011,lee2004trust explores the model trustworthiness problem, which studies whether or not a model prediction is safe to adopt. Note that a prediction with high probability does not guarantee its trustworthiness, as shown in recent adversarial studies goodfellow2014explaining,nguyen2015deep,moosavi2016deepfool,yuan2019adversarial. Trust in a DNN is best developed in two ways: (i) Satisfactory testing. Under ideal conditions, the network’s performance on test data should well approximate its performance in practice. The test accuracy of a model can thus be thought of as a direct measure of trust: a model with perfect performance during the testing phase may be fully trusted to make decisions; lower performance degrades trust proportionally. (ii) Experience. A user does not need to inspect or validate the actions of a DNN as long as the network’s input/output behavior matches expectations. For example, a DNN’s ability to predict handwritten digits from MNIST is beyond question lecun1995comparison,lecun1998gradient, and such a network may thus be regarded as a trustworthy system to sort postal mail by location code. A system that consistently performs poorly in practice may even be “un-trusted”, indicating that the DNN should not be used.
Trust is a difficult trait to evaluate. Most deep learning studies include some evaluation component over test data, but it is seldom the case that the evaluation is ideal. Without careful sampling procedures, test data can be biased towards a particular class or have feature distributions that do not match the general case tommasi2017deeper. It may also be the case that a model can perform poorly in practice over time as characteristics of the data evolve or drift. Therefore, the best way to evaluate trust is with system observations (spanning both output and internal processing) over time. Explainability research allowing users to evaluate these observations (through an interpretation of activations during a DNN’s forward pass, for example) is one avenue for enhancing trust.
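As one illustration of evaluating trust through observations over time, the sketch below tracks a deployed model's rolling accuracy so that data drift visibly erodes "trust". The `TrustMonitor` class, the window size, and the threshold are hypothetical choices for illustration, not a standard tool:

```python
from collections import deque

class TrustMonitor:
    """Track rolling accuracy of a deployed model as a crude trust signal."""

    def __init__(self, window=100, threshold=0.9):
        # Only the most recent `window` outcomes count, so old successes
        # cannot mask recent degradation caused by drifting data.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, prediction, ground_truth):
        self.outcomes.append(prediction == ground_truth)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def trusted(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc >= self.threshold

monitor = TrustMonitor(window=5, threshold=0.8)
for pred, truth in [(1, 1), (0, 0), (1, 1), (1, 0), (1, 1)]:
    monitor.record(pred, truth)
print(monitor.rolling_accuracy(), monitor.trusted())  # 0.8 True
```

In practice such a monitor would also need periodic access to ground truth (often delayed in deployment), and richer observations, e.g., of internal activations, as the surrounding text notes.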
Safety. DNNs whose decisions (in)directly lead to an event impacting human life, wealth, or societal policy should be safe. The definition of safety is multi-faceted. A safe DNN should: (i) consistently operate as expected; (ii) given cues from its input, guard against choices that can negatively impact the user or society; (iii) exhibit high reliability under both standard and exceptional operating conditions; (iv) provide feedback to a user about how operating conditions influence its decisions. The first aspect of safety aligns this trait with trust, since trust in a system is a prerequisite to consider it safe to use. The second and third aspects imply that safe systems possess mechanisms that augment their decision-making process to steer away from decisions with negative impact, or that consider the operating environment as part of the decision-making process. The fourth aspect gives necessary feedback to the user to assess safety. The feedback may include an evaluation of the system’s environment, the decision reached, and how the environment and the input data influence the decision made. This allows the user to verify the rationality of the decision-making process with respect to the environment that the system is operating in.
Ethics. A DNN behaves ethically if its decisions and decision-making process do not violate a code of moral principles defined by the user. How best to evaluate whether a DNN is acting ethically is a topic of debate. For example, different users assume their own unique code of ethics, ethical decisions in one culture may be unethical in another, and there may be instances where no possible decision is consistent with a set of moral principles. Thus, rather than making DNNs inherently ethical, this trait can be expressed by some notion of an “ethics code” that the system’s decisions are formed under. This allows users to individually assess whether the reasoning of a DNN is compatible with the moral principles it should operate under. The field of ethical decision making in AI is growing as a field in its own right (see Section LABEL:subsection:fairness_and_bias).