Who is this Explanation for? Human Intelligence and Knowledge Graphs for eXplainable AI

05/27/2020, by Irene Celino, et al.

eXplainable AI focuses on generating explanations for the output of an AI algorithm to a user, usually a decision-maker. Such a user needs to interpret the AI system in order to decide whether to trust the machine outcome. When addressing this challenge, therefore, proper attention should be given to producing explanations that are interpretable by the target community of users. In this chapter, we argue for the need to better investigate what constitutes a human explanation, i.e. a justification of the machine behaviour that is interpretable and actionable by the human decision makers. In particular, we focus on the contributions that Human Intelligence can bring to eXplainable AI, especially in conjunction with the exploitation of Knowledge Graphs. Indeed, we call for a better interplay between Knowledge Representation and Reasoning, Social Sciences, Human Computation and Human-Machine Cooperation research – as already explored in other AI branches – in order to support the goal of eXplainable AI with the adoption of a Human-in-the-Loop approach.


1 Introduction

The recent renaissance of Machine Learning and Artificial Intelligence has brought a new wave of interest in these methods and technologies. Autonomous agents and automatic systems are now more available and affordable than ever but, judging only from popular news coverage, one would think that they have completely eliminated the need for human intervention, both in their setup and in their operation. Any practitioner, however, knows very well that human contributions are indispensable to set up, train, optimise and operate such systems.

With reference to the AI systems that rely most heavily on data, and to predictive Machine Learning in particular, human knowledge is still required in all phases, to answer relevant questions that are not necessarily addressed to AI experts:

  • before creating a model: during training set creation (“what data can I use to build a model?”)

  • at model building time: during model validation (“is my model correct?”, “is my model good enough?”) and during model refinement (“what additional training data/features would improve my model performance?”)

  • when using the model in production: to ensure algorithmic transparency (“should I trust the way my model gave such a prediction?”) and to provide explainability (“why did my model give such an outcome/prediction?”)

In this chapter, we focus on the role that Human Intelligence and (human-generated) Knowledge Graphs play in answering the above questions. We also claim that, with special reference to explainability, humans are only partially considered in eXplainable AI research, even though the required explanations should ultimately serve human comprehension.

The remainder of the chapter is structured as follows: related work is illustrated in Section 2, and Section 3 clarifies what we mean by explanation and why humans are needed in its generation; opportunities for (human) eXplainable AI coming from the employment of Human Intelligence and Knowledge Graphs are outlined in Section 4, and Section 5 presents some conclusions and outlines possible future work.

2 Related Work

In the context of Artificial Intelligence and Machine Learning, several research trends investigate the role and interplay between humans and machines.

A new emerging process of scientific inquiry is described in [shih2018beyond]: people beyond scientists are now involved in this process, because laymen participate both in the creation/collection of information (via user-generated content) and in the coding/labelling/validation phases (e.g. through Crowdsourcing or Citizen Science); the authors call for a new data analytics paradigm with user involvement, and present experimental results showing the effect of interface design on how users transform information.

Indeed, the power of the “crowd” is often leveraged to create large-scale training sets for Machine Learning, by adopting Crowdsourcing [howe2008crowdsourcing], Human Computation [law2011human] and Citizen Science [irwin2002citizen] approaches. Moreover, knowledge of human cognitive processes may assist the design and implementation of Machine Learning, as claimed in [zheng2018effective]; however, the current popularity of black-box models hinders effective human intervention, because those approaches negatively impact trustworthiness, interpretability and the discovery of hidden rules.

User trust is indeed an important indicator because it correlates with system accuracy: humans dynamically adjust their reliance based on a system's perceived accuracy, and they even show acceptance thresholds [yu2018trust]; this implies the need to design an AI system so that it sustains the desired level of user trust. The validation phase of Machine Learning algorithms also benefits from the integration of user-centred evaluation: the authors of [cambo2018user] advocate adopting iterative, user-centred design approaches for Machine Learning, in model optimisation, selection and validation.

Different families of methods explicitly aim to improve learned models based on human knowledge. Active Learning [settles2009active] is based on the idea that a Machine Learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to “choose” the data from which it learns, by asking queries to an “oracle”, which is usually a human annotator. Transfer Learning [pan2009transfer] emerged to fulfill the need to build real-world applications in which it is expensive or impossible to re-collect the required training data and rebuild the models; in such cases, knowledge transfer is attempted by adapting a model already trained on some domain (with the help of human annotators) to a different domain.

Human-Machine Cooperation is at the heart of Interactive Machine Learning [amershi2012interactive], in which a human operator and a machine collaborate to achieve a task. While coupling algorithm-centred analysis with human-centred evaluation seems to yield better results than a fully automated or fully manual approach, research is still needed to explore to what extent this mix provides benefits [boukhelifa2018evaluation]: participatory design with end-users could help incorporate human expertise in algorithms and models; visualisation techniques could facilitate user feedback; and creativity, lateral thinking and exploration can also contribute, provided that suitable tools and both objective and subjective metrics are developed.

In general, in order to improve and optimise the interaction between humans and machines, a shift of perspective should be adopted, for example by walking away from purely technical optimisation and embracing a designer mindset, like the one proposed in [noessel2017designing]: the author invites readers to stop seeing technologies as a collection of tools and gadgets and instead see them as an evolutionary flow around human problems, whose parts ultimately integrate to create a new category of things named agentive technologies, or “AI that works for people”.

3 What is an explanation for humans

The rationale behind eXplainable AI research is that Artificial Intelligence systems should not only display an intelligent behaviour, but should also be able to explain that behaviour. The question that naturally arises is what an explanation is and how to generate it. In this section, we attempt to illustrate the characteristics of explanations, and we justify the need for “Human Intelligence” and “Human-in-the-Loop” approaches also in relation to eXplainable AI.

3.1 A working definition of explanation

Let us consider the simple example of email categorisation between spam and non-spam. Here the task is binary classification (i.e., the output of a Machine Learning classifier is the labeling of each email as spam or non-spam).

An explanation consists of a set of hints to understand the relationship between the characteristics of an individual (e.g. an email) and the model prediction on that individual (e.g. this email is spam). The explanation is used by a human decision-maker, who should decide whether to trust the system (e.g. accept or reject the prediction of the spam classifier) [ribeiro2016whytrust].

User trust can happen at different levels: on the individual prediction, when the user requires an explanation about a specific instance (e.g. why this mail is spam) or on an entire model, when the user needs to decide whether to trust the system altogether. In the latter case, an explanation could require selecting a representative sample of individuals (e.g. a set of spam/non-spam emails) and explaining each individual in the sample.

The main characteristics that an explanation should display (again according to [ribeiro2016whytrust]) are fidelity, model-independence and interpretability. Local fidelity or local faithfulness means that the explanation should reflect the model's behaviour in the vicinity of the individual being explained; global fidelity would of course be desirable, but it can be challenging for complex models. The explanation should also be model-agnostic, in that it should be independent of the specific type of AI model. Finally, interpretability is the qualitative understanding of the relationship between the input variables and the response (e.g. the relation between the words contained in an email and the email's categorisation as spam/non-spam).

Interpretability is the key aspect for an explanation to be accepted by a user. In our example of an email classifier, an interpretable explanation could rely on a list of words (e.g. the system thinks this email is spam because it contains the following words) rather than being based on opaque clues (e.g. word embeddings) which are not easily understandable by a human. The level of interpretability of an explanation of course depends on the audience, because humans use their previous knowledge about the application domain to interpret an explanation and accept/reject a prediction based on their understanding.
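As a minimal illustrative sketch (not an implementation from the literature cited above), a word-based explanation for a linear bag-of-words spam classifier could simply surface the words of the email with the largest learned weights; all weights below are invented for illustration:

```python
# Hypothetical word weights of a linear bag-of-words spam classifier
# (positive values push towards "spam", negative towards "non-spam").
WEIGHTS = {"free": 2.1, "winner": 1.8, "meeting": -1.5, "agenda": -1.2}

def explain(email_words, weights, top_k=3):
    """Interpretable, instance-level explanation: the words in this email
    that contribute most (in absolute value) to the classification."""
    contributions = [(w, weights[w]) for w in set(email_words) if w in weights]
    contributions.sort(key=lambda wc: abs(wc[1]), reverse=True)
    return contributions[:top_k]

print(explain(["free", "winner", "meeting", "hello"], WEIGHTS))
# → [('free', 2.1), ('winner', 1.8), ('meeting', -1.5)]
```

Each returned pair is directly readable by a non-expert ("this email looks like spam mainly because it contains 'free' and 'winner'"), in contrast to an embedding-based clue.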

3.2 Explanation from a human point of view

The last point on interpretability clarifies that proper attention should be given to the different kinds of explanations that can be generated, in particular by distinguishing “machine” explanations from “human” explanations.

Indeed, most XAI research has been focusing on generating machine explanations, i.e. justifications of what/how the machine “thinks”. In other words, machine explanations try to explain the scientific theory behind a model, to allow for phenomena comprehension. In the case of interpretable models (like linear regression or decision trees), the machine explanation consists in making explicit the mathematical/logical relation between inputs and outputs (e.g. the tree model of decisions). In the case of black-box models, especially for deep learning and other complex approaches, the machine explanation may be based on “compressed” models or other approximation techniques that still use an explicit representation of the relation between inputs and outputs.

Human explanations, instead, focus on what a human user wants to know in order to interpret a model and make subsequent decisions. The user may be uninterested in the internal functioning of an algorithm, and may even be unable to understand a potentially complex mathematical formulation of the function that transforms the input parameters into the output prediction. On the contrary, the user is interested in getting useful clues on why a specific output is given, in order to evaluate whether such output is “reliable” from a human understanding point of view.

As a consequence, in order to be useful, a human explanation needs to display some specific characteristics [mittelstadt2019explaining]. An explanation should be selective: it should not provide all possible reasons, but convey only the “relevant” causes; indeed, people usually do not expect an explanation to consist of the complete cause of an event, which also keeps the explanation itself to a cognitively manageable size; moreover, an explanation should not contain useless information, like presuppositions or beliefs that the user already holds. Humans psychologically prefer contrastive explanations, because they are used to reasoning according to counter-factual causality (i.e. people do not ask why an event A happened, but rather why an event A happened instead of some other event B), especially in case of an anomaly or an abnormal event. Another characteristic of human explanations is that they are usually social, involving interaction between (multiple) explainers and explainees; also with respect to eXplainable AI, explanations should therefore be seen as an interactive process, including interaction and dialogue among a mix of human and machine participants.
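The contrastive preference can be sketched in code (a hypothetical example of ours, not from the cited works): rather than listing every contribution for the predicted class, we report only the features with the largest contribution gap between the predicted class (the “fact”) and the alternative the user has in mind (the “foil”):

```python
# Hypothetical per-class feature contributions for one email.
CONTRIB = {
    "spam":     {"free": 2.1, "winner": 1.8, "meeting": -1.5},
    "non-spam": {"free": -2.1, "winner": -1.8, "meeting": 1.5},
}

def contrastive_explanation(fact, foil, contrib, top_k=2):
    """Answer "why `fact` rather than `foil`?": return the features with
    the largest contribution gap between the two classes."""
    gaps = {f: contrib[fact][f] - contrib[foil][f] for f in contrib[fact]}
    return sorted(gaps, key=gaps.get, reverse=True)[:top_k]

print(contrastive_explanation("spam", "non-spam", CONTRIB))
# → ['free', 'winner']
```

The output is both selective (only `top_k` causes) and contrastive (framed against the foil), matching two of the characteristics listed above.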

From all the above considerations, it is apparent that eXplainable AI research should go well beyond automatic methods to generate explanations; it is of utmost importance to keep the Human-in-the-Loop. There are at least two main reasons to advocate for the active involvement of people in eXplainable AI [miller2017beware]: on the one hand, if explanation formulation is delegated to “computer scientists”, the risk is that such explanations are too close to the model and too far from human understanding, especially that of domain/business users who need to interpret such information; on the other hand, there is a large body of knowledge about explanations from the social sciences (philosophy, psychology, cognitive science), which could bring tangible benefits to eXplainable AI research in terms of getting to a “good” explanation from a human point of view [miller2019socialsciences].

4 Human Intelligence and Knowledge Graphs to support eXplainable AI

The Semantic Web has always relied on humans, since most of its tasks are knowledge-intensive and context-specific and, as such, they require user engagement for their solution (e.g., conceptual modelling, multi-language resource labelling, content annotation with ontologies, concept/entity similarity recognition). With the rise of Knowledge Graphs and their popularity, new opportunities have emerged to exploit them for AI in general and specifically for eXplainable AI [lecue2019kg4xai].

Without claiming to be exhaustive, in the following we illustrate a set of approaches that can bring Human Intelligence and Knowledge Graphs to the benefit of eXplainable AI, with specific reference to Machine Learning. We distinguish between two main types of opportunities: those related to the exploitation of (human-generated) Knowledge Graphs and those that capitalise on the direct involvement of people. We depict them in Figure 1 along two axes, representing whether Human Intelligence is employed in data/knowledge representation or for explanations.

Figure 1: Graphical representation of Human Intelligence approaches

4.1 (Human) Knowledge Graphs for XAI

The first set of approaches exploits Knowledge Graphs to support explanation generation. We specifically focus on the role of human-generated information to directly and indirectly support XAI.

4.1.1 Dataset metadata

Structured data represents an invaluable input for any Machine Learning approach. Consequently, Linked Data and Knowledge Graphs constitute a rich and valuable contribution as such. An important role can be played even by simple metadata: descriptive metadata about datasets, in the form of DCAT [dcat-v2] and related vocabularies, can be exploited to improve data sourcing. The information about where some data comes from can also be re-used for explanations: users can better judge the reliability or the meaningfulness of a machine output if they are also given details about the original sources.

For example, the opportunities to facilitate dataset reuse in the development of chatbots are illustrated by the BotDCAT-AP vocabulary [cappello2017botdcat], an extension of the Data Catalogue (DCAT) Application Profile. BotDCAT-AP enables the description of intents (i.e., the actions users want to accomplish by interacting with a chatbot) and entities (i.e., individual information units associated with an intent) supported by a dataset, together with the method to access it; as such, it enables and fosters reusability of datasets (including Knowledge Graphs) across chatbot systems. It could also be exploited further to support the generation of explanations for the chatbot “replies” in terms of recognised intents/entities and used datasets.

4.1.2 People-specific semantics

Different users may have different interests or skills and, as a consequence, they may need different explanations. User-generated data often implicitly contains hints on what people care about; this can be an opportunity to exploit when providing explanations on systems trained on such data.

For example, spatial data analytics of OpenStreetMap manual tagging proved beneficial for geo-ontology engineering, by surfacing latent semantic differences in concepts across different communities [recalegari2016supporting]: the same “concept” of spatial object (e.g., a pub) may have slightly diverging meanings in different places (e.g., a place to dine in the UK, a bar to have a drink in Italy). This implicit semantics, when extracted and made explicit, can also be exploited for explanation generation, because it can help convey the right “semantics” to the right community.

4.1.3 Provenance of user-generated data

Providing better data for training in turn leads to better models, as well as to more interpretable explanations. When data is user-generated, quality assurance is an important step, for example to aggregate inputs from multiple contributors (cf. “truth inference” in Crowdsourcing [zheng2017truth]). Provenance metadata about human contributions often contain important clues that can be exploited both for quality improvement and for generating explanations.

For example, in the case of the Human Computation-powered volunteered geographic information (VGI) illustrated in [celino2013human], the involvement of a crowd of volunteers, potentially untrained or non-expert, implies that VGI can be of varying quality. Tracing VGI provenance enables the recording of the collection activity: the information about who gathered what, where and when is then employed to compute and judge the VGI quality. The same provenance information can be offered to users of systems trained on such user-generated data, to explain where a prediction comes from.
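A minimal sketch of provenance-aware aggregation in this spirit (contributors, reliability scores and labels are all invented, and this is a simplified stand-in for full truth-inference methods): each candidate label is scored by the summed reliability of the volunteers who reported it, and the resulting score table can itself be surfaced as explanation material:

```python
# Hypothetical provenance records for one map feature: (contributor, reported label).
observations = [("alice", "pub"), ("bob", "pub"), ("carol", "restaurant")]
# Per-contributor reliability, e.g. estimated from past accuracy (invented values).
reliability = {"alice": 0.75, "bob": 0.5, "carol": 0.25}

def aggregate(obs, rel, default=0.5):
    """Reliability-weighted vote; the score table doubles as a
    provenance-based explanation of the chosen label."""
    scores = {}
    for who, label in obs:
        scores[label] = scores.get(label, 0.0) + rel.get(who, default)
    return max(scores, key=scores.get), scores

label, scores = aggregate(observations, reliability)
print(label, scores)
# → pub {'pub': 1.25, 'restaurant': 0.25}
```

A user questioning the "pub" label can then be shown who contributed it and with what estimated reliability, rather than a bare prediction.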

4.1.4 Knowledge graphs as explanation content

Structured knowledge and Knowledge Graphs can be used as basis for explanations, because they may already contain the rationale behind the relationship between inputs and outputs of a system. Whenever a predictive system is based on a knowledge base, the relevant part of it that motivates a system output can be directly used as explanation.

For example, graph traversal information is used to explain the suggestions of a knowledge-based recommender system in [dellaglio2010anatomy]: the logical path connecting a user (e.g., John loves hard rock music) and a recommended item (e.g., X is a Web-radio broadcasting rock music) provides a digestible account of the reasons behind the recommendation (e.g., John is recommended to listen to X, because John loves hard rock music, hard rock is a kind of rock music, X broadcasts rock music). The chain of relevant connected resources/properties (i.e., the set of triples composing a path between the user and the recommended item) already constitutes a human explanation for the recommendation.
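As a sketch of this idea (with a toy graph and labels of our own invention, not the actual system of [dellaglio2010anatomy]), such an explanation can be produced by searching for a chain of triples connecting the user to the recommended item:

```python
from collections import deque

# Toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("John", "loves", "hard rock"),
    ("hard rock", "subgenre of", "rock"),
    ("X", "broadcasts", "rock"),
]

def explanation_path(triples, start, goal):
    """Breadth-first search over an undirected view of the triple graph;
    returns the chain of triples linking `start` to `goal`."""
    edges = {}
    for s, p, o in triples:
        edges.setdefault(s, []).append((o, (s, p, o)))
        edges.setdefault(o, []).append((s, (s, p, o)))  # traverse both ways
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, triple in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [triple]))
    return None

for s, p, o in explanation_path(TRIPLES, "John", "X"):
    print(s, p, o)
```

Each triple along the path is a human-readable statement, so the path itself reads as the recommendation's justification ("John loves hard rock, hard rock is a subgenre of rock, X broadcasts rock").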

4.2 Human Intelligence for XAI

The second set of approaches directly focuses on the active involvement of people to the benefit of XAI.

4.2.1 User engagement for data quality

As claimed in Section 3, in order to provide human explanations, we should turn to the social sciences, which may help in the involvement and engagement of people also during the phase of explanation generation for AI systems. Engaging humans is a challenge in itself; therefore, eXplainable AI could reuse research results on designing for and exploiting behaviours, personal motivations and incentive mechanisms.

For example, the evaluation and improvement of data quality can be achieved through an analysis of contributions: user behaviour influences data quality and should be taken into account to evaluate the reliability of user-generated information, to better design data collection systems and to generate explanations. As demonstrated in [recalegari2018interplay], the presence of tangible rewards, leveraging extrinsic motivation, affects both the quantity and quality of collected data; moreover, an analysis of contributors' accuracy and participation highlights different engagement profiles, which should be taken into account when aggregating user-generated data and capitalised on for explainability.

4.2.2 User interaction and user experience

Lessons learned and best practices from user experience design can also inform human-powered explanation generation, because they can help in designing suitable tools and data value chains that involve and engage people to bring benefit to AI in general and eXplainable AI specifically.

Indeed, carefully designed user interaction with digital tools proves to be key in raising attention and improving data quality, as shown in [celino2020submitting] with respect to survey data collection: an improvement in user experience, making questionnaire completion more enjoyable, also leads to higher-quality information, because it reduces the satisficing effect and increases response quality. Therefore, involving users in the generation or validation of explanations, for example by adopting a social and interactive pattern guided by a design thinking approach, can maximise user attention and ease user experience, thus helping to ensure that the result is a good explanation from a human point of view.

4.2.3 Human and machine confidence

Predictive Machine Learning modelling aims at building a trustworthy system able to provide predictions on unseen cases; to evaluate model confidence, different metrics are usually employed to give quantitative estimates of a prediction's reliability. Reporting confidence metrics to support prediction explanation is a means to increase user trust, but again those quantitative hints should be interpretable from a human point of view.

Human intervention can also be employed to support model evaluation and, consequently, model explanation through confidence metrics. Indeed, it can happen that what is “difficult” to predict for an algorithm (i.e. predictions with low confidence metrics) is also difficult for humans to judge; the case of questionable image classification is illustrated in [recalegari2018human], where a correspondence is shown between low-confidence machine classifications and user disagreement. The correlation between human and machine predictions and their respective confidence/reliability can be exploited to understand the reasons behind a model, and can therefore improve both the modelling phase (by incorporating additional human knowledge in training) and the generation of explanations (which can be closer to human understanding).

4.2.4 Explanation as a means to improve user involvement

The most challenging aspect of Human-Machine Cooperation is the effective involvement of people in the various phases of modelling. While users are already employed in data collection and model validation, further opportunities lie in a more interwoven interaction between human steps and automatic steps. Therefore, explanations are not only an objective as such, but they can be an instrument to further involve and motivate human participants in the AI system life-cycle.

For example, in order to identify and reduce bias in knowledge representation and modelling, the involved users should not only be exposed to potentially biased information, but should also be given an explanation for such an identified bias, to understand the reasons behind a questionable piece of information or prediction. A Human-in-the-Loop approach to identify and resolve implicit bias in Knowledge Graphs is illustrated in [summerschool]: users are involved not only to accept/reject an identified bias, but are also engaged as decision-makers to evaluate whether further actions should be taken to resolve such bias.

5 Conclusions

eXplainable AI aims at generating explanations to justify the output of an algorithm to a user, usually a decision-maker. Those explanations need to be interpretable by the intended target users and, therefore, cannot be restricted to “scientific modelling” (i.e., the explanation of the scientific/mathematical law or theory behind an artificial model), but should focus on addressing the needs of the decision makers, who exploit such explanations and decide whether to trust an AI system.

Therefore, a better understanding of Human Intelligence is needed to make sure that the generated explanations are “good enough” to be used in practice: some help can come from the social sciences, but even within the ICT community we identify several opportunities. On the one hand, Knowledge Representation and Reasoning (KRR) research has long addressed the open issue of human knowledge formalisation; in this context, therefore, eXplainable AI can leverage all the experience related to the involvement of human annotators and crowdsourced knowledge bases and Knowledge Graphs (e.g. DBpedia [dbpedia] and Wikidata [wikidata]): indeed, the same Human Intelligence that supports KRR tasks can be similarly exploited for eXplainable AI.

On the other hand, Human-Computer Interaction (HCI) research has been focusing on improving and optimising user experience with digital tools; in this context, eXplainable AI can leverage its approaches and methods to support the “interaction” between a human user and a digital explanation, improving interpretability and promoting trust. AI systems should be designed to allow and facilitate the exchange with the relevant user communities: while people are already heavily involved in data collection, their engagement in other steps of the AI life-cycle is still to be fully explored, especially with respect to explainability.

The big challenge is to define flexible and complex human-computer cooperative systems, able to guide the preparation, building and operation of data processing pipelines involving Artificial Intelligence technologies. Human Intelligence and Knowledge Graphs should become first-order citizens of such data value chains, not only to improve the performance of such artificial systems, but also – and foremost – to ensure that AI outcomes are relevant and usable by human decision makers.

Acknowledgments

The presented research was partially supported by the ACTION project (grant agreement number 824603), co-funded by the European Commission under the Horizon 2020 Framework Programme. We would like to thank Gloria Re Calegari and Ilaria Tiddi for their feedback and revision on this chapter.

References