Trust in AI: Interpretability is not necessary or sufficient, while black-box interaction is necessary and sufficient

02/10/2022
by Max W. Shen, et al.

The problem of human trust in artificial intelligence is one of the most fundamental problems in applied machine learning. Our processes for evaluating AI trustworthiness have substantial ramifications for ML's impact on science, health, and humanity, yet confusion surrounds foundational concepts. What does it mean to trust an AI, and how do humans assess AI trustworthiness? What are the mechanisms for building trustworthy AI? And what is the role of interpretable ML in trust? Here, we draw from statistical learning theory and sociological lenses on human-automation trust to motivate an AI-as-tool framework, which distinguishes human-AI trust from human-AI-human trust. Evaluating an AI's contractual trustworthiness involves predicting future model behavior using behavior certificates (BCs) that aggregate behavioral evidence from diverse sources including empirical out-of-distribution and out-of-task evaluation and theoretical proofs linking model architecture to behavior. We clarify the role of interpretability in trust with a ladder of model access. Interpretability (level 3) is not necessary or even sufficient for trust, while the ability to run a black-box model at-will (level 2) is necessary and sufficient. While interpretability can offer benefits for trust, it can also incur costs. We clarify ways interpretability can contribute to trust, while questioning the perceived centrality of interpretability to trust in popular discourse. How can we empower people with tools to evaluate trust? Instead of trying to understand how a model works, we argue for understanding how a model behaves. Instead of opening up black boxes, we should create more behavior certificates that are more correct, relevant, and understandable. We discuss how to build trusted and trustworthy AI responsibly.
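To make the abstract's framing of behavior certificates and the ladder of model access concrete, here is a hypothetical sketch, not code from the paper: it assumes toy `BehaviorCertificate` and `Evidence` classes and an `evaluate_black_box` helper (all names are illustrative), and shows that gathering behavioral evidence requires only the ability to run the model at will (level 2), not access to its internals.

```python
# Hypothetical sketch (not from the paper): one way to represent a behavior
# certificate (BC) as a falsifiable claim about model behavior, backed by
# empirical evidence from evaluations within the certificate's stated scope.
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class Evidence:
    """One piece of behavioral evidence, e.g. an out-of-distribution eval."""
    source: str        # e.g. "OOD: corrupted test set, severity 3"
    metric: str        # e.g. "accuracy"
    value: float       # observed metric value
    n_samples: int     # how much data backs this observation


@dataclass
class BehaviorCertificate:
    """A claim about future model behavior plus the evidence supporting it."""
    claim: str                               # human-readable behavioral claim
    scope: str                               # inputs/tasks the claim covers
    threshold: float                         # claimed lower bound on the metric
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        """Crude check: every observed evaluation meets the claimed bound."""
        return bool(self.evidence) and all(
            e.value >= self.threshold for e in self.evidence
        )


def evaluate_black_box(predict: Callable[[object], int],
                       dataset: Sequence[Tuple[object, int]],
                       source: str) -> Evidence:
    """Level-2 access (running the model at will) suffices to gather evidence:
    only predictions are needed, not weights or explanations."""
    correct = sum(1 for x, y in dataset if predict(x) == y)
    return Evidence(source=source, metric="accuracy",
                    value=correct / len(dataset), n_samples=len(dataset))


# Toy usage: a trivial "model" and a tiny labeled set stand in for real ones.
model = lambda x: int(x > 0)
ood_set = [(-2, 0), (-1, 0), (1, 1), (3, 1)]
bc = BehaviorCertificate(
    claim="accuracy >= 0.75 on sign classification",
    scope="integers outside the training range",
    threshold=0.75,
    evidence=[evaluate_black_box(model, ood_set, source="toy OOD set")],
)
print(bc.is_supported())  # True for this toy example
```

The point of the sketch is that the evidence-gathering step consumes only model outputs; a fuller certificate, as the abstract describes, would also aggregate out-of-task evaluations and theoretical guarantees linking architecture to behavior.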


