Manipulating and Measuring Model Interpretability

Despite a growing body of research focused on creating interpretable machine learning methods, there have been few empirical studies verifying whether interpretable methods achieve their intended effects on end users. We present a framework for assessing the effects of model interpretability on users via pre-registered experiments in which participants are shown functionally identical models that vary in factors thought to influence interpretability. Using this framework, we ran a sequence of large-scale randomized experiments, varying two putative drivers of interpretability: the number of features and the model transparency (clear or black-box). We measured how these factors impact trust in model predictions, the ability to simulate a model, and the ability to detect a model's mistakes. We found that participants who were shown a clear model with a small number of features were better able to simulate the model's predictions. However, we found no difference in multiple measures of trust and found that clear models did not improve the ability to correct mistakes. These findings suggest that interpretability research could benefit from more emphasis on empirically verifying that interpretable models achieve all their intended effects.


