Proxy Tasks and Subjective Measures Can Be Misleading in Evaluating Explainable AI Systems

01/22/2020
by Zana Buçinca, et al.

Explainable artificially intelligent (XAI) systems form part of sociotechnical systems, e.g., human+AI teams tasked with making decisions. Yet current XAI systems are rarely evaluated by measuring the performance of human+AI teams on actual decision-making tasks. We conducted two online experiments and one in-person think-aloud study to evaluate two common techniques for evaluating XAI systems: (1) using proxy, artificial tasks, such as how well humans predict the AI's decision from the given explanations, and (2) using subjective measures of trust and preference as predictors of actual performance. The results of our experiments demonstrate that evaluations with proxy tasks did not predict the results of evaluations with the actual decision-making tasks. Further, in the evaluations with actual decision-making tasks, the subjective measures did not predict objective performance on those same tasks. Our results suggest that by employing misleading evaluation methods, our field may be inadvertently slowing its progress toward developing human+AI teams that can reliably perform better than humans or AIs alone.
