Who's Thinking? A Push for Human-Centered Evaluation of LLMs using the XAI Playbook

03/10/2023
by Teresa Datta, et al.

Deployed artificial intelligence (AI) often impacts humans, and there is no one-size-fits-all metric for evaluating these tools. Human-centered evaluation of AI-based systems combines quantitative and qualitative analysis with human input. It has been explored to some depth in the explainable AI (XAI) and human-computer interaction (HCI) communities. Gaps remain, but the basic understanding that humans interact with AI and its accompanying explanations, and that humans' needs (complete with their cognitive biases and quirks) should be held front and center, is accepted by the community. In this paper, we draw parallels between the relatively mature field of XAI and the rapidly evolving research boom around large language models (LLMs). Accepted evaluative metrics for LLMs are not human-centered. We argue that many of the same paths trodden by the XAI community over the past decade will be retrodden when discussing LLMs. Specifically, we argue that humans' tendencies (again, complete with their cognitive biases and quirks) should rest front and center when evaluating deployed LLMs. We outline three developed focus areas of human-centered evaluation of XAI: mental models, use case utility, and cognitive engagement, and we highlight the importance of exploring each of these concepts for LLMs. Our goal is to jumpstart human-centered LLM evaluation.

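The abstract names three focus areas borrowed from XAI (mental models, use case utility, and cognitive engagement) as complements to automatic scores. As a minimal sketch, assuming a small user study in which participants rate model outputs on a 1-5 Likert scale per focus area, the hypothetical Python harness below shows how per-area human ratings could be reported alongside an automatic benchmark score. All names here (HumanJudgment, EvalRecord, summarize) are illustrative assumptions, not from the paper.

```python
"""Hypothetical sketch of a human-centered LLM evaluation harness.

Illustrates how the three focus areas the paper borrows from XAI
(mental models, use case utility, cognitive engagement) could be
reported next to an automatic metric. Names are assumptions.
"""

from dataclasses import dataclass, field
from statistics import mean

# The three human-centered focus areas named in the abstract.
FOCUS_AREAS = ("mental_model", "use_case_utility", "cognitive_engagement")


@dataclass
class HumanJudgment:
    """One participant's 1-5 Likert ratings for a single LLM output."""
    participant_id: str
    scores: dict  # focus area -> rating


@dataclass
class EvalRecord:
    """Pairs an automatic score with human judgments for one prompt/output."""
    prompt: str
    output: str
    automatic_score: float  # e.g., a benchmark accuracy or similarity score
    judgments: list = field(default_factory=list)

    def human_means(self):
        """Average each focus-area rating across participants."""
        return {
            area: mean(j.scores[area] for j in self.judgments)
            for area in FOCUS_AREAS
        }


def summarize(records):
    """Print automatic and human-centered scores side by side."""
    for r in records:
        humans = r.human_means()
        print(f"prompt: {r.prompt!r}")
        print(f"  automatic: {r.automatic_score:.2f}")
        for area in FOCUS_AREAS:
            print(f"  {area}: {humans[area]:.2f}")


if __name__ == "__main__":
    record = EvalRecord(
        prompt="Explain why the sky is blue.",
        output="Rayleigh scattering ...",
        automatic_score=0.91,
    )
    # Two hypothetical study participants rate the same output.
    record.judgments.append(HumanJudgment("p1", {
        "mental_model": 4, "use_case_utility": 3, "cognitive_engagement": 2}))
    record.judgments.append(HumanJudgment("p2", {
        "mental_model": 5, "use_case_utility": 2, "cognitive_engagement": 3}))
    summarize([record])
```

Reporting the automatic score and the per-area human means side by side, rather than collapsing everything into a single number, keeps disagreements between benchmark performance and human experience visible, which is the point the paper argues evaluation practice currently misses.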