Human Evaluation of Spoken vs. Visual Explanations for Open-Domain QA

12/30/2020
by Ana Valeria Gonzalez, et al.

While research on explaining the predictions of open-domain QA (ODQA) systems to users is gaining momentum, most work has not evaluated the extent to which explanations actually improve user trust. The few studies that do evaluate explanations through user studies employ settings that may deviate from end-users' usage in the wild: ODQA is most ubiquitous in voice assistants, yet current research evaluates explanations only on visual displays, and may therefore erroneously extrapolate conclusions about the most effective explanations to other modalities. To address these issues, we conduct user studies that measure whether explanations help users correctly decide when to accept or reject an ODQA system's answer. Unlike prior work, we control for explanation modality, i.e., whether the explanation is communicated through a spoken or a visual interface, and contrast effectiveness across modalities. Our results show that explanations derived from retrieved evidence passages can outperform strong baselines (calibrated confidence) across modalities, but that the best explanation strategy in fact changes with the modality. We highlight common failure cases of current explanations, emphasize end-to-end evaluation of explanations, and caution against evaluating them in proxy modalities that differ from deployment.


