Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations

12/17/2021
by Siddhant Arora, et al.

In attempts to "explain" predictions of machine learning models, researchers have proposed hundreds of techniques for attributing predictions to features that are deemed important. While these attributions are often claimed to hold the potential to improve human "understanding" of the models, surprisingly little work explicitly evaluates progress towards this aspiration. In this paper, we conduct a crowdsourcing study in which participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews. They are challenged both to simulate the model on fresh reviews and to edit reviews with the goal of lowering the probability of the originally predicted class; a successful manipulation yields an adversarial example. During the training (but not the test) phase, input spans are highlighted to communicate salience. Through our evaluation, we observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase than the no-explanation control. For the BERT-based classifier, popular local explanations do not improve participants' ability to reduce the model confidence over the no-explanation case. Remarkably, when the explanation for the BERT model is given by the (global) attributions of a linear model trained to imitate the BERT model, participants can effectively manipulate the model.
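
To make the two explanation settings concrete, here is a minimal sketch (not the authors' code) of (1) a linear bag-of-words classifier whose coefficients double as feature attributions, and (2) a global linear surrogate fit to imitate an opaque classifier. The toy reviews, labels, and the `black_box_predict` rule are illustrative placeholders; scikit-learn is assumed to be available.

```python
# Minimal sketch of the two explanation settings described in the abstract.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

reviews = [
    "the room was small but clean and close to the station",
    "breakfast was cold and the front desk lost our booking",
    "this hotel is amazing, the most wonderful stay of my entire life",
    "amazing luxury, truly a perfect paradise, I will return every year",
]
labels = [0, 0, 1, 1]  # 0 = genuine, 1 = deceptive (toy labels)

# (1) Linear bag-of-words classifier: each word's learned coefficient is a
# global attribution and can be used to highlight salient spans in a review.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
linear_model = LogisticRegression().fit(X, labels)

vocab = vectorizer.get_feature_names_out()
coefs = dict(zip(vocab, linear_model.coef_[0]))

def highlight(review):
    """Word-level salience: coefficient of each in-vocabulary word."""
    return {w: round(coefs[w], 3) for w in review.lower().split() if w in coefs}

print(highlight("amazing hotel but the room was small"))

# (2) Global surrogate explanation for an opaque model: train a linear
# bag-of-words model to imitate the black box's predictions and read off
# its coefficients as global attributions.
def black_box_predict(texts):
    # Placeholder for a fine-tuned BERT classifier; a trivial rule keeps
    # the sketch self-contained and deterministic.
    return [1 if "amazing" in t else 0 for t in texts]

surrogate = LogisticRegression().fit(X, black_box_predict(reviews))
surrogate_attrib = sorted(zip(vocab, surrogate.coef_[0]),
                          key=lambda t: abs(t[1]), reverse=True)[:5]
print(surrogate_attrib)  # words the surrogate treats as driving the black box
```

In the study itself, the surrogate would be fit to the BERT model's actual predictions rather than a hand-written rule, and its coefficients shown to participants as the global explanation.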
