Faithfulness Tests for Natural Language Explanations

05/29/2023
by Pepa Atanasova, et al.

Explanations of neural models aim to reveal a model's decision-making process for its predictions. However, recent work shows that current explanation methods, such as saliency maps or counterfactuals, can be misleading, as they are prone to present reasons that are unfaithful to the model's inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, providing a fundamental tool in the development of faithful NLEs.
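As a rough illustration of the two tests described above (a sketch only, not the authors' released code), the Python snippet below assumes a hypothetical `model(text) -> (prediction, nle)` interface; the toy model, the candidate insertions, and the `extract_reasons` helper are placeholders, not names from the paper.

```python
# Minimal sketch of the two faithfulness tests, assuming a hypothetical
# model(text) -> (prediction, nle) interface; not the authors' released code.

def counterfactual_test(model, text, candidate_insertions):
    """Insert candidate reason words into the input and flag cases where the
    prediction flips but the inserted word never appears in the new NLE."""
    orig_pred, _ = model(text)
    unfaithful = []
    for word, position in candidate_insertions:
        tokens = text.split()
        edited = " ".join(tokens[:position] + [word] + tokens[position:])
        new_pred, new_nle = model(edited)
        # The inserted word changed the prediction (so it acts as a reason),
        # yet the NLE does not mention it -> evidence of unfaithfulness.
        if new_pred != orig_pred and word.lower() not in new_nle.lower():
            unfaithful.append((edited, new_pred, new_nle))
    return unfaithful


def reconstruction_test(model, text, extract_reasons):
    """Rebuild an input from the reasons stated in the NLE and check whether
    the model keeps its original prediction on the reconstructed input."""
    orig_pred, nle = model(text)
    reconstructed = extract_reasons(nle)  # e.g. a rule-based reason extractor
    new_pred, _ = model(reconstructed)
    return new_pred == orig_pred


if __name__ == "__main__":
    # Toy sentiment "model" whose NLE ignores the word that drives the label.
    def toy_model(text):
        label = "positive" if "good" in text else "negative"
        return label, f"The review is {label} because it mentions the food."

    print(counterfactual_test(toy_model, "the service was slow", [("good", 1)]))
    print(reconstruction_test(toy_model, "the food was good",
                              extract_reasons=lambda nle: "the food"))
```

In this toy example, inserting "good" flips the prediction without ever appearing in the explanation, and the input reconstructed from the explanation no longer yields the original prediction, so both tests flag the toy model's explanations as unfaithful.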


Related research

06/26/2018 · Generating Counterfactual Explanations with Natural Language
Natural language explanations of deep neural network decisions provide a...

10/04/2020 · Explaining Deep Neural Networks
Deep neural networks are becoming more and more popular due to their rev...

10/12/2021 · Investigating the Effect of Natural Language Explanations on Out-of-Distribution Generalization in Few-shot NLI
Although neural models have shown strong performance in datasets such as...

05/11/2021 · Counterfactual Explanations for Neural Recommenders
Understanding why specific items are recommended to users can significan...

03/10/2017 · Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations
Neural networks are among the most accurate supervised learning methods ...

10/07/2019 · Make Up Your Mind! Adversarial Generation of Inconsistent Natural Language Explanations
To increase trust in artificial intelligence systems, a growing amount o...

04/27/2022 · Counterfactual Explanations for Natural Language Interfaces
A key challenge facing natural language interfaces is enabling users to ...
