Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI

05/25/2022
by Suzanna Sia, et al.

Evaluating the faithfulness of an explanation is desirable for many reasons, such as trust, interpretability, and diagnosing the sources of a model's errors. In this work, which focuses on the natural language inference (NLI) task, we introduce the methodology of Faithfulness-through-Counterfactuals, which first generates a counterfactual hypothesis based on the logical predicates expressed in the explanation, and then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic (i.e., whether the new formula is logically satisfiable). In contrast to existing approaches, this does not require any explanations for training a separate verification model. We first validate the efficacy of automatic counterfactual hypothesis generation, leveraging the few-shot priming paradigm. Next, we show that our proposed metric distinguishes between human-model agreement and disagreement on new counterfactual input. In addition, we conduct a sensitivity analysis to validate that our metric is sensitive to unfaithful explanations.
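To make the satisfiability idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a propositional encoding that the abstract does not specify: the premise, the rule stated by the explanation, and the counterfactual hypothesis are hypothetical Boolean functions over named variables, and the model's predicted label on the counterfactual is translated into a constraint (entailment as "premise implies hypothesis", contradiction as "premise implies not hypothesis", neutral as no constraint). The names `satisfiable` and `consistent_with_explanation` are illustrative only.

```python
from itertools import product


def satisfiable(formula, variables):
    """Brute-force check: does any truth assignment make `formula` true?"""
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            return True
    return False


def consistent_with_explanation(premise, rule, cf_hypothesis, predicted_label, variables):
    """Return True if the predicted label on the counterfactual hypothesis
    can hold together with the premise and the explanation's rule.

    Assumed (hypothetical) encoding of NLI labels as constraints:
      entailment    -> premise implies cf_hypothesis
      contradiction -> premise implies not cf_hypothesis
      neutral       -> no constraint (a simplification of this toy sketch)
    """
    if predicted_label == "entailment":
        label_constraint = lambda a: (not premise(a)) or cf_hypothesis(a)
    elif predicted_label == "contradiction":
        label_constraint = lambda a: (not premise(a)) or (not cf_hypothesis(a))
    else:
        label_constraint = lambda a: True

    combined = lambda a: premise(a) and rule(a) and label_constraint(a)
    return satisfiable(combined, variables)


# Toy example: the premise states "dog", the explanation states "dog implies animal",
# and the counterfactual hypothesis is "not animal".
VARIABLES = ["dog", "animal"]
premise = lambda a: a["dog"]
rule = lambda a: (not a["dog"]) or a["animal"]
cf_hypothesis = lambda a: not a["animal"]

# Predicting "contradiction" on the counterfactual agrees with the explanation;
# predicting "entailment" makes the combined formula unsatisfiable.
print(consistent_with_explanation(premise, rule, cf_hypothesis, "contradiction", VARIABLES))
print(consistent_with_explanation(premise, rule, cf_hypothesis, "entailment", VARIABLES))
```

In this sketch, an unsatisfiable combined formula is read as a sign that the explanation does not faithfully describe the model's behaviour on the counterfactual. Treating "neutral" as unconstrained is a simplification of the toy encoding, so it can never flag a neutral prediction as inconsistent.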


