Counterfactuals of Counterfactuals: a back-translation-inspired approach to analyse counterfactual editors

05/26/2023
by Giorgos Filandrianos, et al.

In the wake of responsible AI, interpretability methods, which attempt to explain the predictions of neural models, have seen rapid progress. In this work, we are concerned with explanations that apply to natural language processing (NLP) models and tasks, and we focus specifically on the analysis of counterfactual, contrastive explanations. We note that while several explainers have been proposed to produce counterfactual explanations, their behaviour can vary significantly, and the lack of a universal ground truth for counterfactual edits imposes an insuperable barrier on their evaluation. We propose a new back-translation-inspired evaluation methodology that utilises earlier outputs of the explainer as ground-truth proxies to investigate the consistency of explainers. We show that by iteratively feeding the counterfactual back to the explainer we can obtain valuable insights into the behaviour of both the predictor and the explainer models, and infer patterns that would otherwise be obscured. Using this methodology, we conduct a thorough analysis and propose a novel metric to evaluate the consistency of counterfactual generation approaches with different characteristics across available performance indicators.
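
The iterative procedure described in the abstract can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than the authors' implementation: `toy_predictor`, `toy_editor`, `iterate_editor`, and `token_edit_distance` are hypothetical names, the predictor is a keyword counter, and the editor simply flips the first sentiment word it finds. The per-step edit distance is one plausible quantity to track across iterations; it is not the metric proposed in the paper.

```python
from typing import Callable, List

# Hypothetical toy vocabulary; not taken from the paper.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}
SWAP = {"good": "bad", "bad": "good", "great": "awful", "awful": "great"}


def toy_predictor(text: str) -> str:
    """Stand-in classifier: counts positive vs. negative keywords."""
    tokens = text.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative"


def toy_editor(text: str) -> str:
    """Stand-in counterfactual editor: flips the first sentiment word."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        if tok in SWAP:
            tokens[i] = SWAP[tok]
            break
    return " ".join(tokens)


def token_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over whitespace-separated tokens."""
    s, t = a.split(), b.split()
    prev = list(range(len(t) + 1))
    for i, tok_s in enumerate(s, start=1):
        curr = [i]
        for j, tok_t in enumerate(t, start=1):
            cost = 0 if tok_s == tok_t else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def iterate_editor(text: str, editor: Callable[[str], str], steps: int) -> List[str]:
    """Feed each counterfactual back into the editor, keeping the whole chain."""
    chain = [text]
    for _ in range(steps):
        chain.append(editor(chain[-1]))
    return chain


chain = iterate_editor("the film was good", toy_editor, steps=4)
labels = [toy_predictor(x) for x in chain]
# Size of each edit; earlier steps act as a reference for later ones.
step_sizes = [token_edit_distance(a, b) for a, b in zip(chain, chain[1:])]

for text, label in zip(chain, labels):
    print(f"{label:8s} | {text}")
print("edit sizes per step:", step_sizes)
```

In this toy setting the editor is perfectly consistent: every step flips the label with a single-token edit, and two steps return the original input. With a real editor, step sizes that grow across iterations, or labels that fail to flip, would be the kind of pattern the iterative analysis is meant to surface.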

