Identifying the Source of Vulnerability in Explanation Discrepancy: A Case Study in Neural Text Classification

12/10/2022
by   Ruixuan Tang, et al.

Several recent works have observed that post-hoc explanations can be unstable when input-side perturbations are applied to the model, raising both interest in and concern about the stability of post-hoc explanations. A key question remains, however: is the instability caused by the neural network model or by the post-hoc explanation method? This work explores the potential source of unstable post-hoc explanations. To separate out the influence of the model, we propose a simple output probability perturbation method. Compared to prior input-side perturbation methods, output probability perturbation circumvents the neural model's potential effect on the explanations and allows the explanation method to be analyzed in isolation. We evaluate the proposed method with three widely used post-hoc explanation methods: LIME (Ribeiro et al., 2016), Kernel Shapley (Lundberg and Lee, 2017a), and Sample Shapley (Strumbelj and Kononenko, 2010). The results demonstrate that the post-hoc methods are stable, barely producing discrepant explanations under output probability perturbations. This observation suggests that neural network models may be the primary source of fragile explanations.
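The core idea of output probability perturbation can be sketched as follows. This is a hypothetical minimal illustration, not the authors' code: a toy logistic "model" stands in for a neural classifier, a Monte Carlo Sample Shapley estimator (Strumbelj and Kononenko, 2010) serves as the post-hoc explainer, and attributions computed from clean versus noise-perturbed output probabilities are compared.

```python
import numpy as np

# Hypothetical stand-in for a neural text classifier: a fixed logistic
# model over four input features (weights chosen arbitrarily).
W = np.array([2.0, -1.0, 0.5, 0.0])

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(x @ W)))

def make_perturbed_proba(rng, sigma=0.01):
    # Output probability perturbation: add small Gaussian noise directly
    # to the model's output probability (clipped to [0, 1]). This bypasses
    # the model internals entirely, so any resulting explanation change
    # must come from the explanation method itself.
    def f(x):
        return np.clip(predict_proba(x) + rng.normal(0.0, sigma), 0.0, 1.0)
    return f

def sample_shapley(f, x, baseline, n_samples=2000, seed=0):
    # Monte Carlo Shapley estimation: average each feature's marginal
    # contribution over random feature orderings.
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        z = baseline.astype(float).copy()
        prev = f(z)
        for j in perm:
            z[j] = x[j]          # reveal feature j
            cur = f(z)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return phi / n_samples

x = np.ones(4)
baseline = np.zeros(4)

clean = sample_shapley(predict_proba, x, baseline)
noisy = sample_shapley(make_perturbed_proba(np.random.default_rng(42)),
                       x, baseline)

# If the explainer is stable, attributions barely change under
# output probability perturbation.
print("max attribution shift:", np.max(np.abs(clean - noisy)))
```

Under this sketch the perturbation noise averages out across sampled permutations, so the attribution shift stays small, mirroring the paper's finding that the explanation methods themselves are stable.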

