Easy to Decide, Hard to Agree: Reducing Disagreements Between Saliency Methods

11/15/2022
by Josip Jukić et al.

A popular approach to unveiling the black box of neural NLP models is to use saliency methods, which assign a scalar importance score to each input component. A common practice for evaluating whether an interpretability method is faithful and plausible has been evaluation-by-agreement: if multiple methods agree on an explanation, its credibility increases. However, recent work has found that saliency methods exhibit only weak rank correlations even among themselves and has advocated the use of alternative diagnostic methods. In our work, we demonstrate that rank correlation is not a good fit for evaluating agreement and argue that Pearson's r is a better-suited alternative. We show that regularization techniques which increase the faithfulness of attention explanations also increase agreement between saliency methods. By connecting our findings to instance categories based on training dynamics, we show that, surprisingly, easy-to-learn instances exhibit low agreement in saliency method explanations.
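To make the agreement measurement concrete, here is a minimal sketch (not from the paper; the token scores and method labels are hypothetical) of how agreement between two saliency methods over the same input can be quantified with Spearman rank correlation versus Pearson's r, using NumPy and SciPy:

```python
# Minimal sketch: quantifying agreement between two saliency methods.
# The token scores below are made-up placeholders, not values from the paper.
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical token-level importance scores for the same 6-token input,
# produced by two different saliency methods.
saliency_a = np.array([0.02, 0.01, 0.85, 0.03, 0.70, 0.05])
saliency_b = np.array([0.01, 0.03, 0.90, 0.02, 0.60, 0.04])

rho, _ = spearmanr(saliency_a, saliency_b)  # rank-based agreement
r, _ = pearsonr(saliency_a, saliency_b)     # magnitude-aware (linear) agreement

print(f"Spearman rank correlation: {rho:.2f}")
print(f"Pearson's r:               {r:.2f}")
```

The intuition, hedged: rank correlation counts a swap among the many near-zero scores the same as a swap among the top-scoring tokens, whereas Pearson's r weights agreement by score magnitude, so the two measures can diverge even when both methods highlight the same few important tokens.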

Related research

10/12/2020 - The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
There is a recent surge of interest in using attention as explanation of...

11/15/2022 - Evaluating the Faithfulness of Saliency-based Explanations for Deep Learning Models for Temporal Colour Constancy
The opacity of deep learning models constrains their debugging and impro...

10/13/2022 - Constructing Natural Language Explanations via Saliency Map Verbalization
Saliency maps can explain a neural model's prediction by identifying imp...

06/21/2023 - Evaluating the overall sensitivity of saliency-based explanation methods
We address the need to generate faithful explanations of "black box" Dee...

05/04/2023 - Neighboring Words Affect Human Interpretation of Saliency Explanations
Word-level saliency explanations ("heat maps over words") are often used...

07/20/2018 - Explaining Image Classifiers by Adaptive Dropout and Generative In-filling
Explanations of black-box classifiers often rely on saliency maps, which...

09/20/2023 - Signature Activation: A Sparse Signal View for Holistic Saliency
The adoption of machine learning in healthcare calls for model transpare...
