Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

10/15/2021
by Andreas Madsen, et al.

To explain NLP models, many methods identify which input tokens are important for a prediction. However, it remains an open question whether these methods accurately reflect the model's logic, a property often called faithfulness. In this work, we adapt and improve a recently proposed faithfulness benchmark from computer vision, ROAR (RemOve And Retrain; Hooker et al., 2019). We improve ROAR by recursively removing dataset redundancies, which otherwise interfere with the benchmark. We then apply ROAR to popular NLP importance measures, namely attention, gradient, and integrated gradients, and include mutual information as a baseline. Evaluation is done on a suite of classification tasks often used in the literature on the faithfulness of attention. Finally, we propose a scalar faithfulness metric, which makes it easy to compare results across papers. We find that importance measures considered unfaithful for computer vision tasks perform favorably for NLP tasks, that the faithfulness of an importance measure is task-dependent, and that the computational overhead of integrated gradients is rarely justified.
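
The procedure the abstract describes lends itself to a short sketch. Below is a minimal, hypothetical rendering of the recursive ROAR loop in Python; `fit`, `Model.importance`, `Model.accuracy`, and the `MASK` token are stand-ins for the authors' actual training, importance-measure, and evaluation code (none of this is the paper's implementation), and the roughly-10%-per-step masking schedule is an illustrative choice.

```python
# Hypothetical sketch of a recursive ROAR loop: mask the allegedly most
# important tokens, retrain from scratch, and track how test accuracy drops.
from typing import Callable, List, Protocol

MASK = "[MASK]"  # hypothetical mask token; in practice, whatever the tokenizer uses


class Model(Protocol):
    # Stand-ins for an importance measure (attention, gradient, integrated
    # gradients, ...) and a test-set evaluation.
    def importance(self, tokens: List[str]) -> List[float]: ...
    def accuracy(self, data: List[List[str]], labels: List[int]) -> float: ...


def recursive_roar(
    train: List[List[str]],
    train_labels: List[int],
    test: List[List[str]],
    test_labels: List[int],
    fit: Callable[[List[List[str]], List[int]], Model],  # hypothetical trainer
    steps: int = 10,
) -> List[float]:
    """Mask roughly 10% more of the allegedly important tokens per step and
    retrain from scratch each time. Recomputing importance from the latest
    retrained model (the recursive part) keeps redundant tokens from hiding
    information from the benchmark."""
    train = [list(t) for t in train]  # work on copies
    test = [list(t) for t in test]
    accuracies = []
    for _ in range(steps + 1):
        model = fit(train, train_labels)  # retrain on the masked training data
        accuracies.append(model.accuracy(test, test_labels))
        for tokens in train + test:  # mask both splits, as ROAR prescribes
            scores = model.importance(tokens)
            k = max(1, len(tokens) // steps)
            # mask the k most important tokens that are still unmasked
            unmasked = [i for i, t in enumerate(tokens) if t != MASK]
            for i in sorted(unmasked, key=lambda i: -scores[i])[:k]:
                tokens[i] = MASK
    return accuracies
```

Under this setup, a faithful importance measure should yield a steeper accuracy drop than masking uniformly at random; the paper's scalar faithfulness metric summarizes that gap, and its exact definition is given in the full text.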

Related research

- Locally Aggregated Feature Attribution on Natural Language Model Understanding (04/22/2022)
  With the growing popularity of deep-learning models, model understanding...
- Unmasking the Mask – Evaluating Social Biases in Masked Language Models (04/15/2021)
  Masked Language Models (MLMs) have shown superior performances in numero...
- RMT: Retentive Networks Meet Vision Transformers (09/20/2023)
  Transformer first appears in the field of natural language processing an...
- KGA: A General Machine Unlearning Framework Based on Knowledge Gap Alignment (05/11/2023)
  Recent legislation of the "right to be forgotten" has led to the interes...
- Is Attention Interpretable? (06/09/2019)
  Attention mechanisms have recently boosted performance on a range of NLP...
- Transformer Vs. MLP-Mixer Exponential Expressive Gap For NLP Problems (08/17/2022)
  Vision-Transformers are widely used in various vision tasks. Meanwhile, ...
- Evaluating Feature Importance Estimates (06/28/2018)
  Estimating the influence of a given feature to a model prediction is cha...