Attention is not not Explanation

by Sarah Wiegreffe and Yuval Pinter

Attention mechanisms play a central role in NLP systems, especially within recurrent neural network (RNN) models. Recently, there has been increasing interest in whether the intermediate representations offered by these modules may be used to explain the reasoning behind a model's prediction, and consequently yield insight into the model's decision-making process. A recent paper claims that "Attention is not Explanation" (Jain and Wallace, 2019). We challenge many of the assumptions underlying this work, arguing that such a claim depends on one's definition of explanation, and that testing it needs to take into account all elements of the model, using a rigorous experimental design. We propose four alternative tests to determine when and whether attention can be used as explanation: a simple uniform-weights baseline; a variance calibration based on multiple random-seed runs; a diagnostic framework using frozen weights from pretrained models; and an end-to-end adversarial attention training protocol. Each allows for meaningful interpretation of attention mechanisms in RNN models. We show that even when reliable adversarial distributions can be found, they do not perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.
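The first of the four tests, the uniform-weights baseline, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the hidden states, attention scores, and linear read-out below are all toy stand-ins, assumed only to show the shape of the comparison — aggregate hidden states once with learned attention and once with equal weights, then compare the resulting predictions. If a model with uniform weights matches the trained model's task performance, the learned attention distribution is not contributing meaningful information.

```python
import numpy as np

def attend(hidden_states, weights):
    """Weighted sum of per-timestep hidden states -> context vector."""
    return weights @ hidden_states  # (T,) @ (T, d) -> (d,)

def predict(context, w_out):
    """Toy sigmoid read-out standing in for the model's classifier head."""
    return 1.0 / (1.0 + np.exp(-(context @ w_out)))

rng = np.random.default_rng(0)
T, d = 5, 8                       # sequence length, hidden size (toy values)
hidden = rng.normal(size=(T, d))  # stand-in for RNN hidden states
w_out = rng.normal(size=d)        # stand-in for trained output weights

# "Learned" attention: random scores passed through a softmax, for illustration.
scores = rng.normal(size=T)
learned = np.exp(scores) / np.exp(scores).sum()

# Uniform-weights baseline: every timestep weighted equally.
uniform = np.full(T, 1.0 / T)

p_learned = predict(attend(hidden, learned), w_out)
p_uniform = predict(attend(hidden, uniform), w_out)

# In the actual test, this comparison is run over a dataset and measured
# in task accuracy/F1, not on a single toy example.
print(f"learned={p_learned:.3f}  uniform={p_uniform:.3f}")
```

Both weight vectors are valid attention distributions (non-negative, summing to one); the test's substance lies in comparing downstream task metrics between the two aggregation schemes across a full evaluation set.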


