Attention is not not Explanation

08/13/2019
by Sarah Wiegreffe, et al.

Attention mechanisms play a central role in NLP systems, especially within recurrent neural network (RNN) models. Recently, there has been increasing interest in whether the intermediate representations offered by these modules may be used to explain the reasoning behind a model's prediction, and thereby yield insight into the model's decision-making process. A recent paper claims that "Attention is not Explanation" (Jain and Wallace, 2019). We challenge many of the assumptions underlying this work, arguing that such a claim depends on one's definition of explanation, and that testing it requires taking into account all elements of the model, using a rigorous experimental design. We propose four alternative tests to determine when and whether attention can be used as explanation: a simple uniform-weights baseline; a variance calibration based on multiple random-seed runs; a diagnostic framework using frozen weights from pretrained models; and an end-to-end adversarial attention training protocol. Each allows for meaningful interpretation of attention mechanisms in RNN models. We show that even when reliable adversarial distributions can be found, they do not perform well on the simple diagnostic, indicating that prior work does not disprove the usefulness of attention mechanisms for explainability.
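The first of the four tests, the uniform-weights baseline, can be illustrated with a minimal numpy sketch: replace the learned attention distribution over hidden states with a uniform one and compare the resulting context vectors. All names here (`attend`, `softmax`, the toy dimensions) are hypothetical illustrations, not the authors' actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(hidden, weights):
    """Weighted sum of hidden states (T, d) under attention weights (T,)."""
    return weights @ hidden

rng = np.random.default_rng(0)
T, d = 5, 4                          # toy sequence length and hidden size
hidden = rng.normal(size=(T, d))     # stand-in for RNN hidden states
scores = rng.normal(size=T)          # stand-in for learned attention scores

learned = softmax(scores)            # learned attention distribution
uniform = np.full(T, 1.0 / T)        # uniform-weights baseline

ctx_learned = attend(hidden, learned)
ctx_uniform = attend(hidden, uniform)

# If downstream predictions built on these two context vectors are close,
# the learned attention adds little signal on this task.
gap = np.linalg.norm(ctx_learned - ctx_uniform)
```

In the paper's framing, a task where this baseline matches the full model's performance is one where attention weights should not be expected to carry explanatory content in the first place.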

