On the Validity of Self-Attention as Explanation in Transformer Models

08/12/2019
by Gino Brunner, et al.

Explainability of deep learning systems is a vital requirement for many applications. However, it is still an unsolved problem. Recent self-attention based models for natural language processing, such as the Transformer or BERT, offer hope of greater explainability by providing attention maps that can be directly inspected. Nevertheless, by just looking at the attention maps one easily overlooks that the attention is not over words but over hidden embeddings, which themselves can be mixed representations of multiple embeddings. We investigate to what extent the implicit assumption made in many recent papers - that hidden embeddings at all layers still correspond to the underlying words - is justified. We quantify how much embeddings are mixed based on a gradient-based attribution method and find that already after the first layer less than 50% of the embedding is attributed to the underlying word, declining thereafter to a median contribution of 7.5% in the last layer. While throughout the layers the underlying word remains as the one contributing most to the embedding, we argue that attention visualizations are misleading and should be treated with care when explaining the underlying deep learning system.
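The abstract hinges on quantifying how much a hidden embedding is attributed to its underlying word via a gradient-based attribution method. The sketch below is an illustration of that general idea rather than the authors' exact procedure: it attributes one hidden embedding in BERT back to the input word embeddings using gradients. The Hugging Face `transformers` library, the `bert-base-uncased` checkpoint, and the gradient-norm aggregation are assumptions made for this example.

```python
# A minimal sketch of gradient-based attribution for hidden embeddings in BERT.
# Assumptions for illustration: Hugging Face `transformers`, bert-base-uncased,
# gradient-norm aggregation. This is NOT the paper's exact attribution method.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentence = "Attention is over hidden embeddings, not over words."
inputs = tokenizer(sentence, return_tensors="pt")
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Embed the tokens ourselves so gradients can be taken w.r.t. the word embeddings.
word_embeds = model.embeddings.word_embeddings(inputs["input_ids"]).detach()
word_embeds.requires_grad_(True)

outputs = model(inputs_embeds=word_embeds, attention_mask=inputs["attention_mask"])
hidden_states = outputs.hidden_states  # (embedding layer, layer 1, ..., layer 12)

layer, position = 1, 3  # which hidden embedding to attribute
target = hidden_states[layer][0, position].sum()  # scalar proxy for the embedding

# Gradient of the target w.r.t. every input word embedding; its per-token norm
# serves as a rough contribution score, normalized to sum to one.
grads = torch.autograd.grad(target, word_embeds)[0]  # shape (1, seq_len, dim)
contribs = grads.norm(dim=-1).squeeze(0)
contribs = contribs / contribs.sum()

for tok, c in zip(tokens, contribs):
    print(f"{tok:>15s}  {c.item():.3f}")
```

Running this for every layer and position yields per-word contribution statistics of the kind the abstract reports, e.g. the median share attributed to the underlying word at each layer.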


