GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

05/06/2022
by Ali Modarressi, et al.

There has been growing interest in interpreting the underlying dynamics of Transformers. While self-attention patterns were initially deemed the primary option, recent studies have shown that integrating other components can yield more accurate explanations. This paper introduces a novel token attribution analysis method that incorporates all the components in the encoder block and aggregates the resulting attributions across layers. Through extensive quantitative and qualitative experiments, we demonstrate that our method can produce faithful and meaningful global token attributions. Our experiments reveal that incorporating almost every encoder component results in increasingly more accurate analyses in both local (single-layer) and global (whole-model) settings. Our global attribution analysis significantly outperforms previous methods on various tasks in terms of correlation with gradient-based saliency scores. Our code is freely available at https://github.com/mohsenfayyaz/GlobEnc.
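
To make the layer-aggregation step concrete, below is a minimal NumPy sketch of rollout-style aggregation: per-layer token-to-token attribution matrices are row-normalized and multiplied recursively from the first layer to the last, yielding a global input-to-output attribution map. The function name, the normalization, and the random inputs are illustrative assumptions; in the paper, the per-layer matrices themselves are derived from a norm-based analysis of the whole encoder block (self-attention, residual connections, and layer normalization), which this sketch takes as given.

```python
import numpy as np

def rollout_aggregate(layer_attributions):
    """Aggregate per-layer attribution matrices into a global map
    via recursive matrix multiplication (attention-rollout style).

    layer_attributions: list of (seq_len, seq_len) arrays, ordered
    from the first encoder layer to the last; entry [i, j] of each
    matrix is the attribution of output token i to input token j at
    that layer. (Hypothetical inputs; GlobEnc computes these matrices
    from the full encoder block, not from raw attention weights.)
    """
    global_attr = None
    for attr in layer_attributions:
        # Row-normalize so each output token's incoming attributions sum to 1.
        attr = attr / attr.sum(axis=-1, keepdims=True)
        # Compose with the aggregate of all earlier layers.
        global_attr = attr if global_attr is None else attr @ global_attr
    return global_attr

# Toy usage with random stand-in matrices (12 layers, 5 tokens):
rng = np.random.default_rng(0)
layers = [rng.random((5, 5)) for _ in range(12)]
global_map = rollout_aggregate(layers)
print(global_map.shape)         # (5, 5)
print(global_map.sum(axis=-1))  # each row sums to ~1.0
```

Because a product of row-stochastic matrices is itself row-stochastic, each output token's global attributions over the input tokens still sum to one, which keeps the aggregated map directly comparable across tokens.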

Related research

Quantifying Context Mixing in Transformers (01/30/2023)
Self-attention weights and their transformed variants have been the main...

DecompX: Explaining Transformers Decisions by Propagating Token Decomposition (06/05/2023)
An emerging solution for explaining Transformer-based models is to use v...

Position Embedding Needs an Independent Layer Normalization (12/10/2022)
The Position Embedding (PE) is critical for Vision Transformers (VTs) du...

Incorporating Attribution Importance for Improving Faithfulness Metrics (05/17/2023)
Feature attribution methods (FAs) are popular approaches for providing i...

Measuring the Mixing of Contextual Information in the Transformer (03/08/2022)
The Transformer architecture aggregates input information through the se...

Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of ℓ_2 Regularization (11/08/2022)
While ℓ_2 regularization is widely used in training gradient boosted tre...

ViT-CX: Causal Explanation of Vision Transformers (11/06/2022)
Despite the popularity of Vision Transformers (ViTs) and eXplainable AI ...
